This study examined the reliability of the Mantel-Haenszel indexes across different samples of test takers as well as across sample sizes and investigated whether these indexes are robust to item context effects. Mathematics data from the Second International Mathematics Study (SIMS; 1985) for U.S. eighth-grade students were analyzed. The results suggest that the MH D-DIF is robust to item context effects. However, larger sample sizes than those used in this investigation (N = 141-167 for the focal group) may be necessary to obtain stable estimates from the Mantel-Haenszel procedure.
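
For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of how the Mantel-Haenszel common odds ratio and MH D-DIF are conventionally computed from 2 × 2 tables at each matched score level. The function name `mh_d_dif` and the counts are hypothetical and are not taken from the study; the constant −2.35 is the usual rescaling of the log odds ratio to the ETS delta metric.

```python
import math


def mh_d_dif(tables):
    """Return (alpha_MH, MH D-DIF) from per-score-level 2x2 tables.

    Each table is (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    # Note: den is zero if no level has both a reference error and a
    # focal success; this sketch does not handle that case.
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts for three matched score levels (illustration only).
example_tables = [
    (20, 30, 10, 40),
    (40, 20, 25, 25),
    (45, 5, 35, 10),
]
alpha_mh, d_dif = mh_d_dif(example_tables)
print(alpha_mh, d_dif)
```

With small focal groups, such as the N = 141-167 mentioned above, many score levels hold only a few focal examinees. The per-level terms are then noisy, which is one way to see why the estimates may be unstable.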

The Effects of Score Group Width on the Mantel-Haenszel Procedure
Previous research examining the effects of reducing the number of score groups used in the matching criterion of the Mantel-Haenszel procedure, when screening for DIF, has produced ambiguous results. The goal of this study was to resolve the ambiguity by examining the problem with a simulated data set. The main results from this study call into question the preliminary recommendations of several other researchers that four or more score groups are sufficient and produce stable results. Although considerable stability and very little Type I error were noted with equal ability distribution comparisons, with unequal ability distributions, the Type I error rate was substantially inflated. These results argue against the appropriateness of implementing the procedure by collapsing score groups. The current data suggest that more than modest reductions in the number of score groups cannot be recommended when the ability distributions of the reference and focal groups differ.

The purpose of this study was to compare the IRT-based area method and the Mantel-Haenszel method for investigating differential item functioning (DIF), to determine the degree of agreement between the methods in identifying potentially biased items, and, when the two methods led to different results, to identify possible reasons for the discrepancies. Data for the study were the item responses of Anglo American and Native American students who took the 1982 New Mexico High School Proficiency Exam. Two samples of 1,000 students from each group were studied. The major findings were that (a) the consistency of classifications of items into "biased" and "not-biased" categories across replications was 75% to 80% for both methods and (b) when the unreliability of the statistics was taken into account, the two methods led to very similar results. Discrepancies between methods were due to the presence of nonuniform DIF (the Mantel-Haenszel method could not identify these items) and the choice of interval over which DIF was assessed (the IRT method results depended on the choice of interval). The implications for practitioners seem clear: The Mantel-Haenszel method in general provides an acceptable approximation to the IRT-based methods.

Liu and Agresti (1996) proposed a Mantel and Haenszel-type (1959) estimator of a common odds ratio for several 2 × J tables, where the J columns are ordinal levels of a response variable. This article applies the Liu-Agresti estimator to the case of assessing differential item functioning (DIF) in items having an ordinal response variable. A simulation study was conducted to investigate the accuracy of the Liu-Agresti estimator in relation to other statistical DIF detection procedures. The results of the simulation study indicate that the Liu-Agresti estimator is a viable alternative to other DIF detection statistics.

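This article discusses the following issues: (1) the nature of an administrative agency's application to a court for preservation measures; (2) the problems and difficulties that people's courts encounter when adopting preservation measures; and (3) legislative considerations regarding courts adopting preservation measures upon application by an administrative agency.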

We developed an empirical Bayes (EB) enhancement to Mantel-Haenszel (MH) DIF analysis in which we assume that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. We use the posterior distribution of DIF parameters to make inferences about the item's true DIF status and the posterior predictive distribution to predict the item's future observed status. DIF status is expressed in terms of the probabilities associated with each of the five DIF levels defined by the ETS classification system: C–, B–, A, B+, and C+. The EB methods yield more stable DIF estimates than do conventional methods, especially in small samples, which is advantageous in computer-adaptive testing. The EB approach may also convey information about DIF stability in a more useful way by representing the state of knowledge about an item's DIF status as probabilistic.
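
The normal-normal model described in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation. The function names, the example numbers, and the cut points (−1.5, −1.0, 1.0, 1.5) used to map the true DIF parameter onto the five levels are all assumptions. The sketch also takes the prior mean and variance as given, whereas an EB analysis would estimate them from the MH statistics of many items.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def eb_posterior(mh, se, prior_mean, prior_var):
    """Posterior mean and variance of the DIF parameter (normal-normal model)."""
    w = prior_var / (prior_var + se ** 2)  # weight given to the observed statistic
    post_mean = w * mh + (1.0 - w) * prior_mean
    post_var = w * se ** 2
    return post_mean, post_var


def dif_level_probs(post_mean, post_var, cuts=(-1.5, -1.0, 1.0, 1.5)):
    """Posterior probability of each of the five DIF levels (cut points assumed)."""
    sd = math.sqrt(post_var)
    edges = [0.0] + [normal_cdf(c, post_mean, sd) for c in cuts] + [1.0]
    labels = ["C-", "B-", "A", "B+", "C+"]
    return {lab: edges[i + 1] - edges[i] for i, lab in enumerate(labels)}


# Hypothetical item: observed MH D-DIF of -1.8 with standard error 0.6,
# and a prior with mean 0.0 and variance 0.5 (illustration only).
post_mean, post_var = eb_posterior(mh=-1.8, se=0.6, prior_mean=0.0, prior_var=0.5)
probs = dif_level_probs(post_mean, post_var)
print(post_mean, post_var, probs)
```

The posterior mean is pulled from the observed statistic toward the prior mean, and the pull is stronger when the standard error is large. This is consistent with the abstract's claim that the EB estimates are more stable in small samples.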

Shealy and Stout (1993) proposed a DIF detection procedure called SIBTEST and demonstrated its utility with both simulated and real data sets. Current versions of SIBTEST can be used only for dichotomous items. In this article, an extension to handle polytomous items is developed. Two simulation studies are presented which compare the modified SIBTEST procedure with the Mantel and standardized mean difference (SMD) procedures. The first study compares the procedures under conditions in which the Mantel and SMD procedures have been shown to perform well (Zwick, Donoghue, & Grima, 1993). Results of Study 1 suggest that SIBTEST performed reasonably well, but that the Mantel and SMD procedures performed slightly better. The second study uses data simulated under conditions in which observed-score DIF methods for dichotomous items have not performed well. The results of Study 2 indicate that under these conditions the modified SIBTEST procedure provides better control of impact-induced Type I error inflation than the other procedures.

A 1998 study by Bielinski and Davison reported a sex difference by item difficulty interaction in which easy items tended to be easier for females than males, and hard items tended to be harder for females than males. To extend their research to nationally representative samples of students, this study used math achievement data from the 1992 NAEP, the TIMSS, and the NELS:88. The data included students in grades 4, 8, 10, and 12. The interaction was assessed by correlating the item difficulty difference (b_male − b_female) with item difficulty computed on the combined male/female sample. Using only the multiple-choice mathematics items, the predicted negative correlation was found for all eight populations and was significant in five. An argument is made that this phenomenon may help explain the greater variability in math achievement among males as compared to females and the emergence of higher performance of males in late adolescence.
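
The correlation described in this abstract can be sketched as below. The difficulty values are made up for illustration and are not from NAEP, TIMSS, or NELS:88, and the helper `pearson_r` is a hypothetical name. The sketch only shows the computation: the per-item difference b_male − b_female is correlated with the difficulty estimated on the combined sample.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical item difficulties, ordered from easy to hard (illustration only).
b_male = [-1.2, -0.5, 0.1, 0.8, 1.5]
b_female = [-1.5, -0.6, 0.1, 1.0, 1.9]
b_combined = [-1.35, -0.55, 0.1, 0.9, 1.7]

# Per-item difficulty difference, b_male - b_female.
diff = [m - f for m, f in zip(b_male, b_female)]
r = pearson_r(diff, b_combined)
print(r)
```

In these made-up data the easy items are easier for females (positive difference) and the hard items are harder for females (negative difference), so the correlation comes out negative, which is the direction the abstract reports.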

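Complex numbers are an important topic in high school mathematics. In particular, the 2001 edition of the high school mathematics textbooks sets higher requirements for the content and applications of complex numbers. As we know, the extreme values of functions are closely tied to inequalities. The concept of an inequality is built on the real numbers, and complex numbers generally cannot be compared in size, yet complex numbers and inequalities are not unrelated. In fact, the real parts, imaginary parts, and moduli of several complex numbers do satisfy order relations in the usual sense. How to use the properties of complex numbers to solve mathematical problems (especially extreme-value problems for distance-type functions) is therefore a meaningful question. Solving problems this way can often avoid complications and turn difficult problems into easy ones. This article offers some preliminary views on the question.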

Logistic regression has recently been advanced as a viable procedure for detecting differential item functioning (DIF). One of the advantages of this procedure is the considerable flexibility it offers in the specification of the regression equation. This article describes incorporating two ability estimates into a single regression analysis, with the result that substantially fewer items exhibit DIF. A comparable analysis is conducted using the Mantel-Haenszel procedure, with similar results. It is argued that by simultaneously conditioning on two relevant ability estimates, more accurate matching of examinees in the reference and focal groups is obtained, and thus multidimensional item impact is not mistakenly identified as DIF.

Integrating Research into Foundational Teaching and Innovation into Laboratory Work
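Offering research-oriented and design-based chemistry experiments in the early undergraduate years gives students early, preliminary training in scientific research and the experimental process, achieving the goal of integrating research into foundational teaching and innovation into the experimental process. Through design-based comprehensive experiments, students' chemical literacy, experimental skills, and awareness of innovation are cultivated. This article presents a theoretical exploration and practical summary of offering research-oriented and design-based chemistry experiments.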

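The Mantel-Haenszel method (M-H method for short) is an important and widely used method for detecting whether test items exhibit DIF. The choice of sample size is a key step in applying the M-H method. Taking as the population the response data for multiple-choice English items from a sample of one city's college entrance examination (gaokao) data in a given year, this article examines how much different sample sizes affect the sensitivity of the method's test. The results show that, for the population given in this study, the test results are fairly consistent within a certain range of sample sizes.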

Student responses to a large number of constructed response items in three Math and three Reading tests were scored on two occasions using three ways of assigning raters: single reader scoring, a different reader for each response (item-specific), and three readers each scoring a rater item block (RIB) containing approximately one-third of a student's responses. Multiple group confirmatory factor analyses indicated that the three types of total scores were most frequently tau-equivalent. Factor models fitted on the item responses attributed differences in scores to correlated ratings incurred by the same reader scoring multiple responses. These halo effects contributed to significantly increased single reader mean total scores for three of the tests. The similarity of scores for item-specific and RIB scoring suggests that the effect of rater bias on an examinee's set of responses may be minimized with the use of multiple readers though fewer than the number of items.

What is a complex multiple-choice test item? What is the evidence that such items should be avoided?

Two general item analysis indices which apply to multi-score items are developed as generalizations of a popular index applicable to dichotomous items. The indices of discrimination are of two types: one based on differential difficulty and the other on net number of positive discriminations. The usefulness and limitations of each are discussed.

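Reception aesthetics (also called reception theory) is a method of literary study that arose after the 1960s. The theory takes the reader's reception and influence as the center of study, reversing the traditional approach centered on authors and works. This article attempts to apply reception theory to the teaching of Chinese language and, from a new vantage point, explores ways to raise students' proficiency in Chinese.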

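As is well known, the college entrance examination (gaokao) is practically the only route into a good university in China, so its importance goes without saying. For American high school students, getting into a prestigious university is far less simple; let us look at the issues they face.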

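3D printing was used to fabricate a perforation mold with complex geometric parameters; the mold was then used to make wax perforation cores, which were finally placed in a specimen mold into which concrete specimens were cast. The results show that simulated rock samples of complex wellbores made by this method have high dimensional accuracy and that the perforation tunnels are clean after dewaxing, which to some extent ensures consistency between the physical model and the numerical analysis model and provides a basis for experimental verification of theoretical models. Applying 3D printing in laboratory teaching simplified the construction of complex physical models, avoided the loss of student enthusiasm caused by interruptions in the learning process, and effectively stimulated students' awareness of innovation and spirit of exploration.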

This article addresses the issue of language-related construct-irrelevant variance on content area tests from the perspective of systemic functional linguistics. We propose that the construct relevance of language used in content area assessments, and consequent claims of construct-irrelevant variance and bias, should be determined according to the degree of correspondence between language use in the assessment and language use in the educational contexts in which the content is learned and used. This can be accomplished by matching the linguistic features of an assessment and the linguistic features of the domain in which the assessment is measuring achievement. This represents a departure from previous work on the assessment of English language learners’ content knowledge that has assumed complex linguistic features are a source of construct-irrelevant variance by virtue of their complexity.

