Similar Articles
1.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) of Black versus matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 GRE-disclosed forms (988 verbal items) and 11 SAT-disclosed forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both the GRE and the SAT. The most important finding is that on hard items, Black examinees performed differentially better than matched-ability White examinees for each of the four item types and on both the GRE and the SAT. The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and the differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

2.
The standardization methodology was used to help identify item characteristics that might explain differential item functioning among Hispanics on the Scholastic Aptitude Test. Results indicated that true cognates (words with a common root in English and Spanish) and content of special interest to Hispanics seemed to help Hispanics' performance. The limited occurrence of false cognates (words that appear to be cognates but have different meanings in the two languages) and of homographs (words that are spelled alike but have different meanings in English) restricted their evaluation. Nevertheless, examination of items with false cognates or homographs gave some evidence that their occurrence might make items unexpectedly more difficult for Hispanic examinees.
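The standardization methodology is commonly summarized by the standardized p-difference (STD P-DIF): at each level of the matching score, the focal group's proportion correct is compared with the reference group's, and the differences are averaged with weights. The Python sketch below assumes focal-group counts as the weights (a common choice) and skips score levels observed in only one group; the record layout and the function name `std_p_dif` are hypothetical, not taken from the study.

```python
from collections import defaultdict


def std_p_dif(records):
    """Standardized p-difference (focal minus reference) for one item.

    records: iterable of (group, score, correct) tuples, where group is "R"
    (reference) or "F" (focal), score is the matching total score, and
    correct is 0 or 1. Score levels are weighted by their focal-group count.
    """
    # Per score level: [focal n, focal correct, reference n, reference correct]
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        offset = 0 if group == "F" else 2
        cells[score][offset] += 1
        cells[score][offset + 1] += correct
    weighted_sum = 0.0
    total_weight = 0
    for n_f, c_f, n_r, c_r in cells.values():
        if n_f == 0 or n_r == 0:
            continue  # level seen in only one group: no comparison possible
        weighted_sum += n_f * (c_f / n_f - c_r / n_r)
        total_weight += n_f
    if total_weight == 0:
        raise ValueError("no score level is observed in both groups")
    return weighted_sum / total_weight


# Hypothetical toy data with two score levels.
data = [("F", 1, 1), ("F", 1, 0), ("R", 1, 1), ("R", 1, 1),
        ("F", 2, 1), ("R", 2, 1)]
print(std_p_dif(data))  # -0.333...
```

A negative value means that, at matched scores, focal-group examinees answered the item correctly less often than reference-group examinees.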

3.
The purpose of this study was to identify broad classes of items that behave differentially for handicapped examinees taking special, extended-time administrations of the Scholastic Aptitude Test (SAT). To identify these item classes, the performance of nine handicapped groups and one nonhandicapped group on each of two forms of the SAT was investigated through a two-stage procedure. The first stage centered on the performance of item clusters. Individual items composing clusters showing questionable performance were then examined. This two-stage procedure revealed little indication of differentially functioning item classes. However, some notable instances of differential performance at the item level were detected, the most serious of which affected visually impaired students taking the braille edition of the test.

4.
This study attempted to pinpoint the causes of differential item difficulty for blind students taking the braille edition of the Scholastic Aptitude Test's Mathematical section (SAT-M). The study method involved reviewing the literature to identify factors that might cause differential item functioning for these examinees, forming item categories based on these factors, identifying categories that functioned differentially, and assessing the functioning of the items comprising deviant categories to determine whether the differential effect was pervasive. Results showed an association between selected item categories and differential functioning, particularly for items that included figures in the stimulus, items for which spatial estimation was helpful in eliminating at least two of the options, and items that presented figures that were small or medium in size. The precise meaning of this association was unclear, however, because some items from the suspected categories functioned normally, factors other than the hypothesized ones might have caused the observed aberrant item behavior, and the differential difficulty might reflect real population differences in relevant content knowledge.

5.
6.
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted with manifest methods that group examinees by observed characteristics (e.g., gender and race/ethnicity). These methods assume homogeneity of item responses, that is, that all examinees respond to test items using a similar approach. This assumption may not hold for all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources in heterogeneous populations (linguistic minority groups). We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items, with larger effect sizes, when LCs within language groups were examined than when the overall (majority/minority) language groups were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted, enhancing DIF detection accuracy and score-based inferences when analyzing DIF in heterogeneous populations.

7.
The No Child Left Behind Act resulted in increased reliance on large-scale standardized tests to assess the progress of individual students as well as schools. In addition, emphasis was placed on including all students, among them students with disabilities, in testing programs. As a result, testing accommodations have become more central in discussions of test fairness and accessibility, as well as of validity evidence. This study examines whether math and language items function differentially for special education examinees who receive accommodations compared with those who do not.

8.
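This study used Monte Carlo simulation to examine the Type I error rates and detection rates of the Mantel-Haenszel (MH) and logistic regression (LR) methods for detecting DIF. The results show that both methods kept Type I error rates at about 0.05 (α = 0.05), with the LR method's Type I error rate being more stable. For uniform DIF, the detection rate of the MH method was slightly higher than that of the LR method; for nonuniform DIF, the detection rate of the LR method was much higher than that of the MH method, and the MH method was insensitive to nonuniform DIF. In addition, for both methods the detection rate for uniform DIF increased as the proportion of items with DIF increased, whereas the detection rate for nonuniform DIF decreased somewhat as that proportion increased.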
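The insensitivity of MH to nonuniform DIF can be seen without simulation. The Python sketch below computes the MH common odds ratio from expected 2×2 tables for a uniform-DIF item (equal slopes, shifted difficulty) and a nonuniform-DIF item (different slopes, curves crossing at θ = 0). It rests on idealized assumptions that are not from the study: examinees are matched on true ability θ rather than on observed score, both groups share the same symmetric ability grid, and the 2PL item parameters are illustrative.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mh_alpha(tables):
    """Mantel-Haenszel common odds ratio over strata.

    Each table is (A, B, C, D): reference correct, reference incorrect,
    focal correct, focal incorrect.
    """
    num = sum(A * D / (A + B + C + D) for A, B, C, D in tables)
    den = sum(B * C / (A + B + C + D) for A, B, C, D in tables)
    return num / den


def expected_tables(ref_item, foc_item, thetas, n=1000):
    """Expected 2x2 tables with n examinees per group at each ability point."""
    tables = []
    for t in thetas:
        p_r = icc_2pl(t, *ref_item)
        p_f = icc_2pl(t, *foc_item)
        tables.append((n * p_r, n * (1 - p_r), n * p_f, n * (1 - p_f)))
    return tables


thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]  # same symmetric grid for both groups
# Uniform DIF: equal slopes (a = 1), focal difficulty shifted by 0.5.
uniform = mh_alpha(expected_tables((1.0, 0.0), (1.0, 0.5), thetas))
# Nonuniform DIF: slopes 1.5 vs 0.5, curves cross at theta = 0.
nonuniform = mh_alpha(expected_tables((1.5, 0.0), (0.5, 0.0), thetas))
print(f"uniform: {uniform:.3f}, nonuniform: {nonuniform:.3f}")
# uniform: 1.649, nonuniform: 1.000
```

In the uniform case every stratum has the same odds ratio, so α_MH recovers it (e^0.5 ≈ 1.649). In the crossing case, the advantage to one group above θ = 0 cancels the advantage to the other group below it, and α_MH equals 1, the value expected under no DIF; detecting such crossing requires a model with a group-by-ability interaction.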

9.
Three types of effect sizes for DIF are described in this exposition: the log of the odds ratio (difference in log-odds), the difference in probability correct, and the proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are affected differently by item difficulty, item discrimination, and the item lower asymptote. For example, for a fixed discrimination, the difference in probabilities decreases as the difference between the item difficulty and the mean ability increases. Under the same conditions, the log of the odds ratio remains constant if the lower asymptote is zero. A nonzero lower asymptote decreases the absolute value of the probability difference symmetrically for easy and hard items, but it decreases the absolute value of the log-odds difference much more for difficult items. Thus, one cannot set a criterion for defining a large effect size in one metric and find a corresponding criterion in another metric that is equivalent across all items or ability distributions. In choosing an effect size, these differences must be understood and considered.
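The contrast between the first two metrics can be checked numerically. The Python sketch below assumes a 3PL item whose difficulty is b − 0.25 in the reference group and b + 0.25 in the focal group (a total shift of 0.5), evaluated at θ = 0 taken as the mean ability; these values are illustrative choices, not taken from the review. Both differences are reference minus focal.

```python
import math


def p3pl(theta, a, b, c=0.0):
    """Three-parameter logistic ICC; c is the lower asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def logit(p):
    return math.log(p / (1.0 - p))


def dif_effect_sizes(b, shift=0.5, a=1.0, c=0.0, theta=0.0):
    """(probability difference, log-odds difference), reference minus focal.

    The item's difficulty is b - shift/2 for the reference group and
    b + shift/2 for the focal group, so the pair is centered on b.
    """
    p_ref = p3pl(theta, a, b - shift / 2, c)
    p_foc = p3pl(theta, a, b + shift / 2, c)
    return p_ref - p_foc, logit(p_ref) - logit(p_foc)


for c in (0.0, 0.2):
    for b in (-2.0, 0.0, 2.0):
        dp, dlo = dif_effect_sizes(b, c=c)
        print(f"c={c:.1f} b={b:+.1f}  dP={dp:.3f}  dlogodds={dlo:.3f}")
```

With c = 0 the log-odds difference equals a × shift = 0.5 for every b, while the probability difference shrinks as b moves away from the mean ability. With c = 0.2 the probability difference is scaled by 1 − c = 0.8 for easy and hard items alike, whereas the log-odds difference drops far more for b = +2 than for b = −2.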

10.
A logistic regression model for characterizing differential item functioning (DIF) between two groups is presented. A distinction is drawn between uniform and nonuniform DIF in terms of the parameters of the model. A statistic for testing the hypothesis of no DIF is developed. Through simulation studies, it is shown that the logistic regression procedure is more powerful than the Mantel-Haenszel procedure for detecting nonuniform DIF and as powerful in detecting uniform DIF.
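One common way to write the two-group logistic regression model is sketched below, with θ standing for the matching variable (in practice often the observed total score) and g coding group membership (0 = reference, 1 = focal); the β notation is ours.

```latex
\begin{aligned}
P(u = 1 \mid \theta, g) &= \frac{\exp(z)}{1 + \exp(z)},
  \qquad z = \beta_0 + \beta_1\theta + \beta_2 g + \beta_3(\theta g),\\
\text{no DIF:}\quad & \beta_2 = 0,\ \beta_3 = 0,\\
\text{uniform DIF:}\quad & \beta_2 \neq 0,\ \beta_3 = 0,\\
\text{nonuniform DIF:}\quad & \beta_3 \neq 0.
\end{aligned}
```

Under this parameterization the focal-minus-reference log odds at ability θ is β₂ + β₃θ. It is constant across θ when β₃ = 0 (uniform DIF) and changes with θ, possibly changing sign, when β₃ ≠ 0 (nonuniform DIF). The joint hypothesis β₂ = β₃ = 0 can be tested with a chi-square statistic on 2 degrees of freedom.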

11.
Detection of differential item functioning (DIF) under item response theory is most often done between two groups of examinees. Sometimes, however, it is important to determine whether DIF is present in more than two groups. In this article we present a method for detecting DIF in multiple groups. The method is closely related to Lord's chi-square for comparing vectors of item parameters estimated in two groups. An example using real data is provided.
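For reference, Lord's two-group statistic for item j compares the vectors of item-parameter estimates from the reference and focal groups, assuming both sets of estimates have already been placed on a common scale. The multigroup form that follows is one natural generalization, written with a contrast matrix C; it is offered as a sketch, not as the article's exact formulation.

```latex
\begin{aligned}
\chi^2_j &= \mathbf{v}_j^{\top}\left(\hat{\boldsymbol{\Sigma}}_{jR} + \hat{\boldsymbol{\Sigma}}_{jF}\right)^{-1}\mathbf{v}_j,
  \qquad \mathbf{v}_j = \hat{\boldsymbol{\xi}}_{jR} - \hat{\boldsymbol{\xi}}_{jF},\\
Q_j &= \left(\mathbf{C}\hat{\boldsymbol{\xi}}_j\right)^{\top}\left(\mathbf{C}\hat{\boldsymbol{\Sigma}}_j\mathbf{C}^{\top}\right)^{-1}\left(\mathbf{C}\hat{\boldsymbol{\xi}}_j\right),
  \qquad \hat{\boldsymbol{\xi}}_j = \left(\hat{\boldsymbol{\xi}}_{j1}^{\top},\dots,\hat{\boldsymbol{\xi}}_{jG}^{\top}\right)^{\top},
  \quad \hat{\boldsymbol{\Sigma}}_j = \operatorname{diag}\left(\hat{\boldsymbol{\Sigma}}_{j1},\dots,\hat{\boldsymbol{\Sigma}}_{jG}\right).
\end{aligned}
```

Here ξ̂ denotes item-parameter estimates (e.g., discrimination and difficulty) and Σ̂ their estimated sampling covariance matrices. With G = 2 and C = [I, −I], Cξ̂_j = v_j and CΣ̂_jCᵀ = Σ̂_jR + Σ̂_jF, so Q_j reduces to Lord's statistic; under the null hypothesis of no DIF, Q_j is referred to a chi-square distribution with rank(C) degrees of freedom.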

12.
Once a differential item functioning (DIF) item has been identified, little is known about the examinees for whom the item functions differentially. This is because DIF analysis focuses on the manifest group characteristics associated with DIF, and these characteristics do not explain why examinees respond differentially to items. We first analyze item response patterns for gender DIF and then illustrate, through the use of a mixture item response theory (IRT) model, how the manifest characteristic associated with DIF often has a very weak relationship with the latent groups actually advantaged or disadvantaged by the item(s). Next, we propose an alternative approach to DIF assessment that first uses an exploratory mixture model analysis to define the primary dimension(s) that contribute to DIF and then studies examinee characteristics associated with those dimensions in order to understand the cause(s) of DIF. Comparison of the academic characteristics of these examinees across latent classes reveals some clear differences in manifest characteristics between groups.

13.
Will performance assessments in mathematics have gender DIF? Do male and female examinees provide similar solution strategies?

14.
In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (in addition to the usual random person abilities). Furthermore, a mixture model is assumed for the item difficulties such that each item may belong to one of two classes: a DIF class or a non-DIF class. The crucial difference between the two classes is that item difficulties in the DIF class may differ across the observed person groups, whereas item difficulties in the non-DIF class are equal across the person groups. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated in a simulation study in which it is compared with traditional procedures such as the likelihood ratio test, the Mantel-Haenszel procedure, and the standardized p-DIF procedure. In this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is demonstrated on a real-life data set.

15.
Analysis of Differential Item Functioning in the NAEP History Assessment
The Mantel-Haenszel approach for investigating differential item functioning was applied to U.S. history items that were administered as part of the National Assessment of Educational Progress. On some items, Black, Hispanic, and female students performed more poorly than other students, conditional on number-right score. It was hypothesized that this resulted, in part, from differences among ethnic and gender groups in their exposure to the material included in the assessment. Supplementary Mantel-Haenszel analyses were undertaken in which the number of historical periods studied, as well as score, was used as a conditioning variable. Contrary to expectation, the additional conditioning did not lead to a reduction in the number of DIF items. Both methodological and substantive explanations for this unexpected result were explored.
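What distinguishes the two analyses is the conditioning: the same Mantel-Haenszel computation is run over different strata. The Python sketch below makes the matching variable a parameter, so number-right score alone and the pair (score, periods studied) can be compared directly. The record fields `group`, `correct`, `score`, and `periods` and the function name `mh_d_dif` are hypothetical. The delta-scale transformation D-DIF = −2.35 ln α_MH follows the ETS convention, under which negative values indicate an item that is harder for the focal group at matched levels.

```python
import math
from collections import defaultdict


def mh_d_dif(records, key):
    """Mantel-Haenszel common odds ratio and ETS delta-scale D-DIF for one item.

    records: iterable of dicts with "group" ("R" or "F") and "correct" (0/1),
             plus whatever fields `key` uses to define matching strata.
    key:     function mapping a record to a hashable stratum label.
    """
    # Per stratum: [A, B, C, D] = reference correct, reference incorrect,
    #                              focal correct, focal incorrect.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for r in records:
        col = 0 if r["group"] == "R" else 2
        tables[key(r)][col + (0 if r["correct"] else 1)] += 1
    num = den = 0.0
    for A, B, C, D in tables.values():
        n = A + B + C + D
        # Strata containing only one group add zero to both sums.
        num += A * D / n
        den += B * C / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these strata")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical records for a single stratum.
records = (
    [{"group": "R", "correct": 1, "score": 10, "periods": 3}] * 2
    + [{"group": "R", "correct": 0, "score": 10, "periods": 3}]
    + [{"group": "F", "correct": 1, "score": 10, "periods": 3}]
    + [{"group": "F", "correct": 0, "score": 10, "periods": 3}] * 2
)
print(mh_d_dif(records, key=lambda r: r["score"]))
print(mh_d_dif(records, key=lambda r: (r["score"], r["periods"])))
```

Passing `key=lambda r: r["score"]` gives the usual number-right matching, and `key=lambda r: (r["score"], r["periods"])` adds the second conditioning variable. Finer strata leave fewer examinees per table, and a stratum that contains only one group contributes nothing to either sum.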

16.
Gender fairness in testing can be impeded by the presence of differential item functioning (DIF), which potentially causes test bias. In this study, the presence and causes of gender-related DIF were investigated with real data from 800 items answered by 250,000 test takers. DIF was examined using the Mantel–Haenszel and logistic regression procedures. Little DIF was found in the quantitative items and a moderate amount was found in the verbal items. Vocabulary items favored women if sampled from traditionally female domains but generally not vice versa if sampled from male domains. The sentence completion item format in the English reading comprehension subtest favored men regardless of content. The findings, if supported in a cross-validation study, can potentially lead to changes in how vocabulary items are sampled and in the use of the sentence completion format in English reading comprehension, thereby increasing gender fairness in the examined test.

17.
18.
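This paper reports DIF detection across the different language groups of the PISA 2000 reading literacy test. The Chinese version had more DIF-flagged items than the other language versions, particularly in the reflect-and-evaluate cognitive process. The Japanese, English, and Finnish versions had only about half as many DIF items as the Chinese version. The paper concludes with a discussion of the causes of DIF and the limitations of the detection methods.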

19.
In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias. Furthermore, ML estimation for LR can be substantially biased in the presence of rare-event data. The bias of ML estimation due to small samples and rare-event data can degrade the performance of the LR procedure, especially when testing difficult items for DIF in small samples. Penalized ML (PML) estimation was originally developed to reduce the finite-sample bias of conventional ML estimation and is also known to reduce bias in the estimation of LR for rare-event data. The goal of this study is to compare the performance of LR procedures based on ML and PML estimation in terms of statistical power and Type I error. In a simulation study, Swaminathan and Rogers's Wald test based on PML estimation (PSR) showed the highest statistical power in most simulation conditions, and the likelihood ratio test (LRT) based on conventional PML estimation (PLRT) showed the most robust and stable Type I error. The trade-off between bias and variance is considered in the discussion section.

20.
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended it to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: the signed weighted P-difference and the unsigned weighted P-difference. The performance of the effect size measures was investigated under various simulation conditions, including different sample sizes and DIF magnitudes. As another way of studying DIF, the χ² difference test was included to compare results on statistical significance (statistical tests) with those on practical significance (effect size measures). The adequacy of existing effect size criteria used in unidimensional tests was also evaluated. Both effect size measures worked well in estimating true effect sizes, identifying DIF types, and classifying effect size categories. Finally, a real data analysis was conducted to support the simulation results.
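A minimal sketch of the two effect sizes follows, under assumptions that are ours rather than the study's: a compensatory two-dimensional 2PL model, reference-minus-focal differences averaged over a quadrature grid with unnormalized standard bivariate normal weights for the focal group, and absolute differences for the unsigned version. The item parameters and function names are illustrative.

```python
import math
from itertools import product


def m2pl(theta, a, d):
    """Compensatory multidimensional 2PL: 1 / (1 + exp(-(a . theta + d)))."""
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def weighted_p_differences(ref_item, foc_item, grid, weights):
    """Signed and unsigned weighted P-differences, reference minus focal.

    ref_item, foc_item: (slope_vector, intercept) in each group.
    grid: ability vectors; weights: focal-group weight at each grid point.
    """
    total = sum(weights)
    signed = unsigned = 0.0
    for theta, w in zip(grid, weights):
        diff = m2pl(theta, *ref_item) - m2pl(theta, *foc_item)
        signed += w * diff
        unsigned += w * abs(diff)
    return signed / total, unsigned / total


# Illustrative 2-D grid (-4 to 4, step 0.5) with standard-normal focal weights.
points = [k / 2 for k in range(-8, 9)]
grid = list(product(points, points))
weights = [math.exp(-(t1 ** 2 + t2 ** 2) / 2) for t1, t2 in grid]

ref = ((1.2, 0.8), 0.0)
uniform_foc = ((1.2, 0.8), -0.5)   # same slopes, lower intercept
crossing_foc = ((0.6, 0.4), 0.0)   # flatter slopes, same intercept
print(weighted_p_differences(ref, uniform_foc, grid, weights))
print(weighted_p_differences(ref, crossing_foc, grid, weights))
```

For the uniform item the two measures coincide because the reference surface lies above the focal surface everywhere. For the crossing item the signed measure is essentially zero while the unsigned one is not, which is how the pair of measures can separate DIF types.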


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417