Similar Articles
20 similar articles found.
1.
Applied Measurement in Education, 2013, 26(3): 217-233
The standardization and Mantel-Haenszel approaches to the assessment of differential item functioning (DIF) are described and compared. For right/wrong scoring of items, these two approaches, which emphasize the importance of comparing comparable groups of examinees, use the same data base for analysis, namely, a 2 (Group) × 2 (Item Score: Correct or Incorrect) × S (Score Level) contingency table for each item studied. The two procedures differ with respect to how they operate on these basic data tables to compare the performance of the two groups of examinees. Whereas the operations employed by Mantel-Haenszel are motivated by statistical power considerations, the operations employed by standardization are motivated by data-interpretation considerations. These differences in operation culminate in different measures of DIF effect size, which are nevertheless very highly related indicators of the degree of departure from the null hypothesis of no DIF.
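To make the shared data structure concrete, here is a minimal sketch (ours, not the article's; the counts are synthetic and all variable names are illustrative) of how both statistics can be computed from the same stack of 2 × 2 × S tables, with the Mantel-Haenszel odds ratio reported on the ETS delta scale:

    import numpy as np

    # One studied item: for each of S score levels, a 2 x 2 table of
    # (group: reference/focal) x (item score: correct/incorrect) counts.
    rng = np.random.default_rng(0)
    tables = rng.integers(5, 50, size=(10, 2, 2)).astype(float)  # S = 10

    def mantel_haenszel_odds_ratio(tables):
        """Mantel-Haenszel common odds ratio, pooled over score levels."""
        a = tables[:, 0, 0]  # reference, correct
        b = tables[:, 0, 1]  # reference, incorrect
        c = tables[:, 1, 0]  # focal, correct
        d = tables[:, 1, 1]  # focal, incorrect
        n = tables.sum(axis=(1, 2))
        return (a * d / n).sum() / (b * c / n).sum()

    def standardized_p_dif(tables):
        """Standardization index: focal-weighted mean difference in
        proportion correct between the focal and reference groups."""
        p_ref = tables[:, 0, 0] / tables[:, 0].sum(axis=1)
        p_foc = tables[:, 1, 0] / tables[:, 1].sum(axis=1)
        w = tables[:, 1].sum(axis=1)  # focal-group count at each level
        return ((p_foc - p_ref) * w).sum() / w.sum()

    mh_d_dif = -2.35 * np.log(mantel_haenszel_odds_ratio(tables))
    print(f"MH D-DIF = {mh_d_dif:.3f}, STD P-DIF = {standardized_p_dif(tables):.3f}")

Both functions consume the identical table, which is the article's point: the two methods differ in how they pool over score levels, not in the data they require.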

2.
In this paper a new approach to graphical differential item functioning (DIF) is offered. The methodology is based on a sampling-theory approach to expected response functions (Lewis, 1985; Mislevy, Wingersky, & Sheehan, 1994). Essentially, error in item calibrations is modeled explicitly, and repeated samples are taken from the posterior distributions of the item parameters. Sampled parameter values are used to estimate the posterior distribution of the difference in item characteristic curves (ICCs) for two groups. A point-wise expectation is taken as an estimate of the true difference between the ICCs, and the sampled difference functions indicate uncertainty in the estimate. The approach is applied to a set of pretest items, and the results are compared to traditional Mantel-Haenszel DIF statistics. The expected-response-function approach is contrasted with Pashley's (1992) graphical DIF approach.
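A minimal sketch of the sampling idea, under assumptions of our own (a 2PL item response function and normal stand-in posteriors; in a real application the draws would come from the posterior distributions produced by calibration):

    import numpy as np

    rng = np.random.default_rng(1)
    theta = np.linspace(-3, 3, 61)  # grid of ability values

    def icc_2pl(theta, a, b):
        """2PL item characteristic curve: P(correct | theta)."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    # Stand-in posteriors for each group's item parameters (invented here).
    draws = 2000
    a_ref = rng.normal(1.2, 0.10, draws); b_ref = rng.normal(0.0, 0.15, draws)
    a_foc = rng.normal(1.2, 0.10, draws); b_foc = rng.normal(0.3, 0.15, draws)

    # One sampled difference function per posterior draw.
    diff = (icc_2pl(theta[None, :], a_foc[:, None], b_foc[:, None])
            - icc_2pl(theta[None, :], a_ref[:, None], b_ref[:, None]))

    estimate = diff.mean(axis=0)                       # point-wise expectation
    lo, hi = np.percentile(diff, [2.5, 97.5], axis=0)  # uncertainty band
    print(estimate[30], (lo[30], hi[30]))              # difference near theta = 0

The spread of the sampled difference functions is what distinguishes this approach from plotting a single pair of estimated ICCs: calibration error is carried through to the DIF display.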

3.
How can we best extend DIF research to performance assessment? What are the issues and problems surrounding studies of DIF on complex tasks? What appear to be the best approaches at this time?

4.
Students' performance in assessments is commonly attributed to more or less effective teaching. This implies that students' responses are significantly affected by instruction. However, the assumption that outcome measures are indeed instructionally sensitive has scarcely been investigated empirically. In the present study, we propose a longitudinal multilevel differential item functioning (DIF) model that combines two existing yet independent approaches to evaluating items' instructional sensitivity. The model permits a more informative judgment of instructional sensitivity, allowing the distinction of global and differential sensitivity. As an illustration, the model is applied to two empirical data sets, with classical indices (the Pretest-Posttest Difference Index and posttest multilevel DIF) computed for comparison. Results suggest that the approach works well in application to empirical data and may provide important information to test developers.

5.
Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item response theory likelihood ratio test (IRT-LRT), both capable of detecting unidirectional and non-unidirectional DIF, to a CAT environment in which pretest items are assumed to be seeded into the CAT but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample-size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed.

6.
The recent emphasis on various types of performance assessments raises questions concerning the differential effects of such assessments on population subgroups. Procedures for detecting differential item functioning (DIF) in data from performance assessments are available but may be hindered by problems that stem from this mode of assessment. Foremost among these are problems related to finding an appropriate matching variable. These problems are discussed and results are presented for three methods for DIF detection in polytomous items using data from a direct writing assessment. The purpose of the study is to examine the effects of using different combinations of internal and external matching variables. The procedures included a generalized Mantel-Haenszel statistic, a technique based on meta-analysis methodology, and logistic discriminant function analysis. In general, the results did not support the use of an external matching criterion and indicated that continued problems may be expected in attempts to assess DIF in performance assessments.

7.
In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (in addition to the usual random person abilities). A mixture model is further assumed for the item difficulties, such that each item may belong to one of two classes: a DIF class or a non-DIF class. The crucial difference between the two classes is that the item difficulties in the DIF class may differ across the observed person groups, whereas they are equal across the person groups for items in the non-DIF class. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated in a simulation study comparing it with traditional procedures such as the likelihood ratio test, the Mantel-Haenszel procedure, and the standardized P-DIF procedure; in this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is demonstrated on a real-life data set.

8.
Loglinear latent class models are used to detect differential item functioning (DIF). These models are formulated in such a manner that the attribute to be assessed may be continuous, as in a Rasch model, or categorical, as in Latent Class Mastery models. Further, an item may exhibit DIF with respect to a manifest grouping variable, a latent grouping variable, or both. Likelihood-ratio tests for assessing the presence of various types of DIF are described, and these methods are illustrated through the analysis of a "real world" data set.

9.
In multiple-choice items, differential item functioning (DIF) in the correct response may or may not be caused by differentially functioning distractors. Identifying distractors as causes of DIF can provide valuable information for potential item revision or the design of new test items. In this paper, we examine a two-step approach based on application of a nested logit model for this purpose. The approach separates testing of differential distractor functioning (DDF) from DIF, thus allowing for clearer evaluations of where distractors may be responsible for DIF. The approach is contrasted against competing methods and evaluated in simulation and real data analyses.

10.
A questionnaire survey was administered to 592 college students, and mean and covariance structure (MACS) analysis was used to test the Internet Altruistic Behavior Scale for Undergraduates (IABSU) for differential item functioning (DIF) across regions. Results show that four IABSU items exhibit cross-region DIF: items 24, 28, and 1 show uniform DIF, and item 11 shows nonuniform DIF. To improve the fairness and validity of the scale, it is recommended that these four DIF items be deleted.

11.
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted using manifest methods that group examinees by observed characteristics (gender and race/ethnicity). Homogeneity of item responses is assumed, meaning that all examinees are presumed to respond to test items in a similar way. This assumption may not hold for all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources in heterogeneous populations (linguistic minority groups). We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items with larger effect sizes when LCs within language groups, rather than the overall (majority/minority) language groups, were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted, enhancing DIF detection accuracy and score-based inferences when analyzing DIF in heterogeneous populations.

12.
13.
A logistic regression model for characterizing differential item functioning (DIF) between two groups is presented. A distinction is drawn between uniform and nonuniform DIF in terms of the parameters of the model. A statistic for testing the hypothesis of no DIF is developed. Through simulation studies, it is shown that the logistic regression procedure is more powerful than the Mantel-Haenszel procedure for detecting nonuniform DIF and as powerful in detecting uniform DIF.
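Consistent with that description, one common formulation regresses the item response on the matching score, a group indicator, and their interaction, and tests each DIF term with a likelihood-ratio chi-square. A sketch on simulated data (the generating coefficients are invented; statsmodels is assumed to be available):

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    n = 2000
    total = rng.normal(0, 1, n)    # matching variable (e.g., total score)
    group = rng.integers(0, 2, n)  # 0 = reference, 1 = focal
    p = 1 / (1 + np.exp(-(0.8 * total - 0.5 * group)))  # uniform DIF built in
    y = (rng.random(n) < p).astype(int)

    def fit(X):
        return sm.Logit(y, sm.add_constant(X)).fit(disp=0)

    m0 = fit(np.column_stack([total]))                        # no DIF
    m1 = fit(np.column_stack([total, group]))                 # + uniform DIF
    m2 = fit(np.column_stack([total, group, total * group]))  # + nonuniform DIF

    lr_uniform = 2 * (m1.llf - m0.llf)  # 1 df: group main effect
    lr_nonunif = 2 * (m2.llf - m1.llf)  # 1 df: score-by-group interaction
    print(f"uniform DIF p = {chi2.sf(lr_uniform, 1):.4f}, "
          f"nonuniform DIF p = {chi2.sf(lr_nonunif, 1):.4f}")

The nested structure is what gives the procedure its power advantage for nonuniform DIF: the score-by-group interaction is tested directly rather than being averaged over score levels.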

14.
Detection of differential item functioning (DIF) is most often done between two groups of examinees under item response theory. It is sometimes important, however, to determine whether DIF is present in more than two groups. In this article we present a method for detection of DIF in multiple groups. The method is closely related to Lord's chi-square for comparing vectors of item parameters estimated in two groups. An example using real data is provided.
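For orientation, Lord's chi-square in the two-group case is the quadratic form (v1 - v2)'(S1 + S2)^-1 (v1 - v2), where v holds an item's parameter estimates in one group and S the error covariance matrix of those estimates; the multiple-group method extends this idea to more than two parameter vectors. A minimal two-group sketch with invented numbers:

    import numpy as np
    from scipy.stats import chi2

    # Illustrative 2PL (a, b) estimates and their error covariance
    # matrices from separate calibrations in two groups (invented values).
    v1 = np.array([1.10, 0.20]); cov1 = np.array([[0.010, 0.002], [0.002, 0.020]])
    v2 = np.array([1.05, 0.55]); cov2 = np.array([[0.012, 0.001], [0.001, 0.025]])

    d = v1 - v2
    stat = d @ np.linalg.inv(cov1 + cov2) @ d  # Lord's chi-square
    df = d.size                                # one df per compared parameter
    print(f"chi2 = {stat:.2f}, p = {chi2.sf(stat, df):.4f}")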

15.
This study attempted to pinpoint the causes of differential item difficulty for blind students taking the braille edition of the Scholastic Aptitude Test's Mathematical section (SAT-M). The study method involved reviewing the literature to identify factors that might cause differential item functioning for these examinees, forming item categories based on these factors, identifying categories that functioned differentially, and assessing the functioning of the items comprising deviant categories to determine whether the differential effect was pervasive. Results showed an association between selected item categories and differential functioning, particularly for items that included figures in the stimulus, items for which spatial estimation was helpful in eliminating at least two of the options, and items that presented small or medium-sized figures. The precise meaning of this association was unclear, however, because some items from the suspected categories functioned normally, factors other than the hypothesized ones might have caused the observed aberrant item behavior, and the differential difficulty might reflect real population differences in relevant content knowledge.

16.
Once a differential item functioning (DIF) item has been identified, little is known about the examinees for whom the item functions differentially. This is because DIF analysis focuses on the manifest group characteristics associated with DIF, which do not explain why examinees respond differentially to items. We first analyze item response patterns for gender DIF and then illustrate, through the use of a mixture item response theory (IRT) model, how the manifest characteristic associated with DIF often has a very weak relationship with the latent groups actually being advantaged or disadvantaged by the item(s). Next, we propose an alternative approach to DIF assessment that first uses an exploratory mixture model analysis to define the primary dimension(s) that contribute to DIF, and second studies examinee characteristics associated with those dimensions in order to understand the cause(s) of DIF. Comparison of the academic characteristics of these examinees across classes reveals some clear differences in manifest characteristics between groups.

17.
Statistics used to detect differential item functioning can also reflect differential strengths and weaknesses in the performance characteristics of population subgroups. In turn, item features associated with the differential performance patterns are likely to reflect some facet of the item task, and hence its difficulty, that might previously have been overlooked. In this study, several item features were identified and coded for a large number of reading comprehension items from two admissions testing programs. Item features included subject matter content, various properties of item structure, cognitive demand indicators, and semantic content (propositional analysis). Differential item functioning was evaluated for males and females and for White and Black examinees. Results showed a number of significant relationships between item features and indicators of differential item functioning, many of which were consistent across testing programs. Implications of the results for related areas of research are discussed.

18.
The present study investigates the degree to which item "bias" techniques can lead to interpretable results when groups are defined in terms of specified differences in the cognitive processes involved in students' problem-solving strategies. A large group of junior high school students who took a test on subtraction of fractions was divided into two subgroups judged by the rule-space model to be using different problem-solving strategies. It was confirmed by use of Mantel-Haenszel (MH) statistics that these subgroups showed different performances on items with different underlying cognitive tasks. We note that, in our case, we are far from faulting items that show differential item functioning (DIF) between two groups defined in terms of different solution strategies. Indeed, they are "desirable" items, as explained in the discussion section.

19.
The standardization approach to assessing differential item functioning (DIF), including standardized distractor analysis, is described. The results of studies conducted on Asian Americans, Hispanics (Mexican Americans and Puerto Ricans), and Blacks on the Scholastic Aptitude Test (SAT) are described and then synthesized across studies. Where the groups were limited to include only examinees who spoke English as their best language, very few items across forms and ethnic groups exhibited large DIF. Major findings include evidence of differential speededness (where minority examinees did not complete SAT-Verbal sections at the same rate as White students with comparable SAT-Verbal scores) for Blacks and Hispanics and, when the item content is of special interest, advantages for the relevant ethnic group. In addition, homographs tend to disadvantage all three ethnic groups, but the effect of vertical relationships in analogy items is not as consistent. Although these findings are important in understanding DIF, they do not seem to account for all differences; other variables related to DIF still need to be identified. Furthermore, these findings are seen as tentative until corroborated by studies using controlled data collection designs.

20.
The No Child Left Behind Act resulted in an increased reliance on large-scale standardized tests to assess the progress of individual students as well as schools. In addition, emphasis was placed on including all students, including those with disabilities, in the testing programs. As a result, the role of testing accommodations has become more central in discussions about test fairness, accessibility, and evidence of validity. This study examines whether differential item functioning exists for math and language items between special education examinees receiving accommodations and those not receiving accommodations.
