Similar literature
20 similar articles found
1.
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is examining whether DIF is present and, if it is found, investigating its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. Think-aloud protocols were conducted with expert reviewers to compare the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.

2.
A logistic regression model for characterizing differential item functioning (DIF) between two groups is presented. A distinction is drawn between uniform and nonuniform DIF in terms of the parameters of the model. A statistic for testing the hypothesis of no DIF is developed. Through simulation studies, it is shown that the logistic regression procedure is more powerful than the Mantel-Haenszel procedure for detecting nonuniform DIF and as powerful in detecting uniform DIF.
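For orientation, the logistic regression DIF model summarized in this abstract is commonly written in the form below. This is a sketch in standard notation, not taken from the abstract itself: the symbols $u$ (item response), $\theta$ (matching ability or total score), $g$ (group indicator, 0 = reference, 1 = focal), and $\beta_0,\dots,\beta_3$ are assumptions introduced here.

```latex
\[
\ln\frac{P(u = 1 \mid \theta, g)}{1 - P(u = 1 \mid \theta, g)}
  = \beta_0 + \beta_1 \theta + \beta_2 g + \beta_3 (\theta g)
\]
% Uniform DIF:     \beta_2 \neq 0 and \beta_3 = 0  (group effect constant across ability)
% Nonuniform DIF:  \beta_3 \neq 0                  (group effect varies with ability)
% No DIF:          H_0 : \beta_2 = \beta_3 = 0
```

Under this parameterization, the hypothesis of no DIF is a joint test on $\beta_2$ and $\beta_3$, which is why a single model can address both uniform and nonuniform DIF.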

3.
The authors used Monte Carlo methods to examine the Type I error rates for randomization tests applied to single-case data arising from ABAB designs involving random, systematic, or response-guided assignment of interventions. Six randomization tests were examined (permuting blocks of 1, 2, 3, or 5 observations, and randomly selecting intervention triplets so that each phase has at least 3 or 5 observations). When the design included randomization, the Type I error rate was controlled. When the design was systematic or guided by the absolute value of the slope, the tests permuting blocks tended to be liberal with positive autocorrelation, whereas those based on the random selection of intervention triplets tended to be conservative across levels of autocorrelation.

4.
Logistic regression has recently been advanced as a viable procedure for detecting differential item functioning (DIF). One of the advantages of this procedure is the considerable flexibility it offers in the specification of the regression equation. This article describes incorporating two ability estimates into a single regression analysis, with the result that substantially fewer items exhibit DIF. A comparable analysis is conducted using the Mantel-Haenszel procedure, with similar results. It is argued that by simultaneously conditioning on two relevant ability estimates, more accurate matching of examinees in the reference and focal groups is obtained, and thus multidimensional item impact is not mistakenly identified as DIF.

5.
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model. The power is related to the item response function (IRF) for the studied item, the latent trait distributions, and the sample sizes for the reference and focal groups. Simulation studies show that the theoretical values calculated from the formulas derived in the article are close to those observed in the simulated data when the assumptions are satisfied. The robustness of the power formulas is studied with simulations when the assumptions are violated.

6.
Statistics used to detect differential item functioning can also reflect differential strengths and weaknesses in the performance characteristics of population subgroups. In turn, item features associated with the differential performance patterns are likely to reflect some facet of the item task, and hence its difficulty, that might previously have been overlooked. In this study, several item features were identified and coded for a large number of reading comprehension items from two admissions testing programs. Item features included subject matter content, various properties of item structure, cognitive demand indicators, and semantic content (propositional analysis). Differential item functioning was evaluated for males and females and for White and Black examinees. Results showed a number of significant relationships between item features and indicators of differential item functioning, many of which were consistent across testing programs. Implications of the results for related areas of research are discussed.

7.
Oshima, Raju, Flowers, and Slinde (1998) described procedures for identifying sources of differential functioning for dichotomous data using differential bundle functioning (DBF) derived from the differential functioning of items and tests (DFIT) framework (Raju, van der Linden, & Fleer, 1995). The purpose of this study was to extend the procedures for dichotomous DBF to the polytomous case and to illustrate how DBF analysis can be conducted with polytomous scoring, common to psychological and educational rating scales. The data set used was parent and teacher ratings of child problem behaviors. Three group contrasts (teacher vs. parent, boy vs. girl, and random groups) and two bundle organizing principles (subscale designation and random selection) were used for the DBF analysis. Interpretations of bundle indexes in the context of child problem behaviors were presented.

8.
Students' performance in assessments is commonly attributed to more or less effective teaching. This implies that students' responses are significantly affected by instruction. However, the assumption that outcome measures are in fact instructionally sensitive has rarely been investigated empirically. In the present study, we propose a longitudinal multilevel-differential item functioning (DIF) model to combine two existing yet independent approaches to evaluate items' instructional sensitivity. The model permits a more informative judgment of instructional sensitivity, allowing global and differential sensitivity to be distinguished. As an illustration, the model is applied to two empirical data sets, with classical indices (Pretest–Posttest Difference Index and posttest multilevel-DIF) computed for comparison. Results suggest that the approach works well in the application to empirical data, and may provide important information to test developers.

9.
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

10.
In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias. Furthermore, ML estimation for LR can be substantially biased in the presence of rare event data. The bias of ML estimation due to small samples and rare event data can degrade the performance of the LR procedure, especially when testing the DIF of difficult items in small samples. Penalized ML (PML) estimation was originally developed to reduce the finite-sample bias of conventional ML estimation and is also known to reduce the bias in the estimation of LR for rare event data. The goal of this study is to compare the performance of the LR procedures based on ML and PML estimation in terms of statistical power and Type I error. In a simulation study, Swaminathan and Rogers's Wald test based on PML estimation (PSR) showed the highest statistical power in most of the simulation conditions, and LRT based on conventional PML estimation (PLRT) showed the most robust and stable Type I error. The trade-off between bias and variance is presented in the discussion section.

11.
12.
Detection of differential item functioning (DIF) is most often done between two groups of examinees under item response theory. It is sometimes important, however, to determine whether DIF is present in more than two groups. In this article we present a method for detection of DIF in multiple groups. The method is closely related to Lord's chi-square for comparing vectors of item parameters estimated in two groups. An example using real data is provided.
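As background, the two-group chi-square statistic that this abstract refers to is usually stated as below. This is a sketch of the standard two-group form only; the multiple-group extension proposed in the article is not reproduced here. The notation is an assumption introduced for illustration: $\hat{\mathbf{v}}_{iR}$ and $\hat{\mathbf{v}}_{iF}$ are the estimated parameter vectors of item $i$ in the reference and focal groups (placed on a common scale), and $\hat{\boldsymbol{\Sigma}}_{iR}$, $\hat{\boldsymbol{\Sigma}}_{iF}$ are their estimated covariance matrices.

```latex
\[
\chi^2_i
  = (\hat{\mathbf{v}}_{iR} - \hat{\mathbf{v}}_{iF})^{\top}
    \left(\hat{\boldsymbol{\Sigma}}_{iR} + \hat{\boldsymbol{\Sigma}}_{iF}\right)^{-1}
    (\hat{\mathbf{v}}_{iR} - \hat{\mathbf{v}}_{iF})
\]
% Degrees of freedom: the number of item parameters compared.
% With a single difficulty parameter b_i this reduces to
% \chi^2_i = (\hat{b}_{iR} - \hat{b}_{iF})^2 / (\widehat{\mathrm{Var}}(\hat{b}_{iR}) + \widehat{\mathrm{Var}}(\hat{b}_{iF})).
```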

13.
Once a differential item functioning (DIF) item has been identified, little is known about the examinees for whom the item functions differentially. This is because DIF focuses on manifest group characteristics that are associated with it, but do not explain why examinees respond differentially to items. We first analyze item response patterns for gender DIF and then illustrate, through the use of a mixture item response theory (IRT) model, how the manifest characteristic associated with DIF often has a very weak relationship with the latent groups actually being advantaged or disadvantaged by the item(s). Next, we propose an alternative approach to DIF assessment that first uses an exploratory mixture model analysis to define the primary dimension(s) that contribute to DIF, and secondly studies examinee characteristics associated with those dimensions in order to understand the cause(s) of DIF. Comparison of academic characteristics of these examinees across classes reveals some clear differences in manifest characteristics between groups.

14.
This article examines nonmathematical linguistic complexity as a source of differential item functioning (DIF) in math word problems for English language learners (ELLs). Specifically, this study investigates the relationship between item measures of linguistic complexity, nonlinguistic forms of representation and DIF measures based on item response theory difficulty parameters in a state fourth-grade math test. This study revealed that the greater the item nonmathematical lexical and syntactic complexity, the greater are the differences in difficulty parameter estimates favoring non-ELLs over ELLs. However, the impact of linguistic complexity on DIF is attenuated when items provide nonlinguistic schematic representations that help ELLs make meaning of the text, suggesting that their inclusion could help mitigate the negative effect of increased linguistic complexity in math word problems.

15.
This study attempted to pinpoint the causes of differential item difficulty for blind students taking the braille edition of the Scholastic Aptitude Test's Mathematical section (SAT-M). The study method involved reviewing the literature to identify factors that might cause differential item functioning for these examinees, forming item categories based on these factors, identifying categories that functioned differentially, and assessing the functioning of the items comprising deviant categories to determine if the differential effect was pervasive. Results showed an association between selected item categories and differential functioning, particularly for items that included figures in the stimulus, items for which spatial estimation was helpful in eliminating at least two of the options, and items that presented figures that were small or medium in size. The precise meaning of this association was unclear, however, because some items from the suspected categories functioned normally, factors other than the hypothesized ones might have caused the observed aberrant item behavior, and the differential difficulty might reflect real population differences in relevant content knowledge.

16.
In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (besides the common random person abilities). In addition, a mixture model is assumed for the item difficulties such that the items may belong to one of two classes: a DIF or a non-DIF class. The crucial difference between the DIF class and the non-DIF class is that the item difficulties in the DIF class may differ according to the observed person groups while they are equal across the person groups for the items from the non-DIF class. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated using a simulation study in which it is compared with traditional procedures, like the likelihood ratio test, the Mantel-Haenszel procedure and the standardized p-DIF procedure. In this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is also demonstrated on a real life data set.

17.
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted utilizing manifest methods using observed characteristics (gender and race/ethnicity) for grouping examinees. Homogeneity of item responses is assumed, denoting that all examinees respond to test items using a similar approach. This assumption may not hold with all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources with heterogeneous (linguistic minority) groups. We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items with larger effect sizes when LCs within language groups versus the overall (majority/minority) language groups were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted to enhance DIF detection accuracy and score-based inferences when analyzing DIF with heterogeneous populations.

18.
The standardization approach to assessing differential item functioning (DIF), including standardized distractor analysis, is described. The results of studies conducted on Asian Americans, Hispanics (Mexican Americans and Puerto Ricans), and Blacks on the Scholastic Aptitude Test (SAT) are described and then synthesized across studies. Where the groups were limited to include only examinees who spoke English as their best language, very few items across forms and ethnic groups exhibited large DIF. Major findings include evidence of differential speededness (where minority examinees did not complete SAT-Verbal sections at the same rate as White students with comparable SAT-Verbal scores) for Blacks and Hispanics and, when the item content is of special interest, advantages for the relevant ethnic group. In addition, homographs tend to disadvantage all three ethnic groups, but the effect of vertical relationships in analogy items is not as consistent. Although these findings are important in understanding DIF, they do not seem to account for all differences. Other variables related to DIF still need to be identified. Furthermore, these findings are seen as tentative until corroborated by studies using controlled data collection designs.
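To make the standardization idea concrete, the sketch below computes a standardized p-difference: the difference in proportion correct between focal and reference examinees at each matched score level, averaged with weights equal to the focal-group count at that level. The weighting choice, the function name `std_p_dif`, and the argument layout are assumptions made for illustration; the abstract does not spell out the computation.

```python
def std_p_dif(focal_correct, focal_n, ref_correct, ref_n):
    """Standardized p-difference across matched score levels.

    Each argument is a list indexed by score level:
    number answering the item correctly, and group size, for the
    focal and reference groups. Weights are the focal-group counts
    (an assumption; other weighting schemes are possible).
    """
    weighted_sum = 0.0
    total_weight = 0
    for fc, fn, rc, rn in zip(focal_correct, focal_n, ref_correct, ref_n):
        if fn == 0 or rn == 0:
            continue  # no comparison is possible at this score level
        weight = fn
        weighted_sum += weight * (fc / fn - rc / rn)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no score level contains both groups")
    return weighted_sum / total_weight


# Hypothetical counts at two score levels (illustrative, not real data).
value = std_p_dif(
    focal_correct=[2, 6], focal_n=[10, 10],
    ref_correct=[30, 70], ref_n=[100, 100],
)
print(round(value, 3))
```

A negative value indicates that, among examinees matched on score, the focal group answered the item correctly less often than the reference group.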

19.
The No Child Left Behind Act resulted in an increased reliance on large-scale standardized tests to assess the progress of individual students as well as schools. In addition, emphasis was placed on including all students, among them those with disabilities, in the testing programs. As a result, the role of testing accommodations has become more central in discussions about test fairness and accessibility as well as evidence of validity. This study seeks to examine whether there exists differential item functioning for math and language items between special education examinees receiving accommodations and those not receiving accommodations.

20.