Similar Documents
20 similar documents found (search time: 0 ms)
1.
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P‐difference and unsigned weighted P‐difference. The performance of the effect size measures was investigated under various simulation conditions including different sample sizes and DIF magnitudes. As another way of studying DIF, the χ² difference test was included to compare the result of statistical significance (statistical tests) with that of practical significance (effect size measures). The adequacy of existing effect size criteria used in unidimensional tests was also evaluated. Both effect size measures worked well in estimating true effect sizes, identifying DIF types, and classifying effect size categories. Finally, a real data analysis was conducted to support the simulation results.
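As a concrete illustration, the two measures can be sketched for a unidimensional 2PL special case (a simplification of the article's multidimensional model; the function names and parameter values below are illustrative assumptions, not the authors' code):

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def weighted_p_difference(theta, weights, a_ref, b_ref, a_foc, b_foc):
    """Signed and unsigned weighted P-differences between the
    reference- and focal-group item response functions, weighted
    by a (here focal-group) ability distribution."""
    w = weights / weights.sum()
    diff = p_2pl(theta, a_ref, b_ref) - p_2pl(theta, a_foc, b_foc)
    signed = float(np.sum(w * diff))
    unsigned = float(np.sum(w * np.abs(diff)))
    return signed, unsigned

# Quadrature over a standard-normal ability distribution;
# the focal item is harder (uniform DIF), so the signed index is positive.
theta = np.linspace(-4, 4, 81)
weights = np.exp(-0.5 * theta**2)
s, u = weighted_p_difference(theta, weights,
                             a_ref=1.2, b_ref=0.0, a_foc=1.2, b_foc=0.5)
```

The signed index cancels DIF that changes direction across the ability scale, while the unsigned index accumulates it; for purely uniform DIF, as here, the two coincide.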

2.
Nambury S. Raju (1937–2005) developed two model‐based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures (Raju, 1988) and Raju's DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach provides an intuitive and flexible methodology for assessing DIF. The purpose of this tutorial is to explain DFIT and show how this methodology can be utilized in a variety of DIF applications.

3.
Monte Carlo simulations with 20,000 replications are reported to estimate the probability of rejecting the null hypothesis regarding DIF using SIBTEST when there is DIF present and/or when impact is present due to differences on the primary dimension to be measured. Sample sizes are varied from 250 to 2000 and test lengths from 10 to 40 items. Results generally support previous findings for Type I error rates and power. Impact is inversely related to test length. The combination of DIF and impact, with the focal group having lower ability on both the primary and secondary dimensions, results in impact partially masking DIF so that items biased toward the reference group are less likely to be detected.
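The statistic SIBTEST tests can be sketched as follows — a deliberately simplified version that omits the regression correction the real procedure applies to the matching scores (all names here are illustrative assumptions):

```python
import numpy as np

def sibtest_beta(match_ref, item_ref, match_foc, item_foc):
    """Simplified SIBTEST beta-uni statistic: the focal-weighted mean
    difference in studied-item scores between reference and focal
    examinees matched on the valid-subtest (matching) score.
    Positive values favor the reference group."""
    betas, weights = [], []
    for k in np.intersect1d(match_ref, match_foc):
        betas.append(item_ref[match_ref == k].mean()
                     - item_foc[match_foc == k].mean())
        weights.append(np.sum(match_foc == k))
    w = np.array(weights, float) / np.sum(weights)
    return float(np.sum(w * np.array(betas)))
```

A significance test then divides this statistic by its standard error and refers it to a standard normal; the uncorrected version above is biased when impact is present, which is exactly why the regression correction (and the masking effect studied in the abstract) matters.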

4.
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model. The power is related to the item response function (IRF) for the studied item, the latent trait distributions, and the sample sizes for the reference and focal groups. Simulation studies show that the theoretical values calculated from the formulas derived in the article are close to the values observed in the simulated data when the assumptions are satisfied. The robustness of the power formulas is studied with simulations when the assumptions are violated.
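The likelihood ratio tests in question compare nested logistic models: no DIF (intercept + matching score), uniform DIF (+ group), and nonuniform DIF (+ group-by-score interaction). A minimal self-contained sketch (Newton–Raphson fitting written out by hand; all variable names and the simulated effect sizes are illustrative assumptions):

```python
import numpy as np
from scipy.stats import chi2

def fit_logistic(X, y, n_iter=50):
    """ML logistic regression via Newton-Raphson; returns the
    coefficient vector and the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        H = X.T @ (W[:, None] * X) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    ll = float(np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
    return beta, ll

def lr_dif_test(score, group, y):
    """Likelihood-ratio tests for uniform DIF (group main effect)
    and nonuniform DIF (group-by-score interaction), each on 1 df."""
    ones = np.ones_like(score)
    _, ll0 = fit_logistic(np.column_stack([ones, score]), y)
    _, ll1 = fit_logistic(np.column_stack([ones, score, group]), y)
    _, ll2 = fit_logistic(np.column_stack([ones, score, group, score * group]), y)
    g_uni, g_non = 2 * (ll1 - ll0), 2 * (ll2 - ll1)
    return g_uni, chi2.sf(g_uni, 1), g_non, chi2.sf(g_non, 1)

# Simulated data with uniform DIF only (group effect 1.2 logits).
rng = np.random.default_rng(1)
n = 2000
score = rng.normal(size=n)
group = rng.integers(0, 2, n).astype(float)
y = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * score + 1.2 * group)))).astype(float)
g_uni, p_uni, g_non, p_non = lr_dif_test(score, group, y)
```

The power formulas in the article predict, for a given IRF and trait distributions, how large these chi-square statistics will be on average at a given sample size.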

5.
Published discussions of the year-to-year linking of tests composed of polytomous items appear to suggest that the linking logic traditionally used for multiple-choice items is also appropriate for polytomous items. It is argued and illustrated that a modification of the traditional linking is necessary when tests consist of constructed-response items judged by raters and there is a possibility of year-to-year variation in rating discrimination and severity.

6.
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

7.
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF and, if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. The think-aloud protocols of expert reviewers were conducted for comparing the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.

8.
What is differential bundle functioning and how is this different from differential item functioning? Can test specifications be used to identify and aid in the interpretation of differential bundle functioning? How can differential bundle functioning lead to an improved understanding of why groups perform differently on achievement tests?

9.
This article examines nonmathematical linguistic complexity as a source of differential item functioning (DIF) in math word problems for English language learners (ELLs). Specifically, this study investigates the relationship between item measures of linguistic complexity, nonlinguistic forms of representation and DIF measures based on item response theory difficulty parameters in a state fourth-grade math test. This study revealed that the greater the item nonmathematical lexical and syntactic complexity, the greater are the differences in difficulty parameter estimates favoring non-ELLs over ELLs. However, the impact of linguistic complexity on DIF is attenuated when items provide nonlinguistic schematic representations that help ELLs make meaning of the text, suggesting that their inclusion could help mitigate the negative effect of increased linguistic complexity in math word problems.

10.
Many teachers and curriculum specialists claim that the reading demand of many mathematics items is so great that students do not perform well on mathematics tests, even though they have a good understanding of mathematics. The purpose of this research was to test this claim empirically. This analysis was accomplished by considering examinees who differed in reading ability within the context of a multidimensional DIF framework. Results indicated that student performance on some mathematics items was influenced by their level of reading ability, so that examinees with lower proficiency classifications in reading were less likely to obtain correct answers to these items. This finding suggests that incorrect proficiency classifications may have occurred for some examinees. However, it is argued that rather than eliminating these mathematics items from the test, which would seem to decrease the construct validity of the test, attempts should be made to control the confounding effect of reading that is measured by some of the mathematics items.

11.
Oshima, Raju, Flowers, and Slinde (1998) described procedures for identifying sources of differential functioning for dichotomous data using differential bundle functioning (DBF) derived from the differential functioning of items and tests (DFIT) framework (Raju, van der Linden, & Fleer, 1995). The purpose of this study was to extend the procedures for dichotomous DBF to the polytomous case and to illustrate how DBF analysis can be conducted with polytomous scoring, common to psychological and educational rating scales. The data set used was parent and teacher ratings of child problem behaviors. Three group contrasts (teacher vs. parent, boy vs. girl, and random groups) and two bundle organizing principles (subscale designation and random selection) were used for the DBF analysis. Interpretations of bundle indexes in the context of child problem behaviors were presented.

12.
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the noniterative estimators developed by Camilli and Penfield (1997) for tests composed of dichotomous items. A small simulation study is reported in which the statistical properties of the generalized variance estimators are assessed, and guidelines are proposed for interpreting values of DIF effect variance estimators.
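The general idea of a noniterative DIF effect variance estimator can be sketched as a method-of-moments calculation: the observed spread of per-item DIF estimates overstates the true between-item variance by the sampling variance of those estimates, so the latter is subtracted out. This is a simplified sketch in the spirit of, but not necessarily identical to, the Camilli–Penfield estimators; the weighting scheme here is an assumption:

```python
import numpy as np

def dif_effect_variance(lam, se):
    """Moment-style estimator of between-item DIF effect variance:
    precision-weighted variance of the per-item DIF estimates `lam`
    minus their average sampling variance (`se` are standard errors),
    truncated at zero."""
    lam = np.asarray(lam, float)
    se = np.asarray(se, float)
    w = 1.0 / se**2
    lam_bar = np.sum(w * lam) / np.sum(w)
    total_var = np.sum(w * (lam - lam_bar)**2) / np.sum(w)
    return max(0.0, total_var - np.mean(se**2))

# Three items with clearly heterogeneous DIF, estimated precisely.
v = dif_effect_variance([-1.0, 0.0, 1.0], [0.01, 0.01, 0.01])
```

When the standard errors are small relative to the spread of the estimates, the result approaches the plain variance of the DIF effects; when the spread is entirely attributable to sampling error, the truncation returns zero.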

13.
It is sometimes sensible to think of the fundamental unit of test construction as being larger than an individual item. This unit, dubbed the testlet, must pass muster in the same way that items do. One criterion of a good item is the absence of DIF: the item must function in the same way in all important subpopulations of examinees. In this article, we define what we mean by testlet DIF and provide a statistical methodology to detect it. This methodology parallels the IRT-based likelihood ratio procedures explored previously by Thissen, Steinberg, and Wainer (1988, in press). We illustrate this methodology with analyses of data from a testlet-based experimental version of the Scholastic Aptitude Test (SAT).

14.
A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests framework. In this new method, a cutoff score for each item is determined by obtaining a (1 − α) percentile rank score from a frequency distribution of NCDIF values under the no-DIF condition, generated from a large number of item parameters drawn according to the item parameter estimates and their variance-covariance structures from a computer program such as BILOG-MG3. This cutoff for each item can be used as the basis for determining whether a given NCDIF index is significantly different from zero. This new method has definite advantages over the current method and yields cutoff values that are tailored to a particular data set and a particular item. A Monte Carlo assessment of this new method is presented and discussed.
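The replication procedure can be sketched for a 2PL item: draw pairs of parameter vectors from the estimated sampling distribution (so any difference between the pair reflects estimation error only, i.e. no DIF), compute NCDIF for each pair, and take the (1 − α) percentile as the item's cutoff. The function names, the 2PL simplification, and the parameter values are illustrative assumptions:

```python
import numpy as np

def ncdif(theta, a_ref, b_ref, a_foc, b_foc):
    """NCDIF index: mean squared difference between focal- and
    reference-group 2PL response functions over focal thetas."""
    p_ref = 1.0 / (1.0 + np.exp(-a_ref * (theta - b_ref)))
    p_foc = 1.0 / (1.0 + np.exp(-a_foc * (theta - b_foc)))
    return float(np.mean((p_foc - p_ref)**2))

def ipr_cutoff(est, cov, alpha=0.01, n_rep=2000, seed=0):
    """Item parameter replication cutoff: the (1 - alpha) percentile
    of NCDIF values computed from pairs of (a, b) vectors drawn from
    N(est, cov), the no-DIF sampling distribution of the estimates."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(1000)
    vals = []
    for _ in range(n_rep):
        a_r, b_r = rng.multivariate_normal(est, cov)
        a_f, b_f = rng.multivariate_normal(est, cov)
        vals.append(ncdif(theta, a_r, b_r, a_f, b_f))
    return float(np.percentile(vals, 100.0 * (1.0 - alpha)))

# Hypothetical estimates and covariance for one item.
est = np.array([1.2, 0.0])
cov = np.diag([0.02, 0.01])
cutoff = ipr_cutoff(est, cov, alpha=0.01, n_rep=500)
```

An observed NCDIF for this item exceeding `cutoff` would then be flagged as significant at level α, and because the cutoff is built from that item's own parameter covariance, it is tailored to the item and data set as the abstract describes.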

15.
The authors used Monte Carlo methods to examine the Type I error rates for randomization tests applied to single-case data arising from ABAB designs involving random, systematic, or response-guided assignment of interventions. Six randomization tests were examined (permuting blocks of 1, 2, 3, or 5 observations, and randomly selecting intervention triplets so that each phase has at least 3 or 5 observations). When the design included randomization, the Type I error rate was controlled. When the design was systematic or guided by the absolute value of the slope, the tests permuting blocks tended to be liberal with positive autocorrelation, whereas those based on the random selection of intervention triplets tended to be conservative across levels of autocorrelation.
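The block-permutation variant of such a randomization test can be sketched as follows: the data stay in place, the phase labels are permuted in blocks, and the p-value is the proportion of permutations whose phase-mean difference is at least as extreme as the observed one (a Monte Carlo approximation rather than full enumeration; names and the example data are illustrative assumptions):

```python
import numpy as np

def randomization_test(data, labels, block=1, n_perm=2000, seed=0):
    """Two-sided randomization test for a single-case ABAB design.
    Test statistic: mean(B-phase) - mean(A-phase). Reference
    distribution: permutations of blocks of `block` consecutive
    labels (len(labels) must be divisible by `block`)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    labels = np.asarray(labels)
    stat = data[labels == 'B'].mean() - data[labels == 'A'].mean()
    blocks = labels.reshape(-1, block)
    count = 0
    for _ in range(n_perm):
        lab = blocks[rng.permutation(len(blocks))].ravel()
        s = data[lab == 'B'].mean() - data[lab == 'A'].mean()
        if abs(s) >= abs(stat) - 1e-12:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one Monte Carlo p-value

# ABAB series with a clear phase effect, permuted in blocks of 5.
data = np.concatenate([np.zeros(5), np.ones(5), np.zeros(5), np.ones(5)])
labels = np.array(['A'] * 5 + ['B'] * 5 + ['A'] * 5 + ['B'] * 5)
p = randomization_test(data, labels, block=5)
```

With only four blocks there are too few distinct permutations for a small p-value, which illustrates the abstract's point that the validity (and resolution) of the test depends on how the intervention assignment was actually randomized.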

16.
Basic Issues and Countermeasures in the Connotative Development of Radio and TV University Education
With the development of higher education, the connotative (quality-oriented) development of Radio and TV University education faces new opportunities and challenges. Connotative development is reflected mainly in strengthening institutional capacity, building the teaching faculty, raising the level of instruction, and improving the quality of graduates. Research and exploration on connotative development must balance and coordinate the dialectical relationship between growth in scale and improvement in quality, so as to correctly guide the long-term development of Radio and TV University education.

17.
On the Internal Mechanism of Students' Differentiated Development in Classroom Teaching
A student's development in classroom teaching proceeds through four basic stages: forming an intention to act, participating in classroom activities, constructing meaning, and acquiring individual experience. Of these, the individual's active participation and communicative interaction are the observable activities that shape differentiated development, while selective input and projective interpretation are the internal roots that lead individuals toward it. At each stage, different individuals may exhibit certain differences, which ultimately lead to different developmental outcomes.

18.
19.
Detection of differential item functioning (DIF) is most often done between two groups of examinees under item response theory. It is sometimes important, however, to determine whether DIF is present in more than two groups. In this article we present a method for detection of DIF in multiple groups. The method is closely related to Lord's chi-square for comparing vectors of item parameters estimated in two groups. An example using real data is provided.
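One natural multi-group generalization of Lord's chi-square is a Wald-type homogeneity statistic: compare each group's item parameter vector with the precision-weighted common estimate, referring the sum of quadratic forms to a chi-square with (G − 1) × p degrees of freedom. This sketch is such a generalization, not necessarily the article's exact formulation; all names are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def multigroup_lord_chi2(params, covs):
    """Wald-type test of parameter homogeneity across G groups.
    params: (G, p) array of item parameter estimates per group.
    covs:   list of G (p, p) covariance matrices of those estimates.
    Returns (statistic, df, p-value)."""
    params = np.asarray(params, float)
    G, p = params.shape
    precisions = [np.linalg.inv(c) for c in covs]
    total_prec = np.sum(precisions, axis=0)
    # Precision-weighted common parameter estimate.
    v_bar = np.linalg.solve(total_prec,
                            sum(P @ v for P, v in zip(precisions, params)))
    stat = sum(float((v - v_bar) @ P @ (v - v_bar))
               for v, P in zip(params, precisions))
    df = (G - 1) * p
    return stat, df, float(chi2.sf(stat, df))

# Three groups, (a, b) parameters; the third group's item drifts.
covs = [0.01 * np.eye(2)] * 3
stat0, df0, p0 = multigroup_lord_chi2([[1.0, 0.0]] * 3, covs)
stat1, df1, p1 = multigroup_lord_chi2([[1.0, 0.0], [1.0, 0.0], [2.0, 1.0]], covs)
```

With G = 2 this reduces to the familiar two-group form based on the difference vector and the sum of the two covariance matrices.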

20.
Once a differential item functioning (DIF) item has been identified, little is known about the examinees for whom the item functions differentially. This is because DIF analyses focus on manifest group characteristics that are associated with DIF but do not explain why examinees respond differentially to items. We first analyze item response patterns for gender DIF and then illustrate, through the use of a mixture item response theory (IRT) model, how the manifest characteristic associated with DIF often has a very weak relationship with the latent groups actually being advantaged or disadvantaged by the item(s). Next, we propose an alternative approach to DIF assessment that first uses an exploratory mixture model analysis to define the primary dimension(s) that contribute to DIF, and secondly studies examinee characteristics associated with those dimensions in order to understand the cause(s) of DIF. Comparison of academic characteristics of these examinees across classes reveals some clear differences in manifest characteristics between groups.
