Similar Articles
 19 similar articles found (search time: 125 ms)
1.
This study proposes a regularization method for detecting differential item functioning (DIF) under continuous covariates and compares it with the logistic regression method. Simulation results showed: (1) under all conditions, the regularization method had a lower Type I error rate than logistic regression, and when the proportion of DIF items was 20%, the regularization method detected DIF better than logistic regression; (2) the regularization method was insensitive to a DIF magnitude of 0.3, yielding low power; (3) for both methods, Type I error rates increased with sample size and DIF magnitude, while power increased with sample size and DIF magnitude and decreased as the proportion of DIF items grew. The regularization method was then applied to the PISA 2012 mathematics test data for DIF detection under a continuous covariate; the empirical results likewise showed that the regularization method controls the Type I error rate better than the logistic regression method.
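The logistic-regression baseline that item 1 compares against can be sketched in a few lines. This is an illustrative sketch, not the paper's code: the latent ability stands in for the matching total score, and all numbers (sample size, the 0.6-logit uniform DIF effect) are made up. The compact model regresses the item response on the matching score; the augmented model adds group and score-by-group terms, so the likelihood-ratio statistic jointly tests uniform and non-uniform DIF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate one dichotomous item with uniform DIF: at equal ability, the
# focal group (group = 1) finds the item 0.6 logits harder.
n = 2000
group = rng.integers(0, 2, n)
theta = rng.normal(0.0, 1.0, n)           # latent ability
y = (rng.random(n) < 1 / (1 + np.exp(-(theta - 0.6 * group)))).astype(float)
matching = theta                           # stand-in for the matching total score

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (coefficients, deviance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])      # Hessian of the log-likelihood
        beta += np.linalg.solve(H, X.T @ (y - p))   # Newton step
    p = 1 / (1 + np.exp(-X @ beta))
    return beta, -2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X0 = np.column_stack([np.ones(n), matching])                           # compact model
X1 = np.column_stack([np.ones(n), matching, group, matching * group])  # + DIF terms

_, dev0 = fit_logistic(X0, y)
_, dev1 = fit_logistic(X1, y)
lr_stat = dev0 - dev1                   # chi-square with 2 df under "no DIF"
p_value = float(np.exp(-lr_stat / 2))   # exact chi-square(2) survival function
print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.4f}")
```

With a genuine 0.6-logit DIF effect and 2,000 examinees, the test rejects "no DIF" comfortably; shrinking the effect toward 0.3 is where the abstract reports power problems for both approaches.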

2.
In this simulation study, item responses for the focal and reference groups were generated from the Rasch model, and the performance of the logistic regression DIF (LR-DIF) detection method was examined under different DIF proportions and purification schemes. The results show that LR-DIF detection is trustworthy when the proportion of DIF items is at most 40%, and that when using LR-DIF it is necessary to purify the matching variable, preferably through iterative purification.

3.
This paper uses the SIBTEST method to analyze differential item functioning in the Chinese version of the Emotional Intelligence Scale (EIS). The results show that: (1) on the gender variable, four items of the Chinese EIS exhibit DIF, two uniform and two non-uniform; (2) on the region variable, five items exhibit DIF, three uniform and two non-uniform.

5.
A questionnaire survey of 592 undergraduates was conducted, and mean and covariance structure (MACS) analysis was used to test the Internet Altruistic Behavior Scale for Undergraduates (IABSU) for differential item functioning across regions. The results show that four IABSU items exhibit cross-region DIF: items 24, 28, and 1 show uniform DIF, and item 11 shows non-uniform DIF. To improve the fairness and validity of the scale, it is recommended that these four DIF items be removed.

6.
This study examined, from both unidimensional and multidimensional perspectives, differential item functioning (DIF) in the Chinese-to-English translated items of the IEA (International Association for the Evaluation of Educational Achievement) test of children's cognitive development. The data comprised test results from 871 Chinese children and 557 American children. More than half of the items exhibited substantial DIF, meaning the test is not functionally equivalent for Chinese and American children, and users should be cautious about using this cross-language translated test to compare the cognitive ability of Chinese and American examinees. Fortunately, about half of the DIF items favor China and half favor the United States, so a scale built on total test scores should not be badly biased. In addition, item-fit statistics did not adequately flag the DIF items, so dedicated DIF analyses are still required. Three possible causes of DIF were explored, but more subject-matter expertise and experimentation are needed to truly explain how the DIF arises.

7.
This paper systematically reviews research on differential item functioning in Chinese achievement tests, covering overview studies that introduce DIF research from abroad, comparative studies of DIF detection methods and studies of influencing factors based on Chinese achievement tests, and applied studies analyzing DIF in various Chinese achievement tests. On this basis, problems in Chinese DIF research on achievement tests are identified.

8.
This study introduces DIF detection methods capable of handling testlet effects, providing a more rigorous DIF test for passage-based reading tests. The GMH, P-SIBTEST, and P-LR methods were applied to the reading comprehension items of the (advanced) Chinese Proficiency Test (HSK). The three methods showed high agreement: these items exhibit no significant DIF effects on the gender or nationality variables. The study also compared traditional DIF detection methods with the adapted testlet DIF methods, and the latter proved clearly superior.

9.
This paper conducts differential item functioning (DIF) analyses of the items from eight 2011 administrations of the new HSK (Level 6) to evaluate its gender fairness. Items exhibiting DIF account for 3.3% of the 800 items analyzed; the mean MH value across the 800 items was 0.02, with a 95% confidence interval containing 0, so the test as a whole exhibits no DIF. HSK (Level 6) therefore shows fairly good gender fairness.
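The per-item MH value that item 9 averages is built from Mantel-Haenszel 2x2 tables stratified by matching score. A minimal sketch follows; the counts are hypothetical, not HSK data, and the -2.35 rescaling onto the ETS delta metric is the conventional reporting scale:

```python
import math

# Hypothetical stratified 2x2 tables, one per matching-score level:
# (ref correct, ref wrong, focal correct, focal wrong)
tables = [
    (40, 10, 30, 20),
    (60, 20, 50, 30),
    (80, 15, 70, 25),
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
alpha_mh = num / den                    # MH common odds ratio (1 = no DIF)
mh_d_dif = -2.35 * math.log(alpha_mh)   # ETS delta metric; |delta| < 1 is negligible DIF
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {mh_d_dif:.3f}")
```

Here the odds ratio is about 2, i.e. the reference group has roughly twice the odds of a correct answer at every matched score level, so the delta value lands well past the "negligible" band; a test-level summary like the abstract's mean of 0.02 averages such per-item values.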

10.
朱乙艺  焦丽亚 《考试研究》2012,(6):80-87,19
Compared with DIF studies based on empirical data, DIF studies based on simulated data not only allow free manipulation of experimental conditions but can also report power and Type I error indices. This paper elaborates the principles of generating simulated dichotomous DIF data, a process with four stages: choosing an approach for producing DIF; choosing an item response theory model; specifying examinee characteristics, item characteristics, and the number of replications; and computing each examinee's probability of a correct response to each item and converting it into dichotomous data. The generation of simulated dichotomous DIF data is demonstrated both in the common software Excel and in the specialized software WinGen3.
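The four-stage procedure described in item 10 can be sketched in plain Python rather than Excel or WinGen3. Everything here is illustrative: a Rasch model, a single replication, made-up sample sizes, and uniform DIF injected by shifting the difficulty of 20% of the items by 0.5 logits for the focal group.

```python
import math
import random

random.seed(42)

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1 / (1 + math.exp(-(theta - b)))

n_items, n_per_group = 20, 500
b = [random.gauss(0, 1) for _ in range(n_items)]   # item difficulties
dif_shift, dif_items = 0.5, {0, 1, 2, 3}           # 20% of items harder for focal group

def simulate(focal):
    data = []
    for _ in range(n_per_group):
        theta = random.gauss(0, 1)                 # examinee ability
        row = []
        for j in range(n_items):
            bj = b[j] + (dif_shift if focal and j in dif_items else 0.0)
            row.append(1 if random.random() < rasch_p(theta, bj) else 0)
        data.append(row)
    return data

ref, focal = simulate(False), simulate(True)

# On the DIF items, the focal group should answer correctly less often.
p_ref = sum(r[j] for r in ref for j in dif_items) / (n_per_group * len(dif_items))
p_focal = sum(r[j] for r in focal for j in dif_items) / (n_per_group * len(dif_items))
print(f"DIF items, proportion correct: reference {p_ref:.3f}, focal {p_focal:.3f}")
```

A real simulation study would loop this over replications and conditions (sample size, DIF size, DIF proportion) and feed each replication to a detection method to tally power and Type I error, as the abstract describes.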

11.
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is examining the presence of DIF and, if DIF is found, investigating its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. Think-aloud protocols with expert reviewers were conducted to compare the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.

12.
This study established a Chinese scale for measuring high school students' ocean literacy, including tests of its reliability, validity, and differential item functioning (DIF), with the aim of compensating for the lack of DIF testing of existing scales. The construct validity and reliability were verified by analyzing the scale's items with the Rasch model, and a gender DIF test was conducted to ensure the fairness of test results when distinct groups are compared. The results indicated that the scale is unidimensional and possesses favorable internal consistency and construct validity. The gender DIF test indicated that several items were difficult for either female or male students to answer correctly; however, after discussing these items individually, the experts and scholars suggested retaining them. The final Chinese version of the ocean literacy scale comprises 48 items that reflect high school students' understanding of ocean literacy, which helps them make sense of the marine science topics encountered in real life.

13.
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

14.
This paper demonstrates and discusses the use of think aloud protocols (TAPs) as an approach for examining and confirming sources of differential item functioning (DIF). The TAPs are used to investigate to what extent surface characteristics of the items that are identified by expert reviews as sources of DIF are supported by empirical evidence from examinee thinking processes in the English and French versions of a Canadian national assessment. In this research, the TAPs confirmed sources of DIF identified by expert reviews for 10 out of 20 DIF items. The moderate agreement between TAPs and expert reviews indicates that evidence from expert reviews cannot be considered sufficient in deciding whether DIF items are biased and such judgments need to include evidence from examinee thinking processes.

15.
Several studies have shown that the linguistic complexity of items in achievement tests may cause performance disadvantages for second language learners. However, the relative contributions of specific features of linguistic complexity to this disadvantage are largely unclear. Based on the theoretical concept of academic language, we used data from a state-wide test in mathematics for third graders in Berlin, Germany, to determine the interrelationships among several academic language features of test items and their relative effects on differential item functioning (DIF) against second language learners. Academic language features were significantly correlated with each other and with DIF. While we found text length, general academic vocabulary, and number of noun phrases to be unique predictors of DIF, substantial proportions of the variance in DIF were explained by confounded combinations of several academic language features. Specialised mathematical vocabulary was neither related to DIF nor to the other academic language features.

16.
The purpose of the present study is to examine the language characteristics of a few states' large-scale assessments of mathematics and science and investigate whether the language demands of the items are associated with the degree of differential item functioning (DIF) for English language learner (ELL) students. A total of 542 items from 11 assessments at Grades 4, 5, 7, and 8 from three states were rated for linguistic complexity based on a developed linguistic coding scheme. The linguistic ratings were compared to each item's DIF statistics. The results yielded a stronger association between the linguistic rating and DIF statistics for ELL students in the "relatively easy" items than in the "not easy" items. In particular, general academic vocabulary and the amount of language in an item were found to have the strongest association with the degree of DIF, particularly for ELL students with low English language proficiency. Furthermore, the items were grouped into four bundles to look closely at the relationship between varying degrees of language demands and ELL students' performance. Differential bundle functioning (DBF) results indicated that the exhibited DBF became more substantial as the language demands increased. By disentangling linguistic difficulty from content difficulty, the results of the study provide strong evidence of the impact of linguistic complexity on ELL students' test performance. The study discusses the implications for the validation of the tests and instruction for ELL students.

17.
This paper considers a modification of the DIF procedure SIBTEST for investigating the causes of differential item functioning (DIF). One way in which factors believed to be responsible for DIF can be investigated is by systematically manipulating them across multiple versions of an item using a randomized DIF study (Schmitt, Holland, & Dorans, 1993). In this paper, it is shown that the additivity of the index used for testing DIF in SIBTEST motivates a new extension of the method for statistically testing the effects of DIF factors. Because an important consideration is whether or not a studied DIF factor is consistent in its effects across items, a methodology for testing item × factor interactions is also presented. Using data from the mathematical sections of the Scholastic Assessment Test (SAT), the effects of two potential DIF factors, item format (multiple-choice versus open-ended) and problem type (abstract versus concrete), are investigated for gender. Results suggest a small but statistically significant and consistent effect of item format (favoring males for multiple-choice items) across items, and a larger but less consistent effect due to problem type.

18.
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted with manifest methods that use observed characteristics (gender and race/ethnicity) to group examinees. Homogeneity of item responses is assumed, denoting that all examinees respond to test items using a similar approach; this assumption may not hold for all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources in heterogeneous populations (linguistic minority groups). We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items with larger effect sizes when LCs within language groups, rather than the overall (majority/minority) language groups, were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted, enhancing DIF detection accuracy and score-based inferences when analyzing DIF in heterogeneous populations.

19.
Data from a large-scale performance assessment (N = 105,731) were analyzed with five differential item functioning (DIF) detection methods for polytomous items to examine the congruence among the methods. Two different versions of the item response theory (IRT) model-based likelihood ratio test, the logistic regression likelihood ratio test, the Mantel test, and the generalized Mantel-Haenszel test were compared. Results indicated some agreement among the five DIF detection methods. Because statistical power is a function of sample size, DIF detection results from extremely large data sets are not practically useful. As alternatives to the DIF detection methods, four IRT model-based indices of standardized impact and four observed-score indices of standardized impact for polytomous items were obtained and compared with the R² measures of logistic regression.
