Similar Articles
20 similar articles found.
1.
Many statistics used in the assessment of differential item functioning (DIF) in polytomous items yield a single item-level index of measurement invariance that collapses information across all response options of the polytomous item. Relying on a single item-level index of DIF can, however, be misleading if the magnitude or direction of the DIF changes across the steps underlying the polytomous response process. A more comprehensive approach to examining measurement invariance in polytomous item formats is to examine invariance at the level of each step of the polytomous item, a framework described in this article as differential step functioning (DSF). This article proposes a nonparametric DSF estimator based on the Mantel-Haenszel common odds ratio estimator (Mantel & Haenszel, 1959), which is frequently implemented in the detection of DIF in dichotomous items. A simulation study demonstrated that when the level of DSF varied in magnitude or sign across the steps underlying the polytomous response options, the DSF-based approach typically provided a more powerful and accurate test of measurement invariance than did corresponding item-level DIF estimators.
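The Mantel-Haenszel common odds ratio that the proposed DSF estimator builds on is straightforward to compute: within each ability stratum, cross-tabulate group membership against step success, then pool the tables. A minimal sketch in Python, with the data layout and names as illustrative assumptions rather than the article's code:

```python
# Mantel-Haenszel common odds ratio pooled over ability strata.
# For DSF, the same computation would be repeated for each step
# (e.g., scoring 0 vs. 1+, then 0-1 vs. 2+, ...) of a polytomous item.
import numpy as np

def mh_common_odds_ratio(tables):
    """tables: iterable of 2x2 arrays [[A_k, B_k], [C_k, D_k]], where rows
    are reference/focal group and columns are step passed/failed within
    ability stratum k."""
    num = den = 0.0
    for t in tables:
        (a, b), (c, d) = t
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Three ability strata; a pooled ratio above 1 favors the reference group.
strata = [np.array([[40, 10], [30, 20]]),
          np.array([[25, 25], [20, 30]]),
          np.array([[10, 40], [5, 45]])]
print(mh_common_odds_ratio(strata))  # 2.0 for these illustrative counts
```

Under DSF, a pattern such as a ratio above 1 at the first step and below 1 at the last step is exactly the sign-changing effect that a single item-level index would average away.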

2.
In recent years, large-scale international assessments have been increasingly used to evaluate and compare the quality of education across regions and countries. However, a lack of measurement invariance between different versions of these assessments often threatens the validity of such cross-cultural comparisons. In this study, we investigated the cross-language, cross-cultural validity of the Programme for International Student Assessment 2006 Science assessment via three differential item functioning (DIF) analyses: between the USA and Canada, between Hong Kong and mainland China, and between the USA and mainland China. Furthermore, we explored three plausible causes of DIF via content analysis, namely language, curriculum, and cultural differences. Our results revealed that differential curriculum coverage was the most serious cause of DIF among the three factors investigated, and that differential content familiarity also contributed to DIF. We discuss the implications of the findings for future international assessment development and for how best to define 'scientific literacy' for students around the world.

3.

International large-scale assessment in education aims to compare educational achievement across many countries. Differences between countries in language, culture, and education give rise to differential item functioning (DIF). For many decades, DIF has been regarded as a nuisance and a threat to validity. In this paper, we take a different stance and argue that DIF holds essential information about the differences between countries. To uncover this information, we explore multivariate analysis techniques as ways to analyze DIF, with an emphasis on visualization. PISA 2012 data are used for illustration.


4.
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are present. This bias may be somewhat reduced when cross-national DIF is correlated over study cycles, which is the case in PISA. This article reviews existing methods for calculating standard errors for national trends in international large-scale assessments and proposes a new method that takes into account the dependency of linking errors at different time points. We conducted a simulation study to compare the performance of the standard error estimators. The results showed that the newly suggested estimator outperformed the existing estimators, estimating standard errors more accurately and efficiently across all simulated conditions. Implications for practical applications are discussed.

5.
This study investigated differential item functioning (DIF), differential bundle functioning (DBF), and differential test functioning (DTF) across gender on the reading comprehension section of the Graduate School Entrance English Exam in China. The dataset comprised 10,000 test-takers' item-level responses to 6 five-item testlets. DIF and DBF were examined using the polytomous simultaneous item bias test and the item response theory likelihood ratio test, and DTF was investigated with multi-group confirmatory factor analysis (MG-CFA). The results indicated that although none of the 30 items exhibited statistically and practically significant DIF across gender at the item level, 2 testlets were consistently identified by the two procedures as having significant DBF at the testlet level. Nonetheless, the DBF did not manifest itself at the overall test score level as DTF according to the MG-CFA results. This suggests that the relationship between item-level DIF and test-level DTF is a complicated one, mediated by testlet effects in testlet-based language assessment.

6.
Students with disabilities participate in two major measurement systems. The Individuals with Disabilities Education Act emphasizes working within a Response to Intervention (RTI) framework to identify and monitor the progress of low-performing students. Persistently low-performing students may also be eligible for some form of alternate assessment for accountability purposes. Working within these two systems, educators need technically sound measures to inform decision making. This study presents scaling results from a Curriculum-Based Measurement tool designed within an RTI framework specifically for persistently low-performing students. We use the phrase "persistently low-performing students" to refer to a specific group of students who have been identified with a nonsevere learning disability and who perform well below grade-level expectations. Key findings indicate that the items function well in the lower tail of the distribution of students' estimated ability levels. Further, the distribution of items is positively skewed, resulting in many accessible items that are most informative for low-performing students. The results provide initial validity evidence for the measures as one source of data for progress monitoring within an RTI framework and for identifying persistently low-performing students who may be eligible for a large-scale assessment option other than the general grade-level assessment.

7.
Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies that seek to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content, cognitive demands, item type, item text, and visual-spatial or reference factors. To facilitate the analyses, DIF studies were conducted at 3 grade levels and for 2 randomly equivalent forms of the science assessment at each grade level (administered in different years). The DIF procedure itself was a variant of the "standardization procedure" of Dorans and Kulick (1986) and was applied to very large data sets (6 sets of data, each involving 60,000 students). It has the advantage of being easy to understand and to explain to practitioners. Several findings emerged that would be useful to pass on to test development committees. For example, when there was DIF in science items, multiple-choice (MC) items tended to favor male examinees and open-response (OR) items tended to favor female examinees. Compiling DIF information across multiple grades and years increases the likelihood that important trends in the data will be identified and that item-writing practices will be informed by more than anecdotal reports about DIF.
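For readers unfamiliar with it, the standardization procedure flags an item by averaging focal-minus-reference differences in proportion correct across matched total-score levels, weighting by the focal group's score distribution. A hedged sketch of the basic STD P-DIF index (names and data layout are illustrative assumptions, not the study's code):

```python
# Standardization index (STD P-DIF) in the spirit of Dorans & Kulick (1986):
# focal-minus-reference difference in item proportion correct, averaged
# over total-score strata with focal-group weights.
import numpy as np

def std_p_dif(total_score, correct, group, focal="F"):
    total_score = np.asarray(total_score)
    correct = np.asarray(correct, dtype=float)   # 0/1 item responses
    group = np.asarray(group)
    diffs, weights = [], []
    for s in np.unique(total_score):
        foc = (total_score == s) & (group == focal)
        ref = (total_score == s) & (group != focal)
        if foc.any() and ref.any():              # need both groups at score s
            diffs.append(correct[foc].mean() - correct[ref].mean())
            weights.append(foc.sum())            # weight by focal-group count
    return np.average(diffs, weights=weights)
```

Commonly cited screening conventions treat |STD P-DIF| values above roughly .05 as worth inspecting and above .10 as substantial, which is part of what makes the index easy to explain to practitioners.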

8.
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study comparing trend estimation based on national item parameters, which requires trends to be computed separately for each country, with two linking methods employing international item parameters, across several conditions. The trend estimates based on national item parameters were more accurate than those based on international item parameters when cross-national DIF was present. Moreover, the use of fixed common item parameter calibrations led to biased trend estimates. The detection and elimination of DIF can reduce this bias but is also likely to increase the total error.

9.
The central idea of differential item functioning (DIF) is to examine differences between two groups at the item level while controlling for overall proficiency. This approach is useful for examining hypotheses at a finer grain than a total test score permits. The methodology proposed in this paper likewise estimates differences at the item rather than the overall score level, but with the innovation that item-level differences across many groups are the focus simultaneously. This is a straightforward generalization of DIF as a variance rather than as one or several group differences; conceptually, it can be referred to as item difficulty variation (IDV). When instruction is of interest, and the "group" is a unit at which instruction is determined or delivered, IDV signals value-added effects that can be influenced by either demographic or instructional variables.
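One illustrative way to operationalize IDV is sketched below under stated assumptions: the group-mean centering step is a crude stand-in for the proficiency control, and this is a reading of the idea, not the paper's estimator.

```python
# Item difficulty variation (IDV) as a variance across groups: center each
# group's classical logit difficulties on the group mean to remove overall
# proficiency, then take the per-item variance of the centered values.
import numpy as np

def item_difficulty_variation(pvalues):
    """pvalues: groups x items matrix of proportion-correct values."""
    p = np.clip(np.asarray(pvalues, dtype=float), 1e-6, 1 - 1e-6)
    difficulty = -np.log(p / (1 - p))                    # higher = harder
    centered = difficulty - difficulty.mean(axis=1, keepdims=True)
    return centered.var(axis=0, ddof=1)                  # one IDV per item
```

Items with large IDV are those whose relative difficulty shifts across groups, which is the signal the paper proposes to tie to instructional variables.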

10.

Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted with manifest methods that use observed characteristics (gender and race/ethnicity) to group examinees. Homogeneity of item responses is assumed, meaning that all examinees are presumed to respond to test items in a similar way. This assumption may not hold for all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources in heterogeneous populations (linguistic minority groups). We found at least three LCs within each linguistic group, suggesting the need to evaluate this assumption empirically in DIF analysis. We obtained larger proportions of DIF items, with larger effect sizes, when LCs within language groups rather than the overall (majority/minority) language groups were examined. The illustrated approach could improve the ways in which DIF analyses are typically conducted, enhancing DIF detection accuracy and score-based inferences when analyzing DIF in heterogeneous populations.

11.
There has been increased interest in the impact of unmotivated test taking on test performance and score validity. This has led to the development of new ways of measuring test-taking effort based on item response time. In particular, Response Time Effort (RTE) has been shown to provide an assessment of effort down to the level of individual item responses. A limitation of RTE, however, is that it is intended for use with selected-response items that must be answered before a test taker can move on to the next item. The current study outlines a general process for measuring item-level effort that can be applied to an expanded set of item types and test-taking behaviors (such as omitted or constructed responses). This process, which is illustrated with data from a large-scale assessment program, should improve our ability to detect non-effortful test taking and to perform individual score validation.
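At the core of RTE is a simple classification: a response counts as effortful "solution behavior" when its response time exceeds an item-specific threshold. A minimal sketch of that step; the 10%-of-median threshold rule and the treatment of omissions are simplifying assumptions for illustration, not the study's procedure:

```python
# Response-time-based effort flagging: each answered response is classified
# as solution behavior (effortful) when its time exceeds an item threshold.
import numpy as np

def response_time_effort(rt_matrix):
    """rt_matrix: persons x items response times in seconds; NaN = omitted."""
    rt = np.asarray(rt_matrix, dtype=float)
    thresholds = 0.10 * np.nanmedian(rt, axis=0)   # one threshold per item
    solution = rt > thresholds                     # NaN compares as False
    answered = ~np.isnan(rt)
    # Per-examinee RTE: share of answered items showing solution behavior.
    return solution.sum(axis=1) / np.maximum(answered.sum(axis=1), 1)
```

The study's extension concerns precisely the cases this sketch sets aside: omitted and constructed responses, where "answered before moving on" no longer holds.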

12.
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the contexts of classical test theory and item response theory, they are not applicable to cognitive diagnosis models (CDMs), because the underlying latent attributes of CDMs are multidimensional and binary. This study proposes a very general DIF assessment method in the CDM framework that is applicable to various CDMs, to more than two groups of examinees, and to multiple grouping variables that are categorical, continuous, observed, or latent. The parameters can be estimated with Markov chain Monte Carlo algorithms implemented in the freeware WinBUGS. Simulation results demonstrated good parameter recovery and advantages of the new method over the Wald method in DIF assessment.

13.
The purpose of this study was to examine the performance of differential item functioning (DIF) assessment in the presence of the multilevel structure that often underlies data from large-scale testing programs. Analyses were conducted using logistic regression (LR), a popular, flexible, and effective tool for DIF detection. Data were simulated within a hierarchical framework, such as might be seen when examinees are clustered within schools. Both standard and hierarchical LR (HLR, which accounts for the multilevel structure) approaches to DIF detection were employed. Results highlight how DIF detection rates differ depending on whether the analytic strategy matches the data structure. Specifically, when the grouping variable was within clusters, LR and HLR performed similarly in terms of Type I error control and power. However, when the grouping variable was between clusters, LR failed to maintain the nominal Type I error rate of .05, whereas HLR was able to maintain it. Power for HLR nevertheless tended to be low under many conditions in the between-cluster case.
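The standard LR screen for DIF compares nested logistic models: the matching total score alone versus the score plus group and score-by-group terms, with a two-degree-of-freedom likelihood-ratio test covering uniform and nonuniform DIF jointly. A minimal sketch, assuming 0/1 item responses and group codes (names are illustrative):

```python
# Logistic regression DIF test: likelihood-ratio statistic for adding group
# and group-by-score terms to a model that matches on total score.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lr_dif_test(item, total, group):
    """item: 0/1 responses; total: total scores; group: 0/1 group codes."""
    item = np.asarray(item, dtype=float)
    total = np.asarray(total, dtype=float)
    group = np.asarray(group, dtype=float)
    X0 = sm.add_constant(total)                              # matching only
    X1 = sm.add_constant(np.column_stack([total, group, total * group]))
    m0 = sm.Logit(item, X0).fit(disp=0)
    m1 = sm.Logit(item, X1).fit(disp=0)
    g2 = 2 * (m1.llf - m0.llf)                               # LR statistic
    return g2, stats.chi2.sf(g2, df=2)                       # 2-df test
```

The hierarchical variant studied here would replace these flat models with mixed-effects logistic models carrying random effects for clusters (e.g., schools), which is what restores Type I error control when the grouping variable varies between clusters.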

14.
In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows general conclusions to be drawn about country rankings on the given achievement measure, but it typically does not provide the specific diagnostic information necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders' physics achievement in Bosnia and Herzegovina in comparison with the achievement of their peers in Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of the TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of the items exhibited large DIF contrasts, indicating significant differences between the cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina was mainly associated with the topic area 'Electricity and magnetism'. Classes of items that required knowledge of the experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of these results, the common practice of ranking countries on universally established cognitive categories seems potentially misleading.

15.
Setting international benchmarks for education systems of the Organisation for Economic Co-operation and Development (OECD) countries is one of the goals of the OECD's Programme for International Student Assessment (PISA). However, some countries are not able to participate in PISA, despite their desire to set international benchmarks for their education systems. This article presents a method of setting international benchmarks for a country's school education system, without necessarily participating in PISA, by designing a test using the test items released by PISA for public consumption. The method has been implemented in a study that involved 1,500 Grade 10 students across 60 schools in Bhutan. The students were administered a mathematics test constructed from the PISA Mathematical Literacy test items. The study showed that the performance of Bhutanese students was comparable with the performance of the students from the countries that participated in PISA 2003 and that Bhutan could learn from both high- and low-performing school education systems of those countries.

16.
Data from a large-scale performance assessment (N = 105,731) were analyzed with five differential item functioning (DIF) detection methods for polytomous items to examine the congruence among the methods. Two versions of the item response theory (IRT) model-based likelihood ratio test, the logistic regression likelihood ratio test, the Mantel test, and the generalized Mantel–Haenszel test were compared. Results indicated some agreement among the five DIF detection methods. Because statistical power is a function of sample size, significance-based DIF detection results from extremely large data sets are not practically useful. As alternatives to the DIF detection methods, four IRT model-based indices of standardized impact and four observed-score indices of standardized impact for polytomous items were obtained and compared with the R² measures of logistic regression.

17.
The “Teacher Education and Development Study in Mathematics” assessed the knowledge of primary and lower-secondary teachers at the end of their training. The large-scale assessment represented the common denominator of what constitutes mathematics content knowledge and mathematics pedagogical content knowledge in the 16 participating countries. The country means provided information on overall teacher performance in these two areas. By detecting and explaining differential item functioning (DIF), this paper goes beyond the country means and investigates item-by-item strengths and weaknesses of future teachers. We hypothesized that, owing to differences in cultural context, teachers from different countries would respond differently to subgroups of test items with certain characteristics. Content domain, cognitive demand (including item difficulty), and item format did in fact represent such characteristics: they significantly explained variance in DIF. Country pairs showed similar patterns in the relationship of DIF to the item characteristics. Future teachers from Taiwan and Singapore were particularly strong on mathematics content and constructed-response items. Future teachers from Russia and Poland were particularly strong on items requiring non-standard mathematical operations. The USA and Norway did particularly well on mathematics pedagogical content and data items. Thus, conditional on the countries’ mean performance, the knowledge profiles of the future teachers matched the respective national debates. This result points to the influence of cultural context on mathematics teacher knowledge.

18.
Large-scale assessments of student competencies address rather broad constructs and use parsimonious, unidimensional measurement models. Differential item functioning (DIF) in certain subpopulations has usually been interpreted as error or bias. Recent work in educational measurement, however, assumes that DIF reflects the multidimensionality inherent in broad competency constructs and leads to differential achievement profiles. Thus, DIF parameters can be used to identify the relative strengths and weaknesses of student subpopulations. The present paper explores profiles of mathematical competencies in upper-secondary students from six countries (Austria, France, Germany, Sweden, Switzerland, and the US). DIF analyses are combined with analyses of the cognitive demands of test items based on psychological conceptualisations of mathematical problem solving. Experts judged the cognitive demands of TIMSS test items, and these demand ratings were correlated with DIF parameters. We expected that cultural framings and instructional traditions would lead to specific aspects of mathematical problem solving being fostered in classroom instruction, which should be reflected in differential item functioning in international comparative assessments. Results for the TIMSS mathematics test were in line with expectations about the cultural and instructional traditions in mathematics education of the six countries.

19.
In recent years, PISA (the Programme for International Student Assessment) has become a major educational event attracting wide attention from all sectors of Chinese society. Through PISA we have seen Shanghai's outstanding educational performance and its remarkable achievements in educational quality and equity. Beyond the praise and applause, what further educational reflection can PISA prompt? As is well known, PISA is not the first large-scale international assessment of student literacy; its distinctiveness lies not in allowing the academic achievement of 15-year-old students to be compared across countries, but in its many innovations in test design and underlying philosophy. Among these, PISA's 'conception of literacy' is one of its most transformative ideas. By reviewing the meaning of PISA's conception of literacy and the content structure of its descriptive framework, this article explicates the profound pedagogical turn and stance behind the PISA assessment, and thereby invites reflection on literacy and the modes of learning it implies.

20.
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is examining the presence of DIF and, if DIF is found, investigating its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. Think-aloud protocols with expert reviewers were conducted to compare the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multilingual versions of curriculum and testing materials for government purposes participated in the study. Although there was considerable agreement in the identification of differentially functioning items, the experts did not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors as sources of DIF. Implications are provided for the process of identifying DIF prior to the actual administration of tests at national and international levels.
