1.
Nambury S. Raju (1937–2005) developed two model‐based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures ( Raju, 1988 ) and Raju's DFIT ( Raju, van der Linden, & Fleer, 1995 ), are based on quantifying the gap between item characteristic functions (ICFs). This approach provides an intuitive and flexible methodology for assessing DIF. The purpose of this tutorial is to explain DFIT and show how this methodology can be utilized in a variety of DIF applications.
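The gap-between-ICFs idea behind Raju's area measures can be sketched numerically. The following is a minimal illustration, not Raju's published procedure in full; the function name `icc` and all parameter values are hypothetical. With equal discriminations and no guessing, the signed area between the two curves reduces to the difference between the difficulty parameters.

```python
import numpy as np

def icc(theta, a, b, c=0.0):
    """Three-parameter logistic item characteristic function (scaling D = 1.7)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

# Hypothetical reference- and focal-group parameters for a single item:
# equal discrimination, but the focal group finds the item 0.5 logits harder.
theta = np.linspace(-4, 4, 4001)
p_ref = icc(theta, a=1.2, b=0.0)
p_foc = icc(theta, a=1.2, b=0.5)

# Signed and unsigned areas between the two ICFs, by simple quadrature.
step = theta[1] - theta[0]
signed = float(np.sum(p_ref - p_foc) * step)
unsigned = float(np.sum(np.abs(p_ref - p_foc)) * step)
```

Here the exact signed area is b_foc - b_ref = 0.5, which the quadrature reproduces up to the truncation error of the finite (-4, 4) grid; because the reference curve lies above the focal curve everywhere, signed and unsigned areas coincide.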
2.
Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, although most items developed historically follow dominance response models, research on FC CAT using dominance items is limited, heavily dominated by simulations, and lacking in empirical deployment. This empirical study trialed an FC CAT with dominance items, described by the Thurstonian Item Response Theory model, with research participants. The study investigated important practical issues such as the implications of adaptive item selection and social desirability balancing criteria for score distributions, measurement accuracy, and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths the CAT had no notable advantage over optimal static tests. Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.
3.
Journal of Moral Education, 2012, 41(4): 423–438
The Defining Issues Test (DIT) has been the dominant measure of moral development. The DIT has its roots in Kohlberg's original stage theory of moral judgment development and asks respondents to rank a set of stage-typed statements in order of importance for six stories. However, the question of how well DIT data match the underlying stage model has never been addressed with a statistical model. We therefore applied item response theory (IRT) to a large data set (55,319 cases). We found that the ordering of the stages extracted from the raw data fitted the ordering in the underlying stage model well. Furthermore, difficulty differences of stages across the stories were found, and their magnitude and location were visualized. These findings are compatible with the notion of a single latent moral developmental dimension and lend support to the hundreds of studies that have used the DIT-1 and, by implication, to the renewed DIT-2.
4.
Statistical Models for Ordinal Variables. C. C. Clogg and E. S. Shihadeh. Thousand Oaks, CA: Sage, 1994, 192 pages. Graphical Multivariate Analysis with AMOS, EQS and LISREL: A Visual Approach to Covariance Structure Analysis (in Japanese). Yutaka Kano. Kyoto, Japan: Gendai‐Sugakusha, 1997, 235 pages.
5.
Bor-Chen Kuo, Chen-Huei Liao, Kai-Chih Pai, Shu-Chuan Shih, Cheng-Hsuan Li, Magdalena Mo Ching Mok. Educational Psychology, 2020, 40(9): 1164–1185
The current study explores students’ collaboration and problem solving (CPS) abilities using a human-to-agent (H-A) computer-based collaborative problem solving assessment. Five CPS assessment units with 76 conversation-based items were constructed using the PISA 2015 CPS framework. In the experiment, 53,855 ninth and tenth graders in Taiwan were recruited, and a multidimensional item response analysis was used to develop CPS scales and represent the students’ collaboration and problem solving performance. The results show that the developed H-A approach is feasible for measuring students’ CPS skills, and the CPS scales are also shown to be reliable. In addition, the students’ CPS performance scores are further explored and discussed under the PISA CPS framework.
6.
To compare the roles of structural equation modeling (SEM) and the IRT graded response model in screening personality-scale items, this study used 7,229 real responses to the Chinese College Student Personality Scale. For Factor 2, "Straightforwardness," parameters were estimated and model fit assessed with Lisrel 8.70 (SEM) and Multilog 7.03 (graded response model), and the item-screening results of the two methods were compared. Both analyses flagged items 5, 6, 7, and 8 as fitting poorly: under SEM this appeared as low factor loadings and unsatisfactory overall fit indices; under the graded response model, as unsatisfactory discrimination and location parameters and poorly shaped item characteristic and information curves. However, SEM identified items 6 and 8 as the worst, whereas the graded response model identified items 5 and 6. Overall, the statistical conclusions of SEM and the IRT graded response model about personality-scale items are consistent, with slight differences on individual items. Each method has its own strengths, and the two can be used in combination.
7.
Simulation studies are extremely common in the item response theory (IRT) research literature. This article presents a didactic discussion of “truth” and “error” in IRT‐based simulation studies. We ultimately recommend that future research focus less on the simple recovery of parameters from a convenient generating IRT model, and more on practical comparative estimation studies when the data are intentionally generated to incorporate nuisance dimensionality and other sources of nuanced contamination encountered with real data. A new framework is also presented for conceptualizing and comparing various residuals in IRT studies. The new framework allows even very different calibration and scoring IRT models to be compared on a common, convenient, and highly interpretable number‐correct metric. Some illustrative examples are included.
8.
9.
We introduce a new comparative response format, suitable for assessing personality and similar constructs. In this “graded-block” format, items measuring different constructs are first organized in blocks of two or more; then, pairs are formed from items within blocks. The pairs are presented one at a time, enabling respondents to express the extent of their preference for one item or the other using several graded categories. We model such data using confirmatory factor analysis (CFA) for ordinal outcomes. We derive Fisher information matrices for the graded pairs, and supply R code to enable computation of standard errors of trait scores. An empirical example illustrates the approach in low-stakes personality assessments and shows that similar results are obtained when using graded blocks of size three and a standard Likert format. However, graded-block designs might be superior when insufficient differentiation between items is expected (due to acquiescence, halo, or social desirability).
10.
Differential Item Functioning Analysis of the IEA Children's Cognitive Development Test
This study examined, from both unidimensional and multidimensional perspectives, differential item functioning (DIF) in the Chinese-to-English translated items of the International Association for the Evaluation of Educational Achievement (IEA) children's cognitive development test. The data comprised test records from 871 Chinese children and 557 American children. The results showed that more than half of the items exhibited substantial DIF, meaning the test is not functionally equivalent for Chinese and American children. Users should therefore be cautious in using this cross-language translated test to compare the cognitive ability of Chinese and American examinees. Fortunately, about half of the DIF items favored China and half favored the United States, so a scale built from total test scores should not be severely biased. In addition, item-fit statistics were not sufficient to detect DIF items; dedicated DIF analyses are still needed. We discuss three possible causes of the DIF; more subject-matter expertise and experimentation are required to truly explain how it arises.
11.
Randall David Penfield. Educational Measurement, 2014, 33(1): 36–48
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.
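The step-function construction referred to above can be sketched for one common polytomous model, Samejima's graded response model, in which category probabilities are differences of adjacent cumulative ("step") curves. All parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Graded response model: category probabilities P(X = k) obtained as
    differences of cumulative step functions P*_k(theta) = P(X >= k)."""
    b = np.asarray(b, dtype=float)               # ordered thresholds b_1 < ... < b_{K-1}
    p_star = 1 / (1 + np.exp(-a * (theta - b)))  # P(X >= k), k = 1..K-1
    p_star = np.concatenate(([1.0], p_star, [0.0]))
    return -np.diff(p_star)                      # P(X = k) = P*_k - P*_{k+1}

# A four-category item with evenly spaced thresholds, evaluated at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, b=[-1.0, 0.0, 1.0])
```

Because the thresholds are symmetric about theta = 0, the two middle categories receive equal probability, and the four probabilities sum to one by construction.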
12.
The high school grade point average (GPA) is often adjusted to account for nominal indicators of course rigor, such as “honors” or “advanced placement.” Adjusted GPAs—also known as weighted GPAs—are frequently used for computing students’ rank in class and in the college admission process. Despite the high stakes attached to GPA, weighting policies vary considerably across states and high schools. Previous methods of estimating weighting parameters have used regression models with college course performance as the dependent variable. We discuss and demonstrate the suitability of the graded response model for estimating GPA weighting parameters and evaluating traditional weighting schemes. In our sample, which was limited to self‐reported performance in high school mathematics courses, we found that commonly used policies award more than twice the bonus points necessary to create parity for standard and advanced courses.
13.
Given the relationships of item response theory (IRT) models to confirmatory factor analysis (CFA) models, IRT model misspecifications might be detectable through model fit indexes commonly used in categorical CFA. The purpose of this study is to investigate the sensitivity of weighted least squares with adjusted means and variance (WLSMV)-based root mean square error of approximation, comparative fit index, and Tucker–Lewis Index model fit indexes to IRT models that are misspecified due to local dependence (LD). It was found that WLSMV-based fit indexes have some functional relationships to parameter estimate bias in 2-parameter logistic models caused by violations of LD. Continued exploration into these functional relationships and development of LD-detection methods based on such relationships could hold much promise for providing IRT practitioners with global information on violations of local independence.
14.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.
15.
Students' mathematical literacy has a multidimensional structure, so literacy-oriented assessment of mathematics achievement should report examinees' performance on each dimension rather than only a single total score. Taking the PISA mathematical literacy structure as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, this study used the mirt package in R to process and analyze item data from a grade-8 mathematical literacy assessment in one region, investigating multidimensional measurement of mathematical literacy. The results show that MIRT combines the advantages of unidimensional item response theory and factor analysis: it can be used to analyze the structural validity of a test and the quality of its items, and to provide multidimensional cognitive diagnosis of examinees' abilities.
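The kind of compensatory multidimensional model that packages such as mirt fit can be written as a logistic function of a weighted sum of abilities. A minimal sketch of the multidimensional 2PL response function, with entirely hypothetical parameter values:

```python
import numpy as np

def m2pl(theta, a, d):
    """Compensatory multidimensional 2PL: P(X = 1 | theta) = logistic(a . theta + d),
    where a is the vector of discrimination (slope) parameters and d the intercept."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

# A hypothetical item loading on two dimensions, for an examinee who is
# above average on dimension 1 and below average on dimension 2.
p = m2pl(theta=np.array([0.5, -0.5]), a=np.array([1.2, 0.8]), d=0.1)
```

The model is compensatory in that a high standing on one dimension can offset a low standing on another: here the linear predictor is 1.2(0.5) + 0.8(-0.5) + 0.1 = 0.3, giving a correct-response probability of about .57.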
16.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation is becoming more common, understanding the Bayesian approaches for evaluating model‐data fit models is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model‐data fit approach. Specifically, they review the conceptual foundation of Bayesian inference as well as PPMC and walk through the computational steps of PPMC using real‐life data examples from simple linear regression and item response theory analysis. They provide guidance for how to interpret PPMC results and discuss how to implement PPMC for other model(s) and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
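The computational steps of PPMC (draw parameters from the posterior, simulate replicated data, compare a discrepancy measure on observed versus replicated data) can be sketched for simple linear regression. This is a schematic illustration only: it substitutes a normal approximation to the posterior for full MCMC, fixes the error standard deviation at its residual estimate, and uses toy data; all names and values are assumptions, not the module's materials.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data generated from y = 1 + 0.5x + noise.
n = 200
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

# Normal approximation to the posterior of (intercept, slope),
# with sigma fixed at the residual estimate -- a deliberate simplification.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma = (y - X @ beta_hat).std(ddof=2)
cov = sigma**2 * np.linalg.inv(X.T @ X)

def discrepancy(v):
    """Test quantity: largest absolute deviation from the mean."""
    return np.abs(v - v.mean()).max()

# PPMC loop: draw parameters, simulate replicated data, tally how often the
# replicated discrepancy meets or exceeds the observed one.
draws, exceed = 2000, 0
for _ in range(draws):
    beta = rng.multivariate_normal(beta_hat, cov)
    y_rep = X @ beta + rng.normal(0.0, sigma, n)
    exceed += discrepancy(y_rep) >= discrepancy(y)
ppp = exceed / draws  # posterior predictive p-value
```

Posterior predictive p-values near 0 or 1 signal that the model cannot reproduce the chosen feature of the observed data; values in between indicate adequate fit with respect to that discrepancy.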
17.
Item response theory “dual” models (DMs), in which both items and individuals are viewed as sources of differential measurement error, have so far been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, based on the concept of the multidimensional location index, is first proposed and discussed. Then, the models are described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are then discussed. The simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposals should be of particular interest for multidimensional questionnaires in which the number of items per scale would not be enough to arrive at stable estimates if the existing unidimensional DMs were fitted on a separate-scale basis.
18.
Many educational and psychological tests are inherently multidimensional, meaning these tests measure two or more dimensions or constructs. The purpose of this module is to illustrate how test practitioners and researchers can apply multidimensional item response theory (MIRT) to understand better what their tests are measuring, how accurately the different composites of ability are being assessed, and how this information can be cycled back into the test development process. Procedures for conducting MIRT analyses, from obtaining evidence that the test is multidimensional, to modeling the test as multidimensional, to illustrating the properties of multidimensional items graphically, are described from both a theoretical and a substantive basis. This module also illustrates these procedures using data from a ninth-grade mathematics achievement test. It concludes with a discussion of future directions in MIRT research.
19.
James Soland. Educational Measurement, 2019, 38(3): 86–96
As computer‐based tests become more common, there is a growing wealth of metadata related to examinees’ response processes, which include solution strategies, concentration, and operating speed. One common type of metadata is item response time. While response times have been used extensively to improve estimates of achievement, little work considers whether these metadata may provide useful information on social–emotional constructs. This study uses an analytic example to explore whether metadata might help illuminate such constructs. Specifically, analyses examine whether the amount of time students spend on test items (after accounting for item difficulty and estimates of true achievement), and difficult items in particular, tell us anything about the student's academic motivation and self‐efficacy. While results do not indicate a strong relationship between mean item durations and these constructs in general, the amount of time students spend on very difficult items is highly correlated with motivation and self‐efficacy. The implications of these findings for using response process metadata to gain information on social–emotional constructs are discussed.
20.
Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item‐level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has recently been proposed, whereby measurement invariance is examined within each step underlying the polytomous response variable. The examination of DSF can provide valuable information concerning the nature of the DIF effect (i.e., is the DIF an item‐level effect or an effect isolated to specific score levels), the location of the DIF effect (i.e., precisely which score levels are manifesting the DIF effect), and the potential causes of a DIF effect (i.e., what properties of the item stem or task are potentially biasing). This article presents a didactic overview of the DSF framework and provides specific guidance and recommendations on how DSF can be used to enhance the examination of DIF in polytomous items. An example with real testing data is presented to illustrate the comprehensive information provided by a DSF analysis.