Similar Articles
20 similar articles found (search time: 15 ms)
1.
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.
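To make the step-function idea concrete, here is a minimal R sketch (with made-up item parameters) of Samejima's graded response model, one of the polytomous models such modules typically cover: each step is a 2PL-type cumulative curve, and differencing adjacent steps yields the category probabilities.

```r
# Graded response model: category probabilities built from cumulative
# step functions. All parameter values below are illustrative.
a <- 1.5                     # item discrimination
b <- c(-1.0, 0.2, 1.3)       # ordered step (threshold) parameters: 4 categories
theta <- seq(-4, 4, 0.1)     # latent trait grid

# Cumulative steps P(X >= k | theta): one 2PL curve per threshold
Pstar <- sapply(b, function(bk) plogis(a * (theta - bk)))
Pstar <- cbind(1, Pstar, 0)  # boundaries: P(X >= 0) = 1, P(X >= 4) = 0

# Category probabilities are differences of adjacent cumulative steps
Pcat <- Pstar[, 1:4] - Pstar[, 2:5]
matplot(theta, Pcat, type = "l", lty = 1,
        xlab = expression(theta), ylab = "P(X = k)")
```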

2.
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data.
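As a hedged illustration of what "latent DIF" means, the base-R sketch below simulates data in which two unobserved classes differ in a single item's difficulty; all values are assumed, and the class variable is never part of the observed data.

```r
# Toy latent-DIF data: two unobserved classes, Rasch-type items, and one
# item (item 5) whose difficulty differs across classes. Values assumed.
set.seed(1)
N <- 2000; J <- 10
class <- rbinom(N, 1, 0.5)            # latent class membership (unobserved)
theta <- rnorm(N)
b <- seq(-1.5, 1.5, length.out = J)   # common item difficulties
B <- matrix(b, N, J, byrow = TRUE)
B[class == 1, 5] <- b[5] + 1          # latent DIF: item 5 is harder in class 1
X <- (matrix(runif(N * J), N, J) < plogis(theta - B)) * 1L
# A mixture item response model would recover the classes and flag item 5's
# profile difference; manifest-group DIF methods cannot, because 'class'
# never appears in the observed data X.
```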

3.
The purpose of this ITEMS module is to provide an introduction to subscores. First, examples of subscores from an operational test are provided. Then, a review of methods that can be used to examine whether subscores have adequate psychometric quality is provided. It is demonstrated, using results from operational and simulated data, that subscores have to be based on a sufficient number of items and have to be sufficiently distinct from each other to have adequate psychometric quality. It is also demonstrated that several operationally reported subscores do not have adequate psychometric quality. Recommendations are made for those interested in reporting subscores for educational tests.
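One simple check of the "sufficiently distinct" requirement is the correlation between two subscores corrected for unreliability; a disattenuated correlation near 1 indicates the subscores measure essentially the same construct. A sketch with hypothetical values:

```r
# Disattenuated (true-score) correlation between two subscores.
# The observed correlation and reliabilities are hypothetical.
r_xy  <- 0.82   # observed correlation between subscores X and Y
rel_x <- 0.88   # reliability (e.g., coefficient alpha) of subscore X
rel_y <- 0.85   # reliability of subscore Y
r_xy / sqrt(rel_x * rel_y)   # ~0.95: barely distinct; reporting both adds little
```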

4.
Given the relationships of item response theory (IRT) models to confirmatory factor analysis (CFA) models, IRT model misspecifications might be detectable through model fit indexes commonly used in categorical CFA. The purpose of this study is to investigate the sensitivity of the mean- and variance-adjusted weighted least squares (WLSMV)-based root mean square error of approximation, comparative fit index, and Tucker–Lewis Index model fit indexes to IRT models that are misspecified due to local dependence (LD). It was found that WLSMV-based fit indexes have some functional relationships to parameter estimate bias in 2-parameter logistic models caused by unmodeled LD. Continued exploration into these functional relationships and development of LD-detection methods based on such relationships could hold much promise for providing IRT practitioners with global information on violations of local independence.
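For readers who want to see how such indexes are obtained in practice, here is a sketch using the R package lavaan (the one-factor model and the data frame 'resp' are placeholders, not from the study):

```r
library(lavaan)
# Hypothetical one-factor model for six ordinal items y1-y6 in data frame 'resp'
model <- 'F =~ y1 + y2 + y3 + y4 + y5 + y6'
fit <- cfa(model, data = resp, ordered = TRUE, estimator = "WLSMV")
# Scaled RMSEA, CFI, and TLI: the WLSMV-based indexes examined in the study
fitMeasures(fit, c("rmsea.scaled", "cfi.scaled", "tli.scaled"))
```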

5.
In most large-scale assessments of student achievement, several broad content domains are tested. Because more items are needed to cover the content domains than can be presented in the limited testing time to each individual student, multiple test forms or booklets are utilized to distribute the items to the students. The construction of an appropriate booklet design is a complex and challenging endeavor that has far-reaching implications for data calibration and score reporting. This module describes the construction of booklet designs as the task of allocating items to booklets under context-specific constraints. Several types of experimental designs are presented that can be used as booklet designs. The theoretical properties and construction principles for each type of design are discussed and illustrated with examples. Finally, the evaluation of booklet designs is described and future directions for researching, teaching, and reporting on booklet designs for large-scale assessments of student achievement are identified.
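As one concrete example of such an experimental design (constructed here for illustration, not taken from the module): a cyclic balanced incomplete block design with 7 item blocks and 7 booklets of 3 blocks each, in which every pair of blocks appears together in exactly one booklet, so all between-block links are observed.

```r
# Cyclic BIBD with v = 7 item blocks, b = 7 booklets, k = 3 blocks per
# booklet, lambda = 1: develop the initial booklet {1, 2, 4} cyclically mod 7.
booklets <- t(sapply(0:6, function(s) (c(1, 2, 4) + s - 1) %% 7 + 1))
rownames(booklets) <- paste0("booklet", 1:7)
booklets
# Every pair of item blocks co-occurs in exactly one booklet, which keeps
# all between-block covariances estimable in the linked design.
```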

6.
Mokken scale analysis (MSA) is a probabilistic‐nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic‐nonparametric framework in which to explore measurement quality, with an emphasis on its application in the context of educational assessment. The module describes both dichotomous and polytomous formulations of the MSA model. Examples of the application of MSA to educational assessment are provided using data from a multiple‐choice physical science assessment and a rater‐mediated writing assessment.
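A sketch of a basic MSA workflow with the R package mokken (the respondent-by-item matrix 'resp' is a placeholder):

```r
library(mokken)
# 'resp' is a hypothetical matrix of dichotomous or polytomous ordinal scores
coefH(resp)                           # Loevinger's scalability coefficients
summary(check.monotonicity(resp))     # monotonicity checks for each item
aisp(resp, lowerbound = 0.3)          # automated item selection into scales
```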

7.
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.

8.
Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item‐level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has recently been proposed, whereby measurement invariance is examined within each step underlying the polytomous response variable. The examination of DSF can provide valuable information concerning the nature of the DIF effect (i.e., is the DIF an item‐level effect or an effect isolated to specific score levels), the location of the DIF effect (i.e., precisely which score levels are manifesting the DIF effect), and the potential causes of a DIF effect (i.e., what properties of the item stem or task are potentially biasing). This article presents a didactic overview of the DSF framework and provides specific guidance and recommendations on how DSF can be used to enhance the examination of DIF in polytomous items. An example with real testing data is presented to illustrate the comprehensive information provided by a DSF analysis.
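A minimal sketch of the DSF idea, assuming the cumulative dichotomization scheme: split the polytomous response at each step and test invariance step by step with a Mantel-Haenszel test (the data frame 'df' and its columns are hypothetical).

```r
# DSF sketch: test each step of a polytomous item scored 0-3 separately,
# stratifying on total score. 'item', 'group', and 'total' are hypothetical
# columns of the data frame df.
for (k in 1:3) {
  at_step <- (df$item >= k) * 1L            # cumulative dichotomization at step k
  tab <- table(at_step, df$group, df$total) # 2 x 2 x strata contingency table
  print(mantelhaen.test(tab))               # step-level invariance test
}
# Step-specific effects in opposite directions can cancel in an item-level
# DIF index, which is exactly what a DSF analysis is designed to expose.
```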

9.
10.
In this digital ITEMS module, Dr. Brian Leventhal and Dr. Allison Ames provide an overview of Monte Carlo simulation studies (MCSS) in item response theory (IRT). MCSS are utilized for a variety of reasons, one of the most compelling being that they can be used when analytic solutions are impractical or nonexistent because they allow researchers to specify and manipulate an array of parameter values and experimental conditions (e.g., sample size, test length, and test characteristics). Dr. Leventhal and Dr. Ames review the conceptual foundation of MCSS in IRT and walk through the processes of simulating total scores as well as item responses using the two-parameter logistic, graded response, and bifactor models. They provide guidance for how to implement MCSS using other item response models and best practices for efficient syntax and executing an MCSS. The digital module contains sample SAS code, diagnostic quiz questions, activities, curated resources, and a glossary.
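The heart of such a study is the response-generation step. A minimal base-R sketch of one replication under the two-parameter logistic model (the parameter distributions are assumed for illustration; the module itself demonstrates SAS):

```r
# One replication of a 2PL Monte Carlo cell: N examinees, J items
set.seed(2024)
N <- 1000; J <- 30                           # manipulated design factors
theta <- rnorm(N)                            # latent abilities
a <- rlnorm(J, meanlog = 0, sdlog = 0.3)     # discriminations
b <- rnorm(J)                                # difficulties
P <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))  # P(X = 1 | theta)
X <- (matrix(runif(N * J), N, J) < P) * 1L   # simulated item responses
# A full MCSS replicates this cell many times, re-estimates the model on
# each X, and summarizes bias/RMSE of the estimates across replications.
```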

11.
Research on Test Construction Methods Based on Item Response Theory   Cited by: 3 (self-citations: 0, citations by others: 3)
Building on a brief introduction to item response theory, this paper examines in depth, from a quantitative perspective, the general steps for constructing various kinds of tests with item response theory; it discusses methods for building IRT-based item banks and for assembling tests from such banks; and it discusses methods for setting the passing score of criterion-referenced tests.
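The quantity driving IRT-based assembly from an item bank is the item information function, which for the 2PL is I(θ) = a²P(θ)(1 − P(θ)); items are selected so that their summed information matches a target curve. A sketch with an illustrative three-item bank:

```r
# 2PL item information: I(theta) = a^2 * P * (1 - P). Bank values illustrative.
info_2pl <- function(theta, a, b) {
  P <- plogis(a * (theta - b))
  a^2 * P * (1 - P)
}
theta <- seq(-3, 3, 0.1)
bank <- data.frame(a = c(1.2, 0.9, 1.8), b = c(-0.5, 0.0, 0.8))
test_info <- rowSums(mapply(function(a, b) info_2pl(theta, a, b),
                            bank$a, bank$b))
plot(theta, test_info, type = "l", ylab = "Test information")
# Assembly picks bank items whose summed information best matches the target.
```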

12.
With the spread of computers, the growth of networks, and advances in theories of teaching and assessment, computerized adaptive testing (CAT) based on item response theory has become increasingly common. Because its items automatically adapt to examinees of different ability levels, CAT has been adopted by a growing number of testing programs. This paper discusses item response theory and the issues involved in implementing adaptive testing.
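A toy base-R sketch of the adaptive logic described here (a 2PL item bank, EAP ability estimation on a grid, and maximum-information item selection; every value is assumed):

```r
# Minimal CAT loop with a 2PL bank. Everything here is illustrative.
set.seed(7)
J <- 50
a <- rlnorm(J, 0, 0.25); b <- rnorm(J)         # item bank parameters
theta_true <- 0.7                              # simulated examinee
grid <- seq(-4, 4, 0.05); post <- dnorm(grid)  # prior over ability grid
used <- integer(0)
for (step in 1:15) {
  est <- sum(grid * post) / sum(post)          # EAP ability estimate
  P <- plogis(a * (est - b))
  info <- a^2 * P * (1 - P)                    # item information at est
  info[used] <- -Inf                           # exclude administered items
  j <- which.max(info)                         # most informative remaining item
  xj <- rbinom(1, 1, plogis(a[j] * (theta_true - b[j])))  # simulated answer
  used <- c(used, j)
  lik <- plogis(a[j] * (grid - b[j])); if (xj == 0) lik <- 1 - lik
  post <- post * lik                           # Bayesian posterior update
}
sum(grid * post) / sum(post)                   # final ability estimate
```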

13.
Students' mathematical literacy has a multidimensional structure, so literacy-oriented assessments of mathematics achievement need to report examinees' performance on each dimension rather than a single total score. Taking the PISA mathematical literacy framework as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, this study used the mirt package in R to process and analyze item data from a grade 8 mathematical literacy assessment in one region, in order to investigate multidimensional measurement of mathematical literacy. The results show that MIRT combines the advantages of unidimensional item response theory and factor analysis: it can be used to analyze a test's structural validity and item quality, and to provide multidimensional cognitive diagnosis of examinees' abilities.
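A sketch of fitting such a model with the mirt package (the data frame 'resp' and the two-dimensional loading structure are placeholders, not the study's actual specification):

```r
library(mirt)
# Hypothetical confirmatory two-dimensional model for an 8-item test:
# items 1-4 load on F1, items 5-8 on F2, with correlated dimensions.
spec <- mirt.model('
  F1 = 1-4
  F2 = 5-8
  COV = F1*F2')
mod <- mirt(resp, spec, itemtype = "2PL")  # dichotomous items assumed
coef(mod, simplify = TRUE)   # multidimensional slopes and intercepts
fscores(mod)                 # per-dimension ability estimates
```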

14.
The U.S. government has become increasingly focused on school climate, as recently evidenced by its inclusion as an accountability indicator in the Every Student Succeeds Act. Yet, there remains considerable variability in both conceptualizing and measuring school climate. To better inform the research and practice related to school climate and its measurement, we leveraged item response theory (IRT), a commonly used psychometric approach for the design of achievement assessments, to create a parsimonious measure of school climate that operates across varying individual characteristics. Students (n = 69,513) in 111 secondary schools completed a school climate assessment focused on three domains of climate (i.e., safety, engagement, and environment), as defined by the U.S. Department of Education. Item and test characteristics were estimated using the mirt package in R using unidimensional IRT. Analyses revealed measurement difficulties that resulted in a greater ability to assess less favorable perspectives on school climate. Differential item functioning analyses indicated measurement differences based on student academic success. These findings support the development of a broad measure of school climate but also highlight the importance of work to ensure precision in measuring school climate, particularly when considering use as an accountability measure.
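A sketch of the kind of DIF analysis reported here, using mirt's multiple-group machinery (the data 'resp' and grouping variable 'acad' are placeholders; the study's exact specification may differ):

```r
library(mirt)
# Start from a fully invariant two-group model, then free item parameters
# one item at a time to test for DIF by academic success ('acad').
mg <- multipleGroup(resp, 1, group = acad,
                    invariance = c("slopes", "intercepts",
                                   "free_var", "free_means"))
DIF(mg, which.par = c("a1", "d"), scheme = "drop")  # likelihood-ratio DIF tests
```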

15.
Many educational and psychological tests are inherently multidimensional, meaning these tests measure two or more dimensions or constructs. The purpose of this module is to illustrate how test practitioners and researchers can apply multidimensional item response theory (MIRT) to understand better what their tests are measuring, how accurately the different composites of ability are being assessed, and how this information can be cycled back into the test development process. Procedures for conducting MIRT analyses (from obtaining evidence that the test is multidimensional, to modeling the test as multidimensional, to illustrating the properties of multidimensional items graphically) are described from both a theoretical and a substantive basis. This module also illustrates these procedures using data from a ninth-grade mathematics achievement test. It concludes with a discussion of future directions in MIRT research.

16.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation is becoming more common, understanding the Bayesian approaches for evaluating model‐data fit is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model‐data fit approach. Specifically, they review the conceptual foundation of Bayesian inference as well as PPMC and walk through the computational steps of PPMC using real‐life data examples from simple linear regression and item response theory analysis. They provide guidance for how to interpret PPMC results and discuss how to implement PPMC for other models and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
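A self-contained base-R sketch of PPMC for simple linear regression, using the closed-form posterior under a flat prior rather than a general MCMC sampler (the data and the maximum-of-y discrepancy are toy choices):

```r
# PPMC for simple linear regression with a flat prior, base R only.
set.seed(42)
n <- 100; x <- rnorm(n); y <- 1 + 0.5 * x + rnorm(n)   # toy data
fit <- lm(y ~ x); Xm <- model.matrix(fit)
betahat <- coef(fit); V <- solve(crossprod(Xm))
s2 <- sum(resid(fit)^2)                                 # residual sum of squares
Tobs <- max(y); Trep <- numeric(2000)                   # discrepancy: max(y)
for (r in 1:2000) {
  sigma2 <- s2 / rchisq(1, n - 2)                       # draw sigma^2 | y
  beta <- betahat + t(chol(sigma2 * V)) %*% rnorm(2)    # draw beta | sigma^2, y
  yrep <- Xm %*% beta + rnorm(n, 0, sqrt(sigma2))       # replicated data set
  Trep[r] <- max(yrep)
}
mean(Trep >= Tobs)  # posterior predictive p-value; values near 0 or 1 = misfit
```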

17.
A random cluster sample of 505 primary and secondary school teachers (189 men and 271 women, all aged 25 to 55) was administered a teaching efficacy questionnaire. The responses were analyzed with item response theory to obtain each item's discrimination, difficulty, and peak information value, and the teaching efficacy scale was revised with reference to these statistics. The quality of the revised scale was then examined with structural equation modeling, facet theory, and smallest space analysis; the results show that the revised scale has clearer structural validity, higher reliability, and more precise measurement. Data were managed in SPSS 15.0 and analyzed with Hudap 6.0 and MULTILOG 7.03. Five conclusions were drawn: (1) the teaching efficacy scale is unidimensional and can therefore be analyzed with item response theory; (2) the revised scale's item discrimination and difficulty values are more reasonable; (3) the revised scale's peak test information is slightly lower than the original's; (4) corresponding facet elements of the original and revised scales are highly correlated; and (5) the scale's three content facets were confirmed: education of students' moral character and behavior, classroom organization and management, and knowledge transmission.

18.
Despite its importance to structural equation modeling, model evaluation remains underdeveloped in the Bayesian SEM framework. Posterior predictive p-values (PPP) and deviance information criteria (DIC) are now available in popular software for Bayesian model evaluation, but they remain underutilized. This is largely due to the lack of recommendations for their use. To address this problem, PPP and DIC were evaluated in a series of Monte Carlo simulation studies. The results show that both PPP and DIC are influenced by severity of model misspecification, sample size, model size, and choice of prior. The cutoffs PPP < 0.10 and ΔDIC > 7 work best in the conditions and models tested here to maintain low false detection rates and misspecified model selection rates, respectively. The recommendations provided in this study will help researchers evaluate their models in a Bayesian SEM analysis and set the stage for future development and evaluation of Bayesian SEM fit indices.
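For reference, the DIC in question is computed from posterior deviance draws as DIC = D̄ + pD, with pD = D̄ − D(θ̄). A worked toy computation (all numbers invented, not from the study):

```r
# DIC from posterior deviance draws (all numbers hypothetical)
dev_draws   <- c(412.3, 415.1, 410.8, 413.6, 411.9)  # deviance per draw
dev_at_mean <- 409.5                   # deviance at the posterior means
Dbar <- mean(dev_draws)
pD   <- Dbar - dev_at_mean             # effective number of parameters
Dbar + pD                              # DIC; when comparing two models, a
                                       # difference > 7 was the cutoff found here
```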

19.
Sχ2 is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of Sχ2 for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of Sχ2 under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using Sχ2 within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. Sχ2 performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of Sχ2 were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that performance of Sχ2 was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.
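Although the study used flexMIRT, the same statistic is available in the R package mirt, as in this sketch (the data 'resp' is a placeholder, and a unidimensional graded model is assumed for simplicity):

```r
library(mirt)
mod <- mirt(resp, 1, itemtype = "graded")   # hypothetical GRM fit
itemfit(mod, fit_stats = "S_X2")            # S-X2, df, and p-value per item
```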

20.
To compare structural equation modeling (SEM) and the IRT graded response model for screening personality scale items, parameter estimation and model fitting were carried out for Factor 2 ("straightforwardness") of the Chinese College Student Personality Scale, based on 7,229 actual response records, using Lisrel 8.70 for the SEM analysis and Multilog 7.03 for the graded response model, and the two methods' item-screening results were compared. Both analyses found that items 5, 6, 7, and 8 fit poorly: under SEM this appeared as low factor loadings and unsatisfactory overall fit indices; under the graded response model, as unsatisfactory discrimination and location parameters and poorly shaped item characteristic and information curves. However, SEM suggested items 6 and 8 were the worst, whereas the graded response model pointed to items 5 and 6. Overall, the statistical inferences of SEM and the IRT graded response model about personality scale items are consistent, with minor differences on individual items. Each method has its own strengths, and the two can be used in combination.
