Similar Articles
20 similar articles found (search time: 9 ms)
1.
As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., the two-parameter logistic [2PL] model) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations of the model's assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided item parameter estimates that were robust to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when Assumption 2 was violated; nonetheless, these values were still lower than those from the 2PL model. In terms of mean ability estimates, results indicated equal performance between the EM-IRT and 2PL models across conditions. For both models, mean ability estimates were biased by more than 0.25 SDs when Assumption 2 was violated. However, our accompanying empirical study suggested that this biasing occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that, under realistic conditions, the EM-IRT model provides superior item parameter estimates and comparable mean ability estimates in the presence of model violations when compared with the 2PL model.
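For illustration, here is a minimal Python sketch of the effort-moderated idea: responses flagged as noneffortful (here, by a hypothetical response-time threshold; the threshold value and the toy data are illustrative assumptions, not from the article) are simply dropped from the 2PL likelihood when estimating ability.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def p2pl(theta, a, b):
    """Two-parameter logistic response probability."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def em_irt_theta(x, rt, a, b, rt_threshold=10.0):
    """EM-IRT-style ability estimate: responses faster than the
    response-time threshold are flagged as noneffortful and excluded
    from the likelihood. The 10-second threshold is illustrative."""
    effortful = rt >= rt_threshold          # per-item effort flags
    def neg_loglik(theta):
        p = p2pl(theta, a[effortful], b[effortful])
        xs = x[effortful]
        return -np.sum(xs * np.log(p) + (1 - xs) * np.log(1 - p))
    return minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x

# toy example: 5 items, the last two answered in under a second (rapid guesses)
a = np.array([1.0, 1.2, 0.8, 1.1, 0.9])
b = np.array([-0.5, 0.0, 0.3, 0.8, 1.2])
x = np.array([1, 1, 0, 1, 0])
rt = np.array([35.0, 42.0, 28.0, 0.7, 0.9])
print(em_irt_theta(x, rt, a, b))
```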

2.
When tests are administered under fixed time constraints, test performances can be affected by speededness. Among other consequences, speededness can result in inaccurate parameter estimates in item response theory (IRT) models, especially for items located near the end of tests (Oshima, 1994). This article presents an IRT strategy for reducing contamination in item difficulty estimates due to speededness. Ordinal constraints are applied to a mixture Rasch model (Rost, 1990) so as to distinguish two latent classes of examinees: (a) a "speeded" class, composed of examinees who had insufficient time to adequately answer end-of-test items, and (b) a "nonspeeded" class, composed of examinees who had sufficient time to answer all items. The parameter estimates obtained for end-of-test items in the nonspeeded class are shown to more accurately approximate their difficulties when the items are administered at earlier locations on a different form of the test. A mixture model can also be used to estimate the class memberships of individual examinees; in this way, it can be determined whether membership in the speeded class is associated with other student characteristics. Results are reported for gender and ethnicity.
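A minimal sketch of the ordinal-constraint device described above, assuming for brevity that ability is known and that only the class-specific difficulties are at issue (real estimation would also marginalize over ability and estimate the class proportion; all names below are illustrative):

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch response probability."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def class_difficulties(b, delta, end_items):
    """Ordinal constraint: in the speeded class, end-of-test items are at
    least as difficult as in the nonspeeded class. The nonnegative shift
    softplus(delta) enforces b_speeded >= b_nonspeeded."""
    b_speeded = b.copy()
    b_speeded[end_items] = b[end_items] + np.log1p(np.exp(delta))
    return b, b_speeded

def mixture_loglik(x, theta, b, delta, end_items, pi_speeded):
    """Marginal log-likelihood of one response vector over the two classes."""
    b_ns, b_sp = class_difficulties(b, delta, end_items)
    def ll(bc):
        p = rasch_p(theta, bc)
        return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return np.logaddexp(np.log(1 - pi_speeded) + ll(b_ns),
                        np.log(pi_speeded) + ll(b_sp))
```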

3.
The modeling of item response data falls within the scope of item response theory, also known as modern measurement theory. As the breadth and complexity of societal measurement demands grow, and as measurement functions continue to expand, increasingly complex item response models are needed to carry out measurement tasks in psychology, education, sociology, and related fields. This article reviews the complex item response models that are currently widespread and developing rapidly, such as higher-order, multidimensional, and multilevel models; describes parameter estimation techniques for these complex models; and, drawing on their applications, looks toward more objective and cutting-edge development of domestic measurement techniques.

4.
Practitioners typically face situations in which examinees have not responded to all test items. This study investigated the effect on an examinee's ability estimate when an examinee is presented an item, has ample time to answer, but decides not to respond to the item. Three approaches to ability estimation (biweight estimation, expected a posteriori, and maximum likelihood estimation) were examined. A Monte Carlo study was performed and the effect of different levels of omissions on the simulee's ability estimates was determined. Results showed that the worst estimation occurred when omits were treated as incorrect. In contrast, substitution of 0.5 for omitted responses resulted in ability estimates that were almost as accurate as those using complete data. Implications for practitioners are discussed.
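The 0.5 substitution is easy to state in code: an omitted response contributes half a "correct" and half an "incorrect" term to the likelihood. A hedged Python sketch (the function and coding of omits are illustrative, not the study's implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def p2pl(theta, a, b):
    """Two-parameter logistic response probability."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def mle_theta(x, a, b):
    """ML ability estimate; omitted responses are coded np.nan and
    replaced by 0.5, so each omit contributes half a correct and half
    an incorrect response to the log-likelihood."""
    xs = np.where(np.isnan(x), 0.5, x)
    def neg_loglik(theta):
        p = p2pl(theta, a, b)
        return -np.sum(xs * np.log(p) + (1 - xs) * np.log(1 - p))
    return minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x
```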

5.
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias, root mean-square error (RMSE), and the percentage of time the 95% confidence interval covered the true parameter. The simulation results suggest that item parameters were not recovered well when IPD was ignored, especially when there were more IPD conditions. In addition, coverage was inaccurate in all IPD conditions when IPD was ignored. The results also suggest that the accuracy of person scores (measured by bias) is potentially problematic when a larger number of IPD items is ignored. However, the overall accuracy (measured by RMSE) and coverage were unexpectedly acceptable in the presence of IPD as defined in this study.
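The three recovery criteria named above are straightforward to compute across Monte Carlo replications; a small Python sketch (the array layout is an assumption):

```python
import numpy as np

def recovery_stats(true_b, est_b, se_b):
    """Bias, RMSE, and 95% CI coverage of item parameters across
    Monte Carlo replications (est_b, se_b: replications x items)."""
    err = est_b - true_b
    bias = err.mean(axis=0)
    rmse = np.sqrt((err ** 2).mean(axis=0))
    lo, hi = est_b - 1.96 * se_b, est_b + 1.96 * se_b
    coverage = ((lo <= true_b) & (true_b <= hi)).mean(axis=0)
    return bias, rmse, coverage
```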

6.
The analytically derived asymptotic standard errors (SEs) of maximum likelihood (ML) item estimates can be approximated by a mathematical function without examinees' responses to test items, whereas the empirically determined SEs of marginal maximum likelihood estimation (MMLE)/Bayesian item estimates can be obtained when the same set of items is repeatedly estimated from simulation (or resampling) test data. The latter method yields rather stable and accurate SE estimates as the number of replications increases, but requires cumbersome and time-consuming calculations. Instead of using the empirically determined method, the adequacy of the analytically based method in predicting the SEs of item parameter estimates was examined by comparing results produced by both approaches. The results indicated that the SEs yielded by the two approaches were, in most cases, very similar, especially when applied to a generalized partial credit model. This finding encourages test practitioners and researchers to apply the analytic asymptotic SEs of item estimates in item-linking studies, as well as in quantifying the SEs of equating scores for the item response theory (IRT) true-score method. Three-dimensional graphical presentations of the analytic SEs of item estimates as a bivariate function of item difficulty and item discrimination were also provided for a better understanding of several frequently used IRT models.
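For the 2PL case, the analytic approach amounts to building the expected Fisher information for (a, b) by integrating over an assumed ability distribution and inverting it; a sketch under a standard normal ability assumption (the quadrature scheme and sample values are illustrative):

```python
import numpy as np

def analytic_se_2pl(a, b, n_examinees, n_quad=41):
    """Analytic asymptotic SEs of 2PL item parameter estimates: expected
    Fisher information for (a, b) integrated over a standard normal
    ability distribution, scaled by sample size, then inverted."""
    theta, step = np.linspace(-6, 6, n_quad, retstep=True)
    w = np.exp(-0.5 * theta ** 2) / np.sqrt(2 * np.pi) * step  # N(0,1) weights
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    pq = p * (1 - p)
    info = np.zeros((2, 2))
    info[0, 0] = np.sum(w * pq * (theta - b) ** 2)            # slope block
    info[0, 1] = info[1, 0] = -np.sum(w * pq * a * (theta - b))
    info[1, 1] = np.sum(w * pq * a ** 2)                      # difficulty block
    cov = np.linalg.inv(n_examinees * info)
    return np.sqrt(np.diag(cov))                              # SE(a), SE(b)

print(analytic_se_2pl(a=1.2, b=0.5, n_examinees=1000))
```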

7.
Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the item response theory (IRT) context, particularly if they are ignored. At the same time, a number of data imputation methods have been developed outside of the IRT framework and shown to be effective tools for dealing with missing data. The current study takes several of these methods that have been found useful in other contexts and investigates their performance with IRT data that contain missing values. Through a simulation study, it is shown that these methods exhibit varying degrees of effectiveness in imputing data that, in turn, produce accurate sample estimates of item difficulty and discrimination parameters.
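As one concrete illustration (these two rules are common in the imputation literature and are not necessarily the exact set studied here), missing responses can be scored incorrect or filled by a two-way person-by-item rule before calibration:

```python
import numpy as np

def impute(X, method="two_way", rng=None):
    """Fill missing (np.nan) entries of a 0/1 response matrix before IRT
    calibration. Two illustrative rules, not the article's full set."""
    if rng is None:
        rng = np.random.default_rng(0)
    X = X.astype(float).copy()
    miss = np.isnan(X)
    if method == "incorrect":        # score omits as wrong
        X[miss] = 0.0
    elif method == "two_way":        # person mean + item mean - grand mean
        pm = np.nanmean(X, axis=1, keepdims=True)
        im = np.nanmean(X, axis=0, keepdims=True)
        fit = np.clip(pm + im - np.nanmean(X), 0.0, 1.0)
        X[miss] = rng.binomial(1, fit)[miss]   # stochastic rounding to 0/1
    return X
```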

8.
《教育实用测度》2013,26(2):125-141
Item parameter instability can threaten the validity of inferences about changes in student achievement when using Item Response Theory (IRT) based test scores obtained on different occasions. This article illustrates a model-testing approach for evaluating the stability of IRT item parameter estimates in a pretest-posttest design. Stability of item parameter estimates was assessed for a random sample of pretest and posttest responses to a 19-item math test. Using MULTILOG (Thissen, 1986), IRT models were estimated in which item parameter estimates were constrained to be equal across samples (reflecting stability) and in which item parameter estimates were free to vary across samples (reflecting instability). These competing models were then compared statistically to test the invariance assumption. The results indicated a moderately high degree of stability in the item parameter estimates for a group of children assessed on two different occasions.
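The model-testing approach amounts to a likelihood-ratio comparison of the constrained (invariant) and free models; a minimal sketch (function and argument names are illustrative):

```python
from scipy.stats import chi2

def invariance_lrt(loglik_constrained, loglik_free, df):
    """Likelihood-ratio test of item parameter stability: compare the
    model with parameters constrained equal across occasions against
    the model that lets them vary; df is the number of freed parameters."""
    g2 = 2.0 * (loglik_free - loglik_constrained)
    return g2, chi2.sf(g2, df)
```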

9.
Speededness refers to the extent to which time limits affect examinees' test performance, and it is often measured by calculating the proportion of examinees who do not reach a certain percentage of test items. However, when tests are number-right scored (i.e., no points are subtracted for incorrect responses), examinees are likely to rapidly guess on items rather than leave them blank. Therefore, this traditional measure of speededness probably underestimates the true amount of speededness on such tests. A more accurate assessment of speededness should also reflect the tendency of examinees to rapidly guess on items as time expires. This rapid-guessing component of speededness can be estimated by modeling response times with a two-state mixture model, as demonstrated with data from a computer-administered reasoning test. Taking into account the combined effect of unreached items and rapid guessing provides a more complete measure of speededness than has previously been available.
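A two-state mixture on log response times can be fit with a few lines of EM; the sketch below uses two normal components on the log scale (the initialization and component labeling are assumptions, not the article's exact specification):

```python
import numpy as np
from scipy.stats import norm

def fit_rt_mixture(log_rt, n_iter=200):
    """EM for a two-state mixture on log response times: a fast
    'rapid-guessing' state and a slower 'solution-behavior' state."""
    mu = np.array([log_rt.min() + 0.5, log_rt.mean()])  # fast vs. slow start
    sd = np.array([0.5, 0.5])
    pi = np.array([0.1, 0.9])
    for _ in range(n_iter):
        dens = pi * norm.pdf(log_rt[:, None], mu, sd)   # E-step: posteriors
        resp = dens / dens.sum(axis=1, keepdims=True)
        nk = resp.sum(axis=0)                           # M-step: update params
        pi = nk / len(log_rt)
        mu = (resp * log_rt[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (log_rt[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sd, resp[:, 0]   # resp[:, 0] = P(rapid guess | response time)
```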

10.
Item response models are finding increasing use in achievement and aptitude test development. Item response theory (IRT) test development involves the selection of test items based on a consideration of their item information functions. But a problem arises because item information functions are determined by their item parameter estimates, which contain error. When the "best" items are selected on the basis of their statistical characteristics, there is a tendency to capitalize on chance due to errors in the item parameter estimates. The resulting test, therefore, falls short of the test that was desired or expected. The purposes of this article are (a) to highlight the problem of item parameter estimation errors in the test development process, (b) to demonstrate the seriousness of the problem with several simulated data sets, and (c) to offer a conservative solution for addressing the problem in IRT-based test development.
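The capitalization-on-chance problem is easy to reproduce: select items by their estimated information at a target ability and compare the promised information with the information recomputed from the true parameters. A toy Python sketch (all numbers illustrative):

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """2PL item information: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

# greedy IRT test assembly: pick the 20 items most informative at theta = 0;
# with noisy slope estimates this overstates the assembled test's information
rng = np.random.default_rng(1)
a_true = rng.lognormal(0.0, 0.3, 200)
a_est = a_true + rng.normal(0.0, 0.15, 200)   # estimation error in slopes
b = rng.normal(0.0, 1.0, 200)
picked = np.argsort(item_info_2pl(0.0, a_est, b))[-20:]
print(item_info_2pl(0.0, a_est, b)[picked].sum(),    # promised (optimistic)
      item_info_2pl(0.0, a_true, b)[picked].sum())   # actual
```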

11.
叶萌 《考试研究》2010,(2):96-107
This article reviews the main research findings on the local independence problem in item response theory (IRT) and, on that basis, explicates the definition of the local independence assumption. It then discusses the relationship between local independence and test dimensionality; the detection and computation, causes, and control procedures of local dependence; and the impact of local dependence on measurement practice. It also explores strategies for resolving local item dependence within testlets.
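As one example of the detection statistics this literature covers, Yen's Q3 correlates item residuals after the fitted model is removed; a minimal 2PL-based sketch (illustrative only):

```python
import numpy as np

def q3_matrix(X, theta, a, b):
    """Yen's Q3: correlations among item residuals X - P(theta) under a
    fitted 2PL. Entries far from zero flag locally dependent item pairs
    (e.g., items sharing a testlet passage)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    resid = X - p
    return np.corrcoef(resid, rowvar=False)
```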

12.
Compared with parametric item response theory, nonparametric item response theory offers a theoretical framework that better fits practical settings. Current research on nonparametric IRT focuses mainly on parameter estimation methods and their comparison and on data-model fit validation, while applied research concentrates on scale revision and on analyses of personality data and differential item functioning; nonparametric cognitive diagnosis theory, developed on the foundation of cognitive diagnosis theory, further highlights these applied advantages. Future research should place more emphasis on practical applications of nonparametric IRT, and research on nonparametric cognitive diagnosis theory also deserves attention, so as to fully exploit the applied advantages of nonparametric methods in practice.

13.
Owing to many factors, large-scale educational examinations have never established item banks; item-writing quality can only be evaluated comprehensively by analyzing examination data after the test. Item response theory enables thorough and detailed analysis of test items and also offers corresponding indices and methods for test construction. This article applies item response theory to analyze the item-writing quality of large-scale educational examinations and, taking the course 《高等数学》 (Advanced Mathematics) as an example, explores procedures and methods for such analysis. The hope is to use post-test evaluation of item quality as an entry point for building item banks for large-scale educational examinations and for accumulating foundational items and data.

14.
张军 《考试研究》2014,(1):56-61
The monotone homogeneity model (MHM) is the most widely used model in nonparametric item response theory; it rests on three basic assumptions and is suited to the analysis of small-scale tests. This study used the MHM to analyze a test administered at the College of Advanced Chinese Training, Beijing Language and Culture University. The results show that the test satisfies the weak unidimensionality and weak local independence assumptions; 9 of the 67 items had scalability coefficients below 0.3 and should be revised or deleted, after which the test forms a Mokken scale of medium strength. In addition, 2 items violated the monotonicity assumption and thus fail the requirements of a Mokken scale.
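For reference, the pairwise Mokken scalability coefficient underlying the 0.3 cutoff can be sketched as follows (a naive implementation for dichotomous items; edge cases such as items answered correctly by everyone or no one are ignored):

```python
import numpy as np

def mokken_pair_h(X):
    """Pairwise Mokken scalability coefficients H_ij = 1 - F_ij / E_ij,
    with F the observed and E the expected (under independence) number
    of Guttman errors; item-level H below 0.3 is the usual
    'revise or drop' cut."""
    n, m = X.shape
    p = X.mean(axis=0)
    H = np.full((m, m), np.nan)
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            hard, easy = (i, j) if p[i] <= p[j] else (j, i)
            F = np.sum((X[:, hard] == 1) & (X[:, easy] == 0))  # Guttman errors
            E = n * p[hard] * (1 - p[easy])
            H[i, j] = 1 - F / E
    return H
```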

15.
Ill-conditioned matrices are a problem that item parameter estimation in IRT must inevitably confront. Taking 2PLM parameter estimation as an example, this paper derives the iterative formulas for parameter estimation and systematically expounds the principles of three ill-conditioning control methods and the data-handling techniques used when programming them.
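One standard ill-conditioning control (not necessarily one of the paper's three) is ridge damping of the Newton-Raphson step; a minimal sketch:

```python
import numpy as np

def damped_newton_step(grad, hessian, ridge=1e-3):
    """Newton-Raphson step for 2PL item parameters with ridge damping:
    adding ridge * I keeps the solve stable when the information matrix
    is ill-conditioned. Returns s solving (H + ridge*I) s = g; subtract
    s from the current parameters when minimizing the negative
    log-likelihood."""
    k = hessian.shape[0]
    H = hessian + ridge * np.eye(k)
    while np.linalg.cond(H) > 1e8:   # escalate damping until well-conditioned
        ridge *= 10.0
        H = hessian + ridge * np.eye(k)
    return np.linalg.solve(H, grad)
```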

16.
How did early work of Binet and Thurstone foreshadow item response theory? What connections to work in other areas are not widely recognized? What are current application trends?

17.
Test reliability under item response theory evaluates the dependability and stability of latent trait estimates; because of its global character, IRT reliability cannot be replaced by the test information function and is an important index for IRT-based tests. Drawing on the domestic and international literature, this article first introduces scholars' views on the role of IRT reliability, then introduces and evaluates a variety of IRT reliability estimation methods, briefly reviews the factors that influence IRT reliability, and finally outlines directions where future research on IRT reliability could focus.
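One widely used estimator from this literature is the empirical (marginal) reliability of EAP scores; a one-line sketch (this is one common choice among the several estimators such a review covers, not necessarily one the article endorses):

```python
import numpy as np

def empirical_reliability(eap, psd):
    """Empirical (marginal) reliability of EAP ability estimates:
    var(EAP) / (var(EAP) + mean(PSD^2))."""
    v = np.var(eap, ddof=1)
    return v / (v + np.mean(psd ** 2))
```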

18.
Students' mathematical literacy has a multidimensional structure, so literacy-oriented assessment of mathematics achievement needs to report examinees' performance on each dimension rather than a single total score. Taking the PISA mathematical literacy framework as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, this study used the MIRT package in R to process and analyze item data from a grade-8 mathematical literacy assessment in one region, investigating multidimensional measurement of mathematical literacy. The results show that MIRT combines the strengths of unidimensional item response theory and factor analysis: it can be used to analyze the structural validity of a test and the quality of test items, and to perform multidimensional cognitive diagnosis of examinees' abilities.
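The measurement model here is the compensatory multidimensional 2PL; a small Python sketch of the model form (the study itself used the MIRT package in R; the matrices below are toy values):

```python
import numpy as np

def mirt_2pl_p(theta, A, d):
    """Compensatory multidimensional 2PL: P(X = 1 | theta) =
    logistic(A @ theta + d), with theta a vector of latent dimensions
    (e.g., PISA mathematics subscales), A the item slope matrix, and
    d the item intercepts."""
    return 1.0 / (1.0 + np.exp(-(theta @ A.T + d)))

# toy: 3 latent dimensions, 2 items
A = np.array([[1.2, 0.0, 0.4],
              [0.0, 0.9, 0.7]])
d = np.array([0.1, -0.3])
theta = np.array([0.5, -0.2, 1.0])
print(mirt_2pl_p(theta, A, d))
```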

19.
A Comparative Study of Classical Test Theory and Item Response Theory   (Cited by: 3; self-citations: 1; citations by others: 3)
By comparing classical test theory and item response theory with respect to their models and assumptions, key concepts and parameters, and levels of measurement, this article clarifies the connections and differences between the two theories and identifies their respective strengths and weaknesses, thereby offering researchers guidance for choosing an appropriate analytic framework according to the demands of testing practice and the conditions under which each theory applies.

20.
In the presence of test speededness, the parameter estimates of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures (a two-parameter logistic [2PL] model, a one-dimensional mixture model, a two-step strategy combining the one-dimensional mixture and the 2PL, a two-dimensional mixture model, and a hybrid model) by examining how sample size, percentage of speeded examinees, percentage of missing responses, and the scoring of missing responses (incorrect vs. omitted) affect item parameter estimation in speeded tests. For nonspeeded items, all five procedures showed similar results in recovering item parameters. For speeded items, the one-dimensional mixture model, the two-step strategy, and the two-dimensional mixture model provided largely similar results and performed better than the 2PL model and the hybrid model in calibrating slope parameters. However, those three procedures performed similarly to the hybrid model in estimating intercept parameters. As expected, the 2PL model was not as accurate as the other models in recovering item parameters, especially when there were large numbers of examinees showing speededness and a high percentage of missing responses scored as incorrect. Real data analysis further described the similarities and differences among the five procedures.
