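A Study of Mathematical Models for Computerized Adaptive Testing   Total citations: 1 (self-citations: 0, by others: 1)
This paper discusses computerized adaptive testing (CAT) as a method of educational assessment, analyzes the drawbacks of conventional tests, and argues that CAT, given its many advantages, is bound to replace them. It then analyzes the logistic model, a mathematical model suited to implementing CAT, and uses the three-dimensional logistic model to implement CAT at the level of the testing algorithm.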

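Since scientific psychological and educational measurement emerged at the end of the 19th century, its measurement theory has continually developed and grown richer. Its specific contents and forms include norm theory, true-score reliability theory, empirical validity theory, generalizability theory, and item response theory. The integration of measurement theory with substantive psychological theory, and of psychometric models with cognitive models, is an important direction for measurement theory in the new century. Methodologically, research on measurement theory has gone through a process of gradual exploration and change, moving from a simple transfer of the scientific principle of objectivity to a conscious grasp of the agency of the phenomenon being measured and of the intersubjectivity of the measurement activity.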

黄丹媚 《考试周刊》2007,(33):146-147
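This paper compares classical test theory and item response theory in three respects (theoretical foundations, item analysis, and error estimation) and argues that, at the present stage, the two measurement theories will continue to complement each other and develop side by side.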

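The enormous economic benefits generated by industrial clusters have drawn sustained and growing research interest. This paper first reviews theories of the industrial cluster life cycle, then uses ecological theory and methods to analyze the evolution of industrial clusters, and finally uses the logistic curve to divide the cluster life cycle into five stages: emergence, growth, maturity, temporary decline, and sustained development. This offers a new line of research for the development of industrial clusters in China.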

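A Comparative Study of Three Theories of Educational and Psychological Measurement   Total citations: 4 (self-citations: 0, by others: 4)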
杨静 《中国考试》2006,(6):33-35
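The development of educational and psychological measurement theory has passed through two periods. Before the 1950s, true score theory was dominant; this is called the classical measurement theory stage. From the 1950s to the present, item response theory, generalizability theory, and others have existed alongside classical measurement theory; this may be called the stage of coexisting theories. Classical measurement theory was the earliest measurement theory in the history of educational and psychological measurement to be given mathematical form, and most modern measurement theories are …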

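Classical Test Theory (CTT), which emerged at the end of the 19th century and matured in the 1930s, is a measurement theory and body of methods whose core assumption is True Score Theory; it is therefore also called true score theory. It is one of the three main theoretical schools in psychometrics today.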

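Classical test theory (CTT), item response theory (IRT), and generalizability theory (GT) currently coexist as the three schools of psychometric theory, each with its own strengths and weaknesses. As measurement theory develops further, the new trend in test theory in China will be a landscape in which IRT is the mainstay and the other theories coexist alongside it.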

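Design of an Adaptive Testing System for the Quality Course Modern Educational Technology (《现代教育技术》)   Total citations: 3 (self-citations: 0, by others: 3)
Reform of assessment methods is an important part of current education and teaching reform. Building on an account of item response theory, this paper presents the architecture of an adaptive online testing system based on the three-parameter logistic model, and analyzes the system's item bank construction process, item selection algorithm, ability estimation algorithm, and test termination conditions. It also designs MET-CATS, a prototype adaptive testing system for the national quality course Modern Educational Technology (《现代教育技术》), and analyzes how adaptive testing runs in the system and how results are evaluated.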
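
The item selection step described in this abstract can be illustrated with a minimal sketch. It assumes the standard three-parameter logistic (3PL) form and maximum-information selection, which is a common choice in CAT but is not confirmed as the method used in MET-CATS. The function names, the scaling constant `D = 1.7`, and the `(a, b, c)` tuple layout for the item bank are all hypothetical.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: pseudo-guessing parameter; D: scaling constant (assumed 1.7).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.7):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_item(theta, bank, administered):
    """Return the index of the unused item with maximum information.

    bank: list of (a, b, c) tuples (hypothetical layout);
    administered: set of indices already presented to the examinee.
    """
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_3pl(theta, *bank[i]))
```

In a full adaptive loop, the ability estimate would be updated after each response (for example by maximum likelihood) and `select_item` called again until a termination condition, such as a target standard error or a maximum test length, is met.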

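The generalized graded unfolding model is currently one of the more fully developed models for constructing attitude and personality tests in psychometrics. This paper first compares the distinctive advantages of unfolding models over other models, showing that an unfolding model is a symmetric, non-monotonic model. It then introduces the basic ideas and steps of the generalized graded unfolding model, covering parameter estimation, analysis of model error, and estimation of item and test information functions. Finally, it summarizes the model's strengths and weaknesses and forecasts its prospects for application.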

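Based on item response theory, this study explores the optimal item types for an open recruitment examination whose items change. Using the scores of 1,000 candidates for professional and technical posts on the Beijing General Ability Examination for New Recruits (《北京市新进人员通用能力考试》), and after exploratory factor analysis confirmed that only one dimension was present, the study applied the graded response model of item response theory to analyze the performance of 10 item types. The scores of the different items within each item type were first summed, and the frequencies of the different scores were converted into grades; discrimination, difficulty, category response curves, and information functions were then computed. The optimal item types were determined in two ways: by selecting item types whose share of information exceeded the mean, and by excluding item types whose parameters fell short of commonly used standards. The two methods gave very similar results: logical reasoning, chart interpretation, short-passage processing, and reading comprehension were the four best item types.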

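A Study of Test Construction Methods Based on Item Response Theory   Total citations: 3 (self-citations: 0, by others: 3)
After a brief introduction to item response theory, this paper examines in depth, from the perspective of psychometric analysis, the general steps for constructing various tests using item response theory; methods for building an IRT item bank and for constructing tests from an item bank; and methods for setting the passing score of criterion-referenced tests.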

Applying item response theory models to repeated observations has demonstrated great promise in developmental research. By allowing the researcher to take account of the characteristics of both item response and measurement error in longitudinal trajectory analysis, it improves the reliability and validity of latent growth curve analysis. This has enabled the present study to weight individual items differentially, to examine developmental stability and change over time, and to propose a comprehensive modeling framework that combines a measurement model with a structural model. Despite a large number of components requiring attention, this study focuses on model formulation, evaluates the performance of the estimators of model parameters, incorporates prior knowledge from Bayesian analysis, and applies the model using an illustrative example. It is hoped that this fundamental study can demonstrate the breadth of this unified latent growth curve model.

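To compare the roles of structural equation modeling (SEM) and the IRT graded response model (GRM) in item screening for personality scales, this study used 7,229 actual response records from the Chinese College Student Personality Scale (《中国大学生人格量表》). For Factor 2, "straightforwardness" (爽直), parameter estimation and model fitting were carried out with Lisrel 8.70 for the structural equation model and Multilog 7.03 for the graded response model, and the item screening results of the two methods were compared. Both sets of results indicated that items 5, 6, 7, and 8 fit poorly. In the SEM this showed as low factor loadings and unsatisfactory overall fit indices; in the GRM it showed as unsatisfactory discrimination and location parameters, with poorly shaped item characteristic curves and information curves for the items concerned. However, the SEM tended to rate items 6 and 8 as worse, whereas the GRM tended to rate items 5 and 6 as worse. Overall, the statistical inferences the two models draw about personality scale items agree, with slight differences on individual items. Each has its own strengths, and the two can be used in combination.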

S-χ² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-χ² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of S-χ² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using S-χ² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. S-χ² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of S-χ² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that performance of S-χ² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.

The purpose of this ITEMS module is to provide an introduction to subscores. First, examples of subscores from an operational test are provided. Then, a review of methods that can be used to examine whether subscores have adequate psychometric quality is provided. It is demonstrated, using results from operational and simulated data, that subscores have to be based on a sufficient number of items and have to be sufficiently distinct from each other to have adequate psychometric quality. It is also demonstrated that several operationally reported subscores do not have adequate psychometric quality. Recommendations are made for those interested in reporting subscores for educational tests.

A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.
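
As one illustration of how category probabilities arise from a model's step functions, the sketch below uses Samejima's graded response model, in which each step function is a cumulative logistic curve and each category probability is the difference between adjacent steps. The abstract does not say which models the module covers, so the choice of this model is an assumption, and the function name and argument layout are hypothetical.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: examinee trait level; a: item discrimination;
    thresholds: increasing list b_1 < ... < b_{K-1} for K score categories.
    Returns a list of K probabilities, lowest category first.
    """
    # Step functions: P(X >= k) for k = 1..K-1, bracketed by
    # P(X >= 0) = 1 and P(X >= K) = 0.
    steps = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative = [1.0] + steps + [0.0]
    # Each category probability is the difference of adjacent step functions.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]
```

For example, `grm_category_probs(0.0, 1.0, [-1.0, 1.0])` returns three probabilities that sum to 1, with the middle category most likely because `theta` lies between the two thresholds.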

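With the growth in demand for personalized learning, the development of adaptive learning systems has become a research focus in education. This paper first uses a questionnaire survey to identify the requirements for an adaptive learning system for senior high school English vocabulary. Then, on the basis of a literature review, it sets out the key content and theoretical foundations of the system's development. Next, it describes the concrete implementation of the system's functional modules. Finally, it describes the target users and timing for using the system in teaching practice.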

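This study aims to detect, from unidimensional and multidimensional perspectives, differential item functioning (DIF) in the Chinese-to-English translated items of the child cognitive development test of the International Association for the Evaluation of Educational Achievement (IEA). The data analyzed consist of test data from 871 Chinese children and 557 American children. The results show that more than half of the items exhibit substantial DIF, meaning that the test is not functionally equivalent for Chinese and American children. Users should be cautious in using results from this cross-language translated test to compare the cognitive ability levels of examinees in the two countries. Fortunately, about half of the DIF items favor China and half favor the United States, so a scale built from total test scores should not be badly biased. In addition, item fit statistics do not adequately detect items with DIF, so specific DIF analyses should still be conducted. We explore three possible causes of DIF; more subject-matter expertise and experimentation are needed to truly explain how DIF arises.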

Simulation studies are extremely common in the item response theory (IRT) research literature. This article presents a didactic discussion of "truth" and "error" in IRT-based simulation studies. We ultimately recommend that future research focus less on the simple recovery of parameters from a convenient generating IRT model, and more on practical comparative estimation studies when the data are intentionally generated to incorporate nuisance dimensionality and other sources of nuanced contamination encountered with real data. A new framework is also presented for conceptualizing and comparing various residuals in IRT studies. The new framework allows even very different calibration and scoring IRT models to be compared on a common, convenient, and highly interpretable number-correct metric. Some illustrative examples are included.

