Zhang Jun. Examinations Research (考试研究), 2014, (1): 56-61
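The monotone homogeneity model (MHM) is the most widely used model in nonparametric item response theory. It rests on three basic assumptions and is suitable for analyzing small-scale tests. This study used the MHM to analyze one test administered by the Chinese Language Training College (汉语进修学院) of Beijing Language and Culture University. The results show that the test satisfies the weak unidimensionality and weak local independence assumptions. Of the 67 items, 9 had scalability coefficients below 0.3 and need to be revised or deleted; after their deletion the test forms a medium-strength Mokken scale. In addition, 2 items violated the monotonicity assumption and do not meet the requirements of a Mokken scale.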
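
The scalability coefficients mentioned above can be sketched in NumPy for dichotomous (0/1) items. This is a minimal illustration and not the study's software. The function names are hypothetical, and the 0.3/0.4/0.5 strength bands are the usual Mokken conventions. Each pair's observed covariance is compared with the largest covariance its marginals allow.

```python
import numpy as np


def mokken_h(X):
    """Loevinger/Mokken scalability coefficients for dichotomous items.

    X: (n_persons, n_items) array of 0/1 scores. Every item needs a
    proportion correct strictly between 0 and 1, otherwise H is undefined.
    Returns (H_scale, H_items).
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0)                          # proportion correct per item
    cov = np.cov(X, rowvar=False, bias=True)    # observed p_ij - p_i * p_j
    # Largest covariance the marginals allow: min(p_i, p_j) - p_i * p_j
    cov_max = np.minimum.outer(p, p) - np.outer(p, p)
    off = ~np.eye(len(p), dtype=bool)           # ignore the i == j diagonal
    h_items = np.where(off, cov, 0).sum(axis=1) / np.where(off, cov_max, 0).sum(axis=1)
    h_scale = cov[off].sum() / cov_max[off].sum()
    return h_scale, h_items


def mokken_strength(h):
    """Conventional bands: <0.3 unscalable, 0.3-0.4 weak, 0.4-0.5 medium, >=0.5 strong."""
    if h < 0.3:
        return "unscalable"
    if h < 0.4:
        return "weak"
    if h < 0.5:
        return "medium"
    return "strong"
```

Items that are candidates for revision or deletion, as in the study, are those with `h_items < 0.3`, for example `np.flatnonzero(h_items < 0.3)`.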

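For evaluating item quality in large-scale selection examinations, item response theory offers many advantages that classical test theory lacks, and it is used increasingly in China and abroad. Following the procedures and methods for analyzing norm-referenced tests, this study applied item response theory to the item-writing quality of the 2011 Guiyang senior-year (Grade 12) English mock examination. The analysis again confirmed IRT's advantages for evaluating test quality, namely that item parameters are invariant across samples and that estimates of examinees' trait levels do not depend on the particular items administered. It also showed that IRT has considerable value and application potential for item writing in basic-education examinations.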

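The dimensionality of a large-scale educational examination bears directly on how its scores are interpreted and how examinees' performance is explained. Dimensionality can be studied by using nonparametric item response theory models to select items that form unidimensional scales. When selecting such items, how to set the lower bound c is an open question. A nonparametric IRT study of the dimensionality of three scales (sections) of an English examination found that only one unidimensional scale could be selected from each section. This contrasts with the expectation of several unidimensional scales, one for each micro-skill the section measures. Each selected scale measures a combination of that section's different micro-skills, which indicates that these scales are essentially unidimensional rather than strictly unidimensional. Whether c was 0.3 or 0.2, the selected essentially unidimensional scales satisfied the weak monotonicity requirement, and the discriminating power of the scale as a whole did not differ noticeably.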
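
The item-selection step with lower bound c can be sketched as a simplified greedy search. It loosely follows Mokken's automated item selection procedure but omits that procedure's significance tests and its search for additional scales. The function names are hypothetical. The search starts from the strongest item pair and keeps adding the item that most raises scale H. An item is added only if its own H, relative to the items already selected, reaches c.

```python
import itertools

import numpy as np


def _pair_stats(X):
    """Observed and maximum inter-item covariances for 0/1 data."""
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    cov_max = np.minimum.outer(p, p) - np.outer(p, p)
    return cov, cov_max


def select_scale(X, c=0.3):
    """Greedy bottom-up selection of one Mokken-type scale with lower bound c.

    Simplified sketch: no significance tests, only the first scale is built.
    Returns the sorted list of selected item indices (possibly empty).
    """
    cov, cov_max = _pair_stats(X)
    n = cov.shape[0]
    # Start from the item pair with the largest H_ij that reaches c.
    start, best_h = None, c
    for i, j in itertools.combinations(range(n), 2):
        h_ij = cov[i, j] / cov_max[i, j]
        if h_ij >= best_h:
            start, best_h = [i, j], h_ij
    if start is None:
        return []
    scale = start
    while True:
        best_item, best_scale_h = None, -np.inf
        for k in [k for k in range(n) if k not in scale]:
            # H of candidate k relative to the items already in the scale
            h_k = cov[k, scale].sum() / cov_max[k, scale].sum()
            if h_k < c:
                continue
            trial = scale + [k]
            sub = cov[np.ix_(trial, trial)]
            sub_max = cov_max[np.ix_(trial, trial)]
            off = ~np.eye(len(trial), dtype=bool)
            h_trial = sub[off].sum() / sub_max[off].sum()
            if h_trial > best_scale_h:
                best_item, best_scale_h = k, h_trial
        if best_item is None:
            return sorted(scale)
        scale.append(best_item)
```

Lowering c from 0.3 to 0.2 only loosens the admission test `h_k < c`. Whether the selected scale changes therefore depends on whether any item's H falls between the two bounds.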

A Comparative Study of Classical Test Theory and Item Response Theory
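This article compares classical test theory and item response theory in terms of their models and assumptions, their main concepts and parameters, and their levels of measurement. On that basis it clarifies how the two theories are related and how they differ, and identifies the strengths and weaknesses of each. The comparison is meant to help researchers choose an appropriate analytical framework according to the demands of their testing practice and the conditions under which each theory applies.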

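With the spread of computers, the growth of networks, and advances in teaching and assessment theory, computerized adaptive testing based on item response theory has become increasingly common. Because the items it presents change automatically to suit students of different ability levels, more and more examinations have adopted it. Building on item response theory, this paper discusses the implementation of adaptive testing and related issues.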

A Study of Test Construction Methods Based on Item Response Theory
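After a brief introduction to item response theory, this paper examines three topics from a quantitative-analysis perspective. The first is the general steps for constructing various kinds of tests with IRT. The second is methods for building IRT item banks and for assembling tests from an item bank. The third is methods for setting pass scores on criterion-referenced tests.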

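This paper analyzes several important issues that require attention when applying IRT models, including model assumptions, model fit, and sample-size requirements. The discussion focuses mainly on unidimensional parametric IRT models, but the same issues arise when nonparametric and multidimensional IRT models are applied. These models have certain distinctive advantages and are appropriate in some situations, but not all of them are suitable for health assessment.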

Ye Meng. Examinations Research (考试研究), 2010, (2): 96-107
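This article reviews the main research findings on local independence in item response theory (IRT) and, on that basis, explains the definition of the local independence assumption. It also discusses several related topics:
- the relationship between local independence and test dimensionality;
- how local dependence is detected and computed, what causes it, and procedures for controlling it;
- the effects of local dependence on measurement practice.
Finally, it explores strategies for handling local item dependence within testlets.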
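
Yen's Q3 is one widely used statistic for detecting local dependence. The abstract does not say which statistics the review covers, so this is an illustration only. Q3 correlates, across examinees, the residuals between each observed response and the response probability a fitted model predicts. The sketch assumes a 2PL model whose item parameters and abilities are already known. The function names are hypothetical.

```python
import numpy as np


def p_2pl(theta, a, b):
    """2PL response probabilities, shape (n_persons, n_items)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))


def yen_q3(X, theta, a, b):
    """Yen's Q3: correlation matrix of residuals x_ij - P_j(theta_i).

    Off-diagonal values well above zero suggest that a pair of items is
    locally dependent. theta, a, b are assumed to come from a fitted 2PL.
    """
    theta = np.asarray(theta, dtype=float)
    resid = np.asarray(X, dtype=float) - p_2pl(theta, np.asarray(a), np.asarray(b))
    return np.corrcoef(resid, rowvar=False)
```

Cutoffs for a "large" Q3 vary across studies. When abilities are estimated rather than known, Q3 is also slightly biased below zero, so the statistic is best read relative to the other pairs in the same test.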

This study compares the psychometric utility of Classical Test Theory (CTT) and Item Response Theory (IRT) for scale construction with data from higher education student surveys. Using 2008 Your First College Year (YFCY) survey data from the Cooperative Institutional Research Program at the Higher Education Research Institute at UCLA, two scales are built and tested: one measuring social involvement and one measuring academic involvement. Findings indicate that although both CTT and IRT can be used to obtain the same information about the extent to which scale items tap into the latent trait being measured, the two measurement theories provide very different pictures of scale precision. On the whole, IRT provides much richer information about measurement precision as well as a clearer roadmap for scale improvement. The findings support the use of IRT for scale construction and survey development in higher education.
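
The two "pictures of scale precision" can be contrasted in a short sketch. CTT summarizes a scale with a single reliability coefficient, shown here as Cronbach's alpha. IRT gives a standard error that changes with the trait level θ. For simplicity the sketch assumes dichotomous 2PL items, although the YFCY survey items are likely polytomous. The function names are hypothetical.

```python
import numpy as np


def cronbach_alpha(X):
    """CTT view: one reliability number for the whole scale."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def conditional_se(theta, a, b):
    """IRT view: SE(theta) = 1 / sqrt(test information), 2PL items assumed."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))   # (n_theta, n_items)
    info = (a**2 * p * (1.0 - p)).sum(axis=1)             # test information
    return 1.0 / np.sqrt(info)
```

Evaluating `conditional_se` over a grid of θ values shows where along the trait the scale measures well and where it needs items. That kind of "roadmap" is not available from a single alpha.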

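Using random cluster sampling, 505 primary and secondary school teachers were selected as participants (189 male and 271 female teachers), all aged between 25 and 55. The Teaching Efficacy Questionnaire was administered, and the results were analyzed under item response theory to obtain each item's discrimination, difficulty, and peak of the item information function. The teaching efficacy scale was revised with reference to these three indices. The quality of the revised scale was then examined with structural equation modeling, facet theory techniques, and smallest space analysis. The results show that the revised scale has clearer construct validity and higher reliability and measures more precisely. Data were managed with SPSS 15.0 and analyzed with Hudap 6.0 and MULTILOG 7.03. The study reached five conclusions:
1) the teaching efficacy scale is unidimensional and can therefore be analyzed with item response theory;
2) the item discriminations and difficulties of the revised scale are more reasonable;
3) the peak test information of the revised scale is slightly lower than that of the original scale;
4) corresponding facet elements before and after revision are highly correlated;
5) the scale's three-part content structure is confirmed: moral and behavioral education of students, classroom organization and management, and knowledge transmission.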
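
The link between discrimination, difficulty, and the information peak can be shown for a dichotomous 2PL item. This is a simplifying assumption: the abstract does not state which model was fitted in MULTILOG, and questionnaire items are often polytomous. The function names are hypothetical. For a 2PL item, I(θ) = a²P(1 − P) peaks at θ = b with height a²/4. The grid search below recovers that peak.

```python
import numpy as np


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (np.asarray(theta, dtype=float) - b)))
    return a**2 * p * (1.0 - p)


def information_peak(a, b, grid=None):
    """Return (theta, information) at the maximum over a theta grid."""
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 801)   # step 0.01
    info = item_information_2pl(grid, a, b)
    k = int(np.argmax(info))
    return float(grid[k]), float(info[k])
```

Because the peak height grows with a² and the peak location follows b, revising items with reference to both indices, as the study did, shifts both how much information the scale provides and where along θ it provides it.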

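For a variety of reasons, large-scale educational examinations have never established item banks. As a result, item-writing quality can only be evaluated comprehensively by analyzing examination data after each administration. Item response theory supports detailed, in-depth analysis of items and also supplies indices and methods for test construction. This paper applies IRT to analyze the item-writing quality of large-scale educational examinations, taking the Advanced Mathematics (高等数学) course as an example to examine the procedures and methods of such analysis. The aim is to use post-examination evaluation of item quality as a starting point for accumulating the base items and data needed to build item banks for large-scale educational examinations.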

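Applying item response theory to the item-writing quality of the senior high school entrance examination (中考) makes several things possible. It removes the influence of the particular sample, estimates item difficulty accurately, and describes item discrimination objectively and in detail. It also shows how precisely the whole paper and each individual item estimate student ability, and it helps identify problems in the scoring rubrics and the marking process.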

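The obsessive-compulsive subscale of the Symptom Checklist-90 (SCL-90) is widely used in clinical settings. The subscale was administered to all incoming graduate students at one university. The results show that the data meet the basic assumptions of item response theory and that all items except items 10, 38, and 65 performed well. After these three items were deleted, the fit indices of the revised scale met theoretical requirements, indicating good model fit.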

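This study used item response theory to build a standardized test item bank system. The aims were to explore applications of new educational measurement theory and to provide an auxiliary tool for subject-level educational assessment. The bank covers Grade 4 mathematics, and items were administered, scored, and analyzed following standardized construction procedures. The three-parameter logistic model was adopted, and the unidimensionality of the test was checked with Bejar's method. Item parameters were estimated with the ANOTE software, and a cut score was set with an empirical judgment method based on item classification. The items, together with their information values, were then assembled into the bank. Finally, the paper discusses key techniques for item-bank-based testing, including ability estimation methods, item selection strategies, and termination control.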
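
The three-parameter logistic model named above can be written out with its item information function. The scaling constant D = 1.7 is a common convention, assumed here; the ANOTE software's own parameterization is not stated. The function names are hypothetical. The closed-form peak shows that a nonzero guessing parameter c moves the most informative point above the item difficulty b.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """3PL: P(theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c, D=1.7):
    """3PL item information: D^2 a^2 (Q / P) ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def info_peak_3pl(a, b, c, D=1.7):
    """Theta where the 3PL item is most informative; equals b when c == 0."""
    return b + math.log(0.5 * (1.0 + math.sqrt(1.0 + 8.0 * c))) / (D * a)
```

When items are assembled into a bank by their information, as described above, these peaks indicate which ability range each item serves best.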

Several recent papers have argued for the usefulness of item response theory (IRT) methods of assessing item discrimination power for criterion-referenced tests (CRTs). Conventional methods continue to be used more widely, however, for reasons that include some practical constraints associated with the use of IRT methods. To provide users with information that may help them to decide on which conventional indices to employ in evaluating CRT items, Spearman rank-order correlations were computed between IRT-derived item information functions (IIFs) and four conventional discrimination indices: the phi-coefficient, the B-index, phi/phi max, and the agreement statistic. The rank-order correlations between the phi-coefficient and the IIFs were very high, with a median of .96. The remaining conventional indices, with the exception of phi-over-phi-max, also correlated well with the IIF. Theoretical explanations for these relationships are offered.
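
Three of the conventional indices can be computed from an item's 0/1 scores and a 0/1 master/nonmaster classification. The agreement statistic is omitted because its exact definition is not given here. The phi coefficient is the correlation of the 2x2 table. The B-index is the proportion of masters answering correctly minus the proportion of nonmasters answering correctly. phi/phi_max divides phi by the largest phi the table's marginals allow, which reduces to observed covariance over maximum covariance. This is a plain-Python sketch with a hypothetical function name. It assumes both groups are non-empty and that neither variable is constant.

```python
import math


def crt_discrimination(item, mastery):
    """Return (phi, B_index, phi_over_phi_max) for one 0/1 item."""
    n = len(item)
    n_correct = sum(item)
    n_masters = sum(mastery)
    n11 = sum(1 for x, m in zip(item, mastery) if x == 1 and m == 1)
    p_x, p_m = n_correct / n, n_masters / n
    cov = n11 / n - p_x * p_m                       # covariance of two 0/1 variables
    phi = cov / math.sqrt(p_x * (1 - p_x) * p_m * (1 - p_m))
    # B-index: P(correct | master) - P(correct | nonmaster)
    b_index = n11 / n_masters - (n_correct - n11) / (n - n_masters)
    # phi / phi_max equals cov / cov_max, with cov_max = min(p_x, p_m) - p_x * p_m
    ratio = cov / (min(p_x, p_m) - p_x * p_m)
    return phi, b_index, ratio
```

To reproduce the kind of comparison described above, these indices would be computed for every item and rank-correlated with each item's information, for example with `scipy.stats.spearmanr`.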

Four item response theory (IRT) models were compared using data from tests where multiple items were grouped into testlets focused on a common stimulus. In the bi-factor model each item was treated as a function of a primary trait plus a nuisance trait due to the testlet; in the testlet-effects model the slopes in the direction of the testlet traits were constrained within each testlet to be proportional to the slope in the direction of the primary trait; in the polytomous model the item scores were summed into a single score for each testlet; and in the independent-items model the testlet structure was ignored. In the simulated data, the independent-items model somewhat overestimated reliability when the items were not independent within testlets. Under these nonindependent conditions, the independent-items model also yielded greater root mean square error (RMSE) for item difficulty and underestimated the item slopes. When the items within testlets were instead generated to be independent, the bi-factor model yielded somewhat higher RMSE in difficulty and slope. Similar differences between the models were illustrated with real data.

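Nonparametric item response theory models include the monotone homogeneity model and the double monotonicity model. Applying the monotone homogeneity model to the results of an English listening test showed that, with a sequential selection procedure, 11 of the 16 listening items met the requirements and could form a unidimensional scale. Ranking examinees by their total score on these 11 items is equivalent to ranking them by the latent trait. A differential item functioning study of this 11-item scale under the double monotonicity model found that 5 items were ordered differently in the female subgroup than in the male subgroup and the whole group. On these items, female examinees had a markedly higher probability of answering correctly than male examinees. This difference is attributable, at least in part, to a difference in listening ability between the two subgroups.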
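
A crude descriptive version of the subgroup comparison ranks items by proportion correct within each group and reports items whose rank differs. Under the double monotonicity model, item response functions do not intersect, so the item ordering should be the same in every subgroup. This sketch has no significance test and is not the specific procedure used in the study. The function names are hypothetical.

```python
import numpy as np


def item_order(X):
    """Item indices sorted from easiest to hardest by proportion correct."""
    p = np.asarray(X, dtype=float).mean(axis=0)
    return [int(i) for i in np.argsort(-p, kind="stable")]


def order_mismatches(X, group):
    """Items whose easiness rank differs between group == 0 and group == 1."""
    X = np.asarray(X)
    group = np.asarray(group)
    rank = {}
    for g in (0, 1):
        order = item_order(X[group == g])
        rank[g] = {item: r for r, item in enumerate(order)}
    return sorted(i for i in range(X.shape[1]) if rank[0][i] != rank[1][i])
```

In practice, small differences in proportions can swap ranks by chance. A flagged item is therefore a candidate for closer inspection, not evidence of DIF on its own.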

An Adaptive Testing System Based on Item Response Theory
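Computerized adaptive testing is a form of testing built on item response theory. As the examinee answers, it continually recomputes the examinee's ability estimate and the associated information, and it adjusts its item-selection strategy in real time according to these quantities. As a result, computerized adaptive testing can reflect an examinee's level and characteristics more faithfully.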
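
The adaptive loop described above can be sketched with three ingredients: 2PL items, an expected a posteriori (EAP) ability estimate on a θ grid with a standard normal prior, and maximum-information item selection. The model, prior, stopping rule, and function names are illustrative assumptions, not the design of the system in the paper. The `answer` callable stands in for the examinee.

```python
import numpy as np


def p2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def run_cat(a, b, answer, max_items=20, se_stop=0.4):
    """Minimal CAT: EAP estimate, maximum-information selection, SE-based stop.

    a, b: 2PL parameters of the item bank. answer: callable(item) -> 0 or 1.
    Returns (theta_hat, posterior SD, list of administered items).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    grid = np.linspace(-4.0, 4.0, 161)
    posterior = np.exp(-0.5 * grid**2)          # standard normal prior
    posterior /= posterior.sum()
    used = []
    theta_hat = 0.0
    se = float(np.sqrt(np.sum(posterior * grid**2)))
    while len(used) < min(max_items, len(a)) and se > se_stop:
        p_hat = p2pl(theta_hat, a, b)
        info = a**2 * p_hat * (1.0 - p_hat)     # item information at current estimate
        info[used] = -np.inf                    # never reuse an item
        j = int(np.argmax(info))
        p = p2pl(grid, a[j], b[j])
        posterior *= p if answer(j) == 1 else (1.0 - p)   # Bayes update
        posterior /= posterior.sum()
        used.append(j)
        theta_hat = float(np.sum(posterior * grid))       # EAP estimate
        se = float(np.sqrt(np.sum(posterior * (grid - theta_hat) ** 2)))
    return theta_hat, se, used
```

For example, `run_cat(np.full(30, 1.5), np.linspace(-2.5, 2.5, 30), lambda j: 1)` simulates an examinee who answers every item correctly. A real system would add exposure control and content balancing on top of pure maximum-information selection.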

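Strengthening the construction of standardized item banks is essential for assessing teaching effectiveness, and it is also needed for course development and teaching reform in university physics. Developing item banks with item response theory can make up for the shortcomings of banks based on classical test theory. It improves measurement precision, shortens tests, makes item parameters more standardized, and helps ensure that items are scientifically sound and valid.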

Educational Assessment, 2013, 18(4): 329-347
It is generally accepted that variability in performance will increase throughout Grades 1 to 12. Those with minimal knowledge of a domain should vary but little, but, as learning rates differ, variability should increase as a function of growth. In this article, the series of reading tests from a widely used test battery for Grades 1 through 12 was singled out for study because the scale scores for the series have the opposite characteristic; that is, variability is greatest at Grade 1 and decreases as growth proceeds. Item response theory (IRT) scaling was used; in previous editions, the publisher had used Thurstonian scaling and the variance increased with growth. Using data with known characteristics (i.e., weight distributions for ages 6 through 17), a comparison was made between the effectiveness of IRT and Thurstonian scaling procedures. The Thurstonian scaling more accurately reproduced the characteristics of the known distributions. Because IRT scaling was shown to improve when perfect scores were included in the analyses and when items were selected whose difficulties reflected the entire range of ability, these steps were recommended. However, even when these steps were implemented with IRT, the Thurstonian scaling was still found to be more accurate.
