Similar Documents (20 results)
1.
Modern test theories, represented by generalizability theory and item response theory, arose from efforts to overcome the shortcomings of classical test theory. Generalizability theory builds on classical test theory by introducing experimental design and analysis-of-variance techniques to decompose and control the various sources of error in a measurement situation; its development has passed through two main stages, univariate generalizability theory and multivariate generalizability theory. Its applications currently concentrate in three areas: evaluation, examinations, and the construction of rating scales. Item response theory is a modern test theory developed to overcome the sample dependence of classical indices such as item parameters; its development has gone through three stages: early theoretical exploration, initial formation of the theory, and gradual refinement. It is mainly used for score equating and the estimation of item parameters, analyzing the quality of tests and items, separating out the influence of rater characteristics on test results, detecting differential item functioning, and constructing adaptive tests.

2.
With the spread of computers, the growth of networks, and advances in teaching and testing theory, computerized adaptive testing based on item response theory has become increasingly widespread. Because the items it administers adapt automatically to examinees of different ability levels, it has been adopted by more and more examinations. This paper discusses item response theory and issues in implementing adaptive testing.

3.
In the United States, testing companies use a variety of statistical methods to detect cheating on examinations. This paper studies two copying-detection indices: the g2 index, based on classical test theory, and the w index, based on item response theory. The study simulated copying patterns common in four realistic testing situations, along with several variables that might affect the indices. Results show that for both g2 and w, Type I error rates computed from biased parameter estimates and from true parameters were similar and low across all conditions; thus using biased parameter estimates to compute g2 and w does not increase the chance of falsely flagging an innocent examinee as a copier. However, g2 and w based on biased parameter estimates achieved low Type II error rates only when the percentage of copied items was high and the test was long. When the percentage of copied items was low, both indices produced high Type II error rates even when true parameters were used.

4.
This paper introduces the strengths and weaknesses of classical test theory (CTT) and item response theory (IRT) in web-based examination systems, along with IRT models and their applications. When assembling test forms, IRT allows an online examination system to address test equating comprehensively and to estimate parameters more accurately, so that the system can select items more scientifically.

5.
黄丹媚, 《考试周刊》 2007(33): 146-147
This paper compares classical test theory and item response theory in three respects: theoretical foundations, item analysis, and error estimation. It argues that at the present stage the two measurement theories will continue to complement each other and develop side by side.

6.
Item response theory (IRT), also known as latent trait theory, is a modern measurement theory that developed out of criticism of, and efforts to overcome, the shortcomings of classical test theory (CTT).
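As a minimal illustration of the kind of model IRT proposes (a standard textbook formulation, not drawn from this particular abstract), the three-parameter logistic (3PL) item response function can be sketched in Python; the scaling constant D = 1.7 and the parameter values in the example call are conventional illustrative assumptions:

```python
import math

def irt_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) IRT model: probability that an
    examinee with ability theta answers an item correctly, given item
    discrimination a, difficulty b, and pseudo-guessing parameter c."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# An examinee of average ability on an item of average difficulty,
# with a 0.2 chance of guessing correctly:
p = irt_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
```

The guessing parameter c acts as a lower asymptote: even a very low-ability examinee answers correctly with probability approaching c.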

7.
An example of applying item response theory to equate an examination containing multiple item formats   (cited: 2; self-citations: 2; citations by others: 2)
Using a statewide high school examination in the United States as an example, this paper describes the concrete procedures for applying item response theory to equate an examination containing multiple item formats, and also discusses some of the examination's other technical aspects.

8.
Statistical analysis of results from the teacher certification examination for primary and secondary school teachers covers pass rates for the various examinee groups; test-form, item, and differential-item-functioning analyses based on classical measurement theory; and item-parameter analyses based on item response theory. Future work should strengthen validity research on the teacher certification examination, analysis of the ability structure the examination measures, and the application of item response theory to item bank construction and to future computerized adaptive testing.

9.
Computerized adaptive testing based on item response theory is a new form of examination. This paper explains the basic principles of item response theory and describes the design and development, on the .NET platform using VB.NET together with ADO.NET and SQL Server, of a fairly complete web-based adaptive testing system. The system implements networked computerized adaptive testing along with supporting functions such as item banking and examination administration.
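The core loop of such an adaptive testing system — select the most informative remaining item at the current ability estimate — can be sketched independently of any platform. The 2PL information function and the tiny item bank below are illustrative assumptions, not details from the paper:

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta, item_bank, administered):
    """Pick the unadministered item with maximum information at the current
    ability estimate -- the item-selection step of a CAT loop."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *item_bank[i]))

bank = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5)]  # (discrimination, difficulty)
next_item = select_next_item(theta=0.1, item_bank=bank, administered={0})
```

In a full system this step alternates with re-estimating theta after each response until a stopping rule (e.g., a target standard error) is met.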

10.
Estimating item response model parameters with radial basis function networks   (cited: 2; self-citations: 0; citations by others: 2)
Item response theory is an important theory in educational and psychological measurement. It proposes models of how examinees respond to items, together with methods for estimating the model parameters. These estimation methods, however, are grounded in mathematical statistics: they require fairly large examinee samples and apply only to dichotomously and graded (polytomously) scored items, leaving no suitable estimation method for small samples or continuously scored items. The author has previously proposed a parameter estimation method using artificial neural networks (connectionism), an entirely new approach that can estimate parameters for small samples and continuously scored items.

11.
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change, test level, test content, and item format. As a follow-up to the real data analysis, a simulation study was performed to assess the effect of item position change on equating. Results from this study indicate that item position change significantly affects change in RID. In addition, although the test construction procedures used in the investigated state seem to somewhat mitigate the impact of item position change, equating results might be impacted in testing programs where other test construction practices or equating methods are utilized.

12.
The effect of item parameters (discrimination, difficulty, and level of guessing) on the item-fit statistic was investigated using simulated dichotomous data. Nine tests were simulated using 1,000 persons, 50 items, three levels of item discrimination, three levels of item difficulty, and three levels of guessing. The item fit was estimated using two fit statistics: the likelihood ratio statistic (X2B), and the standardized residuals (SRs). All the item parameters were simulated to be normally distributed. Results showed that the levels of item discrimination and guessing affected the item-fit values. As the level of item discrimination or guessing increased, item-fit values increased and more items misfit the model. The level of item difficulty did not affect the item-fit statistic.
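A minimal sketch of one of the two statistics named above, the standardized residual for a matched score group: it compares the group's observed proportion correct with the model-predicted probability, in standard-error units. The group size of 100 and expected probability of 0.5 are illustrative assumptions, not values from the study:

```python
import math

def standardized_residual(observed_correct, n, p_expected):
    """Standardized residual comparing the observed proportion correct among
    n examinees in a score group with the model-expected probability."""
    p_obs = observed_correct / n
    se = math.sqrt(p_expected * (1.0 - p_expected) / n)
    return (p_obs - p_expected) / se

# 60 of 100 examinees answered correctly where the model predicted 0.5:
z = standardized_residual(observed_correct=60, n=100, p_expected=0.5)
```

Large absolute residuals, aggregated over score groups, flag items that misfit the model.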

13.
News Item     
《欧洲特需教育杂志》 (European Journal of Special Needs Education), 2013, 28(2): 228-231

14.
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent calibration, (b) separate calibration with one linking step, and (c) separate calibration with three sequential linking steps. Evaluation across varying sample sizes and item pool sizes suggests that calibrating an item pool simultaneously results in the most stable scaling. The separate calibration with linking procedures produced larger scaling errors as the number of linking steps increased. Haebara's item characteristic curve linking performed better than the test characteristic curve (TCC) linking method. The present article provides an analytic illustration that the test characteristic curve method may fail to find global solutions with polytomous items. Finally, comparison of the single- and mixed-format item pools suggests that using polytomous items as the anchor can improve the overall scaling accuracy of the item pools.
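The Haebara criterion mentioned above minimizes differences between item characteristic curves (rather than test characteristic curves) over the linking transformation. A minimal sketch under a 2PL model follows; the anchor items, quadrature points, and the particular transformation convention (theta -> A*theta + B, hence a -> a/A, b -> A*b + B on the new-form parameters) are assumptions for illustration:

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def haebara_loss(A, B, items_new, items_old, quad_points):
    """Haebara linking criterion: summed squared differences between anchor
    items' characteristic curves on the base scale and on the new-form scale
    after applying the linear transformation with slope A and intercept B."""
    loss = 0.0
    for (a_n, b_n), (a_o, b_o) in zip(items_new, items_old):
        for theta in quad_points:
            p_trans = p_2pl(theta, a_n / A, A * b_n + B)
            loss += (p_trans - p_2pl(theta, a_o, b_o)) ** 2
    return loss
```

In practice A and B are found by numerically minimizing this loss over a quadrature grid of theta values.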

15.
Computer packages that assist the test developer in writing items, generating tests, and building item banks are critically examined. There appears to be a lack of fully integrated software packages for item writing. Although there are many test generators, they do not really assist the test developer in checking the wording of items. Packages are available, however, for building quality item banks.

16.
Many researchers have suggested that the main cause of item bias is the misspecification of the latent ability space, where items that measure multiple abilities are scored as though they are measuring a single ability. If two different groups of examinees have different underlying multidimensional ability distributions and the test items are capable of discriminating among levels of abilities on these multiple dimensions, then any unidimensional scoring scheme has the potential to produce item bias. It is the purpose of this article to provide the testing practitioner with insight about the difference between item bias and item impact and how they relate to item validity. These concepts will be explained from a multidimensional item response theory (MIRT) perspective. Two detection procedures, the Mantel-Haenszel (as modified by Holland and Thayer, 1988) and Shealy and Stout's Simultaneous Item Bias (SIB; 1991) strategies, will be used to illustrate how practitioners can detect item bias.
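The Mantel-Haenszel procedure named above compares correct/incorrect counts for the reference and focal groups within strata matched on total score; its common odds ratio can be sketched briefly. The single stratum of counts in the example is an illustrative assumption:

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for DIF screening. Each stratum
    (one matched total-score level) is a tuple (ref_correct, ref_wrong,
    focal_correct, focal_wrong); a ratio near 1 suggests no DIF on the item."""
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        numerator += ref_right * foc_wrong / n
        denominator += ref_wrong * foc_right / n
    return numerator / denominator

# One score stratum where the reference group answers correctly twice as
# often as the focal group:
ratio = mantel_haenszel_odds_ratio([(20, 10, 10, 20)])
```

Operational DIF analyses aggregate many such strata and typically report the ratio on a transformed (e.g., delta) scale.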

17.
What are item sets and how are they being used in testing? What methods are being used to score them? What are the prospects for future use of item sets?

18.
Differential linear drift of item location parameters over a 10-year period is demonstrated in data from the College Board Physics Achievement Test. The relative direction of drift is associated with the content of the items and reflects changing emphasis in the physics curricula of American secondary schools. No evidence of drift of discriminating power parameters was found. Statistical procedures for detecting, estimating, and accounting for item parameter drift in item pools for long-term testing programs are proposed.

19.
Facet theory: an effective strategy for test item construction and equating   (cited: 2; self-citations: 0; citations by others: 2)
Back-translation attends to "verbal equivalence," and item response theory to "equivalence of statistical indices." Item equivalence in facet theory emphasizes a shared measurement objective: equated items should test the same response from examinees under the same conditions. Facet theory uses the mapping-sentence technique to define an item's measurement objective clearly, making item equating and item writing more scientific. Items constructed with facet theory have a clearer dimensional structure, better guaranteeing the test's construct validity. Combining facet theory with other psychometric methods can effectively improve the quality of test item construction and equating.

20.
叶萌, 《考试研究》 2010(2): 96-107
This paper reviews the principal research on the local independence assumption of item response theory (IRT) and, on that basis, explicates the assumption's definition. It then discusses the relationship between local independence and test dimensionality; the detection, computation, causes, and control of local dependence; and the effects of local dependence on measurement practice, and explores strategies for handling local item dependence within testlets.

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号