
1.

A Comparison of Item Exposure Control Methods in Computerized Adaptive Testing

Javier Revuelta Vicente Ponsoda 《Journal of Educational Measurement》1998,35(4):311-327

Two new methods for item exposure control were proposed. In the Progressive method, as the test progresses, the influence of a random component on item selection is reduced and the importance of item information becomes increasingly prominent. In the Restricted Maximum Information method, no item is allowed to be exposed in more than a predetermined proportion of tests. Both methods were compared with six other item-selection methods (Maximum Information, One Parameter, McBride and Martin, Randomesque, Sympson and Hetter, and Random Item Selection) with regard to test precision and item exposure variables. Results showed that the Restricted method was useful to reduce maximum exposure rates and that the Progressive method reduced the number of unused items. Both did well regarding precision. Thus, a combined Progressive-Restricted method may be useful to control item exposure without a serious decrease in test precision.
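The two rules described in this abstract can be sketched together as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the linear weight `(1 - s) * random + s * information`, the 2PL information function, and every name here (`item_information`, `select_item`, `max_rate`) are hypothetical choices made for the example.

```python
import math
import random

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, pool, administered, position, test_length,
                exposure_counts, tests_so_far, max_rate, rng):
    """Pick the next item index using a Progressive weight plus a Restricted cap.

    pool: list of (a, b) tuples; administered: set of indices already used.
    position: number of items already given in the current test.
    """
    # Restricted rule: drop items whose observed exposure rate has reached the cap.
    eligible = [
        i for i in range(len(pool))
        if i not in administered
        and (tests_so_far == 0 or exposure_counts[i] / tests_so_far < max_rate)
    ]
    if not eligible:
        raise ValueError("no eligible items left")
    info = {i: item_information(theta, *pool[i]) for i in eligible}
    max_info = max(info.values())
    # Progressive rule: the random share of the weight shrinks as the test advances.
    s = position / test_length
    weights = {i: (1.0 - s) * rng.uniform(0.0, max_info) + s * info[i]
               for i in eligible}
    return max(weights, key=weights.get)
```

At `position = 0` the choice is purely random among eligible items. As `position` approaches `test_length` it approaches maximum-information selection, which matches the qualitative behavior the abstract describes.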

2.

Analysis Modes and Evaluation Indices for CAT Simulation Results

《中国考试》2016,(12)

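Computerized adaptive testing (CAT) simulation is one of the main methods of CAT research. The evaluation of CAT simulation results covers three main areas: analysis of examinee ability estimation and ability classification, analysis of item pool usage, and analysis of the CAT response process. Analysis of CAT simulation results follows two main modes: overall analysis and fine-grained analysis. This study surveys the evaluation indices for CAT simulation results from the perspectives of recovery performance in simulation, test accuracy, item pool security, item pool utilization, classification efficiency and accuracy, and the degree to which multiple test-objective constraints are satisfied. The evaluation perspectives and indices for CAT simulation results should be chosen according to the goals of the CAT study and the requirements of the testing context.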

3.

Research Paradigms and Characteristics of Simulation Methods for Computerized Adaptive Testing

《中国考试》2016,(1)

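Computerized adaptive testing (CAT) is widely applied in theory and practice. Many current CAT studies fall into two research paradigms: CAT research based on real examinee responses and CAT research based on simulated response data. The steps of a CAT simulation study include model selection, item pool simulation, the starting point of the test, the item selection strategy, and the test termination strategy. The main trends in CAT simulation research are: item selection and termination strategies remain the focus of CAT research; simulation designs increasingly match real testing conditions; studies adopt multi-factor designs; and simulation results are evaluated comprehensively from multiple perspectives.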

4.

A Comparison of Self-Adapted and Computerized Adaptive Tests

Steven L. Wise Barbara S. Plake Phillip L. Johnson Linda L. Roos 《Journal of Educational Measurement》1992,29(4):329-339

According to item response theory (IRT), examinee ability estimation is independent of the particular set of test items administered from a calibrated pool. Although the most popular application of this feature of IRT is computerized adaptive (CA) testing, a recently proposed alternative is self-adapted (SA) testing, in which examinees choose the difficulty level of each of their test items. This study compared examinee performance under SA and CA tests, finding that examinees taking the SA test (a) obtained significantly higher ability scores and (b) reported significantly lower posttest state anxiety. The results of this study suggest that SA testing is a desirable format for computer-based testing.

5.

A New Stopping Rule for Computerized Adaptive Testing

Choi SW Grady MW Dodd BG 《Educational and psychological measurement》2010,70(6):1-17

The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.

6.

Computerized Adaptive Testing With Different Groups

Sue M. Legg Dianne C. Buhr 《Educational Measurement》1992,11(2):23-27

How does the use of computerized adaptive testing affect the performance of students from different groups? How consistent were the results of computerized adaptive and “conventional” tests? What did the students think about the test experience? What advice do the authors have for test developers and users?

7.

Flawed Items in Computerized Adaptive Testing

Maria T. Potenza Martha L. Stocking 《Journal of Educational Measurement》1997,34(1):79-96

(Educational Testing Service) A multiple-choice test item is identified as flawed if it has no single best answer. In spite of extensive quality control procedures, the administration of flawed items to test takers is inevitable. A limited set of common strategies for dealing with flawed items in conventional testing, grounded in the principle of fairness to examinees, is reexamined in the context of adaptive testing. An additional strategy, available for adaptive testing, of retesting from a pool cleansed of flawed items, is compared to the existing strategies. Retesting was found to be no practical improvement over current strategies.

8.

Research on a Computerized Adaptive Examination System

吕岚《晋城职业技术学院学报》2013,6(4):56-59

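Combining expert judgment with item response theory, this paper designs a concise and practical method for determining item difficulty in a computerized adaptive examination system, and analyzes in detail the system's test starting point, termination point, item selection strategy, and ability estimation method. It concludes with a step-by-step example of an adaptive test. The system can randomly select items matched to examinees of different ability levels, which shortens the test and improves examination efficiency compared with traditional online examination systems.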

9.

Item Clusters and Computerized Adaptive Testing: A Case for Testlets

Howard Wainer Gerard L. Kiely 《Journal of Educational Measurement》1987,24(3):185-201

It is observed that many sorts of difficulties may preclude the uneventful construction of tests by a computerized algorithm, such as those currently in favor in Computerized Adaptive Testing (CAT). In this essay we discuss a number of these problems, as well as some possible avenues of solution. We conclude with the development of the "testlet," a bundle of items that can be arranged either hierarchically or linearly, thus maintaining the efficiency of an adaptive test while keeping the quality control of test construction that is possible currently only with careful expert scrutiny. Performance on the separate testlets is aggregated to yield ability estimates.

10.

Practical Issues in Large-Scale Computerized Adaptive Testing

《教育实用测度》2013,26(4):287-304

Computerized adaptive testing, although well-grounded in psychometric theory, has had few large-scale applications in the past. This is now changing because the cost of computing has declined rapidly. As is always true at such junctures where theory is translated into practice, many practical issues arise that must now be addressed. In this article, we discuss a number of such issues and sketch out potential problems and potential solutions. Our purpose is to encourage further development of solutions to the issues presented as well as other practical issues facing measurement professionals involved with the implementation of adaptive testing.

11.

An Introduction to the Computerized Adaptive Testing

TIAN Jian-quan MIAO Dan-min ZHU Xia GONG Jing-jing 《美中教育评论》2007,4(1):72-81

Computerized adaptive testing (CAT) has unsurpassable advantages over traditional testing. It has become the mainstream in large-scale examinations in modern society. This paper gives a brief introduction to CAT, including the differences between traditional testing and CAT, the principles by which CAT works, the psychometric theory and computer algorithms behind CAT, and the advantages and cautions of CAT. Finally, the development of CAT in China is reviewed.

12.

Evaluating Content Alignment in Computerized Adaptive Testing

Steven L. Wise G. Gage Kingsbury Norman L. Webb 《Educational Measurement》2015,34(4):41-48

The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process.

13.

Applications and Prospects of Cognitive Diagnostic Computerized Adaptive Testing

《中国考试》2021,(1)

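Educational measurement technology based on cognitive diagnostic computerized adaptive testing (CD-CAT) can support personalized learning and help tailor instruction to individual students. China has begun to develop and use CD-CAT-based educational support systems, but still lags behind other countries and regions. The future direction of educational measurement in China is to expand the body of educational measurement professionals, strengthen theoretical innovation and practical application of CD-CAT, and develop intelligent learning systems that are better suited to individuals and more open and flexible.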

14.

Design and Implementation of a Web-Based Computerized Adaptive Examination System

刘发明《赣南师范学院学报》2005,26(6):64-66

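This paper introduces the basic theory of item response theory (IRT) and the implementation process of computerized adaptive testing (CAT). A web-based CAT system was developed in the Visual Studio .NET 2003 environment, with SQL as the back-end database and the three-parameter logistic model as the item response model.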
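For readers unfamiliar with the three-parameter logistic model named in this abstract, a minimal sketch of its response function follows. The function name `p_correct_3pl` is hypothetical, and the sketch assumes the plain logistic form without the 1.7 scaling constant that some presentations include. The abstract does not say which form the system uses.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: pseudo-guessing lower asymptote (0 <= c < 1).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# When ability equals difficulty, the probability is halfway between c and 1.
midpoint = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
```

With `c = 0.2`, `midpoint` is 0.6: an examinee whose ability matches the item difficulty answers correctly more than half the time because of the guessing floor.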

15.

Multistage Computerized Adaptive Testing With Uniform Item Exposure

Michael C. Edwards David B. Flora David Thissen 《教育实用测度》2013,26(2):118-141

This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising that block may be unrelated to each other, or they may comprise a testlet (Wainer and Kiely, 1987). After the first block of items has been administered, adaptation takes place in the choice of the next block to be administered and subsequent blocks. The uMFS design integrates item exposure control, as well as content balancing and other test development needs, into the design of the CAT, instead of placing those activities in the online implementation. We show that it is possible to implement item exposure control, in a very thorough way, in the psychometric specifications of the item blocks.

16.

A Comparative Study of Item Exposure Control Methods in Computerized Adaptive Testing

Shun-Wen Chang Timothy N. Ansley 《Journal of Educational Measurement》2003,40(1):71-103

This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The Stocking and Lewis conditional multinomial procedure and, to a slightly lesser extent, the Davey and Parshall method seemed to be the most promising considering all of the factors that this study addressed.
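Two of the evaluation criteria named in this abstract, observed exposure rates and test overlap rates, can be computed from simulated administrations roughly as follows. This is a sketch under assumptions rather than the study's own procedure: it takes each test to be a list of item indices, assumes fixed-length tests, and the names `exposure_rates` and `mean_test_overlap` are hypothetical.

```python
from itertools import combinations

def exposure_rates(tests, pool_size):
    """Fraction of tests in which each item of the pool appeared."""
    counts = [0] * pool_size
    for items in tests:
        for i in set(items):
            counts[i] += 1
    return [c / len(tests) for c in counts]

def mean_test_overlap(tests):
    """Average share of items in common across all pairs of tests.

    Assumes every test has the same length, so the shared count for a
    pair is divided by that common length.
    """
    length = len(tests[0])
    pairs = list(combinations(tests, 2))
    shared = sum(len(set(x) & set(y)) for x, y in pairs)
    return shared / (len(pairs) * length)

# Three two-item tests drawn from a four-item pool; item 3 is never used.
tests = [[0, 1], [1, 2], [0, 2]]
rates = exposure_rates(tests, pool_size=4)
overlap = mean_test_overlap(tests)
```

The maximum of `rates` is the observed maximum exposure rate. Items with a rate of zero are the unused part of the pool.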

17.

Computerized Adaptive and Fixed-Item Testing of Music Listening Skill: A Comparison of Efficiency, Precision, and Concurrent Validity

Walter P. Vispoel Tianyou Wang Timothy Bleiler 《Journal of Educational Measurement》1997,34(1):43-63

We evaluated the efficiency, precision, and concurrent validity of results obtained from adaptive and fixed-item music listening tests in three studies: (a) a computer simulation study in which each of 2,200 simulees completed a computerized adaptive tonal memory test, a computerized fixed-item tonal memory test constructed from items in the adaptive test pool and two standardized group-administered tonal memory tests; (b) a live testing study in which each of 204 examinees took the computerized adaptive test and the standardized tests; and (c) a live testing study in which randomly equivalent groups took either the computerized adaptive test (n = 86) or the computerized fixed-item test (n = 86). The adaptive music test required 50% to 93% fewer items to match the reliability and concurrent validity of the fixed-item tests, and it yielded higher levels of reliability and concurrent validity than the fixed-item tests when test length was held constant. These findings suggest that computerized adaptive tests, which typically have been limited to visually produced items, may also be well suited for measuring skills that require aurally produced items.

18.

Design and Implementation of a Computerized Adaptive Testing System for Multiple Types of Terminals

路鹏 周东岱 钟绍春 丛晓 《现代教育技术》2012,22(6):88-92

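In recent years, advances in information technology have driven rapid growth in assessment by computerized adaptive testing; in addition, the availability of mobile technology offers new avenues for assessment. This article designs and develops an adaptive testing system for multiple types of terminals. The item selection engine was redesigned to address problems in existing algorithms, such as high exposure rates for some items, low item pool utilization, and content balancing. The system can support formative assessment, summative assessment, and self-assessment.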

19.

Research Strategies and Applied Practice of Computerized Adaptive Testing

《现代教育技术》2017,(12):44-49

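Computerized adaptive testing (CAT) has advantages that traditional paper-and-pencil tests cannot match, and its research and application have attracted the attention of experts in education. However, most research in China is devoted to purely academic discussion and offers little broadly applicable guidance for educational practitioners. This article therefore takes a timely and accessible perspective, describes commonly used CAT research strategies and application software, and further introduces the practical use of CAT in basic education and special education in the United States. The aim is to narrow the gap between CAT research and educational practitioners, so that they can clearly grasp the current lines of CAT research and its application prospects, and to promote the adoption and application of CAT in education.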

20.

Properties of Ability Estimation Methods in Computerized Adaptive Testing

Tianyou Wang Walter P. Vispoel 《Journal of Educational Measurement》1998,35(2):109-135

Simulations of computerized adaptive tests (CATs) were used to evaluate results yielded by four commonly used ability estimation methods: maximum likelihood estimation (MLE) and three Bayesian approaches: Owen's method, expected a posteriori (EAP), and maximum a posteriori. In line with the theoretical nature of the ability estimates and previous empirical research, the results showed clear distinctions between MLE and the Bayesian methods, with MLE yielding lower bias, higher standard errors, higher root mean square errors, lower fidelity, and lower administrative efficiency. Standard errors for MLE based on test information underestimated actual standard errors, whereas standard errors for the Bayesian methods based on posterior distribution standard deviations accurately estimated actual standard errors. Among the Bayesian methods, Owen's provided the worst overall results, and EAP provided the best. Using a variable starting rule in which examinees were initially classified into three broad ability groups greatly reduced the bias for the Bayesian methods, but had little effect on the results for MLE. On the basis of these results, guidelines are offered for selecting appropriate CAT ability estimation methods in different decision contexts.
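As a rough illustration of the EAP approach mentioned in this abstract, the sketch below computes a posterior mean and posterior standard deviation on a grid. It is not the study's code. It assumes a 2PL response model, a standard normal prior, and an evenly spaced grid from -4 to 4, and the names `likelihood` and `eap_estimate` are hypothetical.

```python
import math

def likelihood(theta, responses, items):
    """Likelihood of 0/1 responses under a 2PL model (assumed for illustration)."""
    value = 1.0
    for u, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        value *= p if u == 1 else 1.0 - p
    return value

def eap_estimate(responses, items, lo=-4.0, hi=4.0, n_points=81):
    """Posterior mean and posterior SD of theta on an evenly spaced grid."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + k * step for k in range(n_points)]
    # Unnormalized posterior: likelihood times a standard normal prior density.
    post = [likelihood(t, responses, items) * math.exp(-0.5 * t * t) for t in grid]
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total
    return mean, math.sqrt(var)

# An all-correct pattern still gives a finite EAP estimate.
theta_hat, posterior_sd = eap_estimate([1, 1], [(1.0, 0.0), (1.0, 0.0)])
```

The prior keeps the estimate finite for all-correct or all-incorrect response patterns, where the maximum likelihood estimate does not exist. The posterior SD returned here is the kind of Bayesian standard error the abstract refers to.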