共查询到20条相似文献,搜索用时 0 毫秒
1.
Two new methods for item exposure control were proposed. In the Progressive method, as the test progresses, the influence of a random component on item selection is reduced and the importance of item information is increasingly more prominent. In the Restricted Maximum Information method, no item is allowed to be exposed in more than a predetermined proportion of tests. Both methods were compared with six other item-selection methods (Maximum Information, One Parameter, McBride and Martin, Randomesque, Sympson and Hetter, and Random Item Selection) with regard to test precision and item exposure variables. Results showed that the Restricted method was useful to reduce maximum exposure rates and that the Progressive method reduced the number of unused items. Both did well regarding precision. Thus, a combined Progressive-Restricted method may be useful to control item exposure without a serious decrease in test precision. 相似文献
2.
3.
4.
Steven L. Wise Barbara S. Plake Phillip L. Johnson Linda L. Roos 《Journal of Educational Measurement》1992,29(4):329-339
According to item response theory (IRT), examinee ability estimation is independent of the particular set of test items administered from a calibrated pool. Although the most popular application of this feature of IRT is computerized adaptive (CA) testing, a recently proposed alternative is self-adapted (SA) testing, in which examinees choose the difficulty level of each of their test items. This study compared examinee performance under SA and CA tests, finding that examinees taking the SA test (a) obtained significantly higher ability scores and (b) reported significantly lower posttest state anxiety. The results of this study suggest that SA testing is a desirable format for computer-based testing. 相似文献
5.
The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant. 相似文献
6.
How does the use of computerized adaptive testing affect the performance of students from different groups? How consistent were the results of computerized adaptive and “conventional” tests? What did the students think about the test experience? What advice do the authors have for test developers and users? 相似文献
7.
Educational Testing Service A multiple-choice test item is identified as flawed if it has no single best answer. In spite of extensive quality control procedures, the administration of flawed items to test takers is inevitable. A limited set of common strategies for dealing with flawed items in conventional testing, grounded in the principle of fairness to examinees, is reexamined in the context of adaptive testing. An additional strategy, available for adaptive testing, of retesting from a pool cleansed of flawed items, is compared to the existing strategies. Retesting was found to be no practical improvement over current strategies. 相似文献
8.
吕岚 《晋城职业技术学院学报》2013,6(4):56-59
本文结合专家经验确定法和项目反应理论,设计出一种简明、实用的计算机自适应考试系统的试题难度确定方法,同时重点分析计算机自适应考试系统的测试起点、终点选择,选题策略和能力值估计方法。最后列举了一个自适应测试的步骤实例。本系统能够根据不同能力被试者随机选择试题项目,减少了测试长度,与传统在线考试系统相比提高了考试效率。 相似文献
9.
It is observed that many sorts of difficulties may preclude the uneventful construction of tests by a computerized algorithm, such as those currently in favor in Computerized Adaptive Testing (CAT). In this essay we discuss a number of these problems, as well as some possible avenues of solution. We conclude with the development of the "testlet," a bundle of items that can be arranged either hierarchically or linearly, thus maintaining the efficiency of an adaptive test while keeping the quality control of test construction that is possible currently only with careful expert scrutiny. Performance on the separate testlets is aggregated to yield ability estimates. 相似文献
10.
《教育实用测度》2013,26(4):287-304
Computerized adaptive testing, although well-grounded in psychometric theory, has had few large-scale applications in the past. This is now changing because the cost of computing has declined rapidly. As is always true at such junctures where theory is translated into practice, many practical issues arise that must now be addressed. In this article, we discuss a number of such issues and sketch out potential problems and potential solutions. Our purpose is to encourage further development of solutions to the issues presented as well as other practical issues facing measurement professionals involved with the implementation of adaptive testing. 相似文献
11.
TIAN Jian-quan MIAO Dan-min ZHU Xia GONG Jing-jing 《美中教育评论》2007,4(1):72-81
The computerized adaptive testing (CAT) has unsurpassable advantages over the traditional testing. It has become the mainstream in large scale examination in modem society. This paper gives a brief introduction to CAT including differences between traditional testing and CAT, the principals of CAT works, Psychometric theory and computer algorithms of CAT, the advantages and cautions of CAT. In the end, the development of CAT in China is reviewed. 相似文献
12.
The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process. 相似文献
13.
14.
刘发明 《赣南师范学院学报》2005,26(6):64-66
介绍了项目反应理论(IRT)的基本理论和计算机化自适应测试(CAT)的实现过程。并在Visual Stu-dio.net2003的环境下,以SQL作为后台数据库,以三参数Logistic模型为项目反应模型,开发了一个基于WEB的CAT系统。 相似文献
15.
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising that block may be unrelated to each other, or they may comprise a testlet (Wainer and Kiely, 1987) After the first block of items has been administered, adaptation takes place in the choice of the next block to be administered and subsequent blocks. The uMFS design integrates item exposure control, as well as content balancing and other test development needs, into the design of the CAT, instead of placing those activities in the online implementation. We show that it is possible to implement item exposure control, in a very thorough way, in the psychometric specifications of the item blocks. 相似文献
16.
This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The Stocking and Lewis conditional multinomial procedure and, to a slightly lesser extent, the Davey and Parshall method seemed to be the most promising considering all of the factors that this study addressed. 相似文献
17.
We evaluated the efficiency, precision, and concurrent validity of results obtained from adaptive and fired-item music listening tests in three studies: (a) a computer simulation study in which each of 2,200 simulees completed a computerized adaptive tonal memory test, a computerized fired-item tonal memory test constructed from items in the adaptive test pool and two standardized group-administered tonal memory tests; (b) a live testing study in which each of 204 examinees took the computerized adaptive test and the standardized tests; and (c) a live testing study in which randomly equivalent groups took either the computerized adaptive test (n = 86) or the computerized fired-item test (n = 86). The adaptive music test required 50% to 93% fewer items to match the reliability and concurrent validity of the fired-item tests, and it yielded higher levels of reliability and concurrent validity than the fired-item tests when test length was held constant. These findings suggest that computerized adaptive tests, which typically have been limited to visually produced items, may also be well suited for measuring skills that require aurally produced items. 相似文献
18.
19.
20.
Simulations of computerized adaptive tests (CATs) were used to evaluate results yielded by four commonly used ability estimation methods: maximum likelihood estimation (MLE) and three Bayesian approaches—Owen's method, expected a posteriori (EAP), and maximum a posteriori. In line with the theoretical nature of the ability estimates and previous empirical research, the results showed clear distinctions between MLE and the Bayesian methods, with MLE yielding lower bias, higher standard errors, higher root mean square errors, lower fidelity, and lower administrative efficiency. Standard errors for MLE based on test information underestimated actual standard errors, whereas standard errors for the Bayesian methods based on posterior distribution standard deviations accurately estimated actual standard errors. Among the Bayesian methods, Owen's provided the worst overall results, and EAP provided the best. Using a variable starting rule in which examinees were initially classified into three broad/ability groups greatly reduced the bias for the Bayesian methods, but had little effect on the results for MLE. On the basis of these results, guidelines are offered for selecting appropriate CAT ability estimation methods in different decision contexts. 相似文献