Similar literature
 Found 20 similar articles; search took 15 ms
1.
Regular use of questions previously made available to the public (i.e., disclosed items) may provide one way to meet the requirement for large numbers of questions in a continuous testing environment, that is, an environment in which testing is offered at test taker convenience throughout the year rather than on a few prespecified test dates. First it must be shown that such use has effects on test scores small enough to be acceptable. In this study simulations are used to explore the use of disclosed items under a worst-case scenario which assumes that disclosed items are always answered correctly. Some item pool and test designs were identified in which the use of disclosed items produces effects on test scores that may be viewed as negligible.

2.
This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for students with disabilities that have typically taken alternate assessments based on modified achievement standards (AA-MAS). A simulation study indicated that the abilities of AA-MAS students can be underestimated or overestimated by the mixed-item CAT, depending on students’ location on the underlying ability scale. These findings held across grade levels and test lengths. The mixed-item CAT appeared to function well for non-AA-MAS students.

3.
During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. The traditional method of attaining the highest efficiency in ability estimation is to select items of maximum Fisher information at the currently estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies with simulated data addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with Sympson and Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order (i.e., less discriminating items first), as described in Chang and Ying's (1999) stratified method had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods, particularly for operational item pools when retired items cannot be totally replenished with similar highly discriminating items. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure, which can successfully even out the usage of all items.
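
The maximum-information rule mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes a two-parameter logistic (2PL) model, which the abstract does not name, and the function names and the four-item pool are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_max_info(theta, pool, used):
    """Return the index of the unused item with maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: fisher_info(theta, *pool[i]))

# Hypothetical pool of (discrimination a, difficulty b) pairs.
pool = [(0.6, 0.0), (1.8, 0.1), (1.0, -1.0), (2.0, 2.5)]
first = select_max_info(0.0, pool, used=set())
second = select_max_info(0.0, pool, used={first})
```

Because information grows with the square of the discrimination parameter, a well-targeted high-discrimination item tends to win early, which is the overexposure pattern the abstract describes.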

4.
The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across items). To address these issues, we introduce subpool partition strategies for item selection and within-person statement exposure control procedures. Simulations showed that the multinomial method reduces computation time while maintaining measurement precision. Both the freeze and revised Sympson-Hetter online (RSHO) methods controlled the statement exposure rate; RSHO sacrificed some measurement precision but increased pool use. Furthermore, preventing a statement's repetition on consecutive items neither hindered the effectiveness of the freeze or RSHO method nor reduced measurement precision.

5.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(4): 359-375
Many procedures have been developed for selecting the "best" items for a computerized adaptive test. There is a trend toward the use of adaptive testing in applied settings such as licensure tests, program entrance tests, and educational tests. It is useful to consider procedures for item selection and the special needs of applied testing settings to facilitate test design. The current study reviews several classical approaches and alternative approaches to item selection and discusses their relative merit. This study also describes procedures for constrained computerized adaptive testing (C-CAT) that may be added to classical item selection approaches to allow them to be used for applied testing, while maintaining the high measurement precision and short test length that made adaptive testing attractive to practitioners initially.

6.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(4): 287-304
Computerized adaptive testing, although well-grounded in psychometric theory, has had few large-scale applications in the past. This is now changing because the cost of computing has declined rapidly. As is always true at such junctures where theory is translated into practice, many practical issues arise that must now be addressed. In this article, we discuss a number of such issues and sketch out potential problems and potential solutions. Our purpose is to encourage further development of solutions to the issues presented as well as other practical issues facing measurement professionals involved with the implementation of adaptive testing.

7.
The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process.

8.
How does the use of computerized adaptive testing affect the performance of students from different groups? How consistent were the results of computerized adaptive and “conventional” tests? What did the students think about the test experience? What advice do the authors have for test developers and users?

9.
Computerized adaptive testing (CAT) has unsurpassable advantages over traditional testing and has become mainstream in large-scale examinations in modern society. This paper gives a brief introduction to CAT, including the differences between traditional testing and CAT, the principles by which CAT works, the psychometric theory and computer algorithms behind CAT, and the advantages and cautions of CAT. Finally, the development of CAT in China is reviewed.

10.
Simulations of computerized adaptive tests (CATs) were used to evaluate results yielded by four commonly used ability estimation methods: maximum likelihood estimation (MLE) and three Bayesian approaches—Owen's method, expected a posteriori (EAP), and maximum a posteriori. In line with the theoretical nature of the ability estimates and previous empirical research, the results showed clear distinctions between MLE and the Bayesian methods, with MLE yielding lower bias, higher standard errors, higher root mean square errors, lower fidelity, and lower administrative efficiency. Standard errors for MLE based on test information underestimated actual standard errors, whereas standard errors for the Bayesian methods based on posterior distribution standard deviations accurately estimated actual standard errors. Among the Bayesian methods, Owen's provided the worst overall results, and EAP provided the best. Using a variable starting rule in which examinees were initially classified into three broad ability groups greatly reduced the bias for the Bayesian methods, but had little effect on the results for MLE. On the basis of these results, guidelines are offered for selecting appropriate CAT ability estimation methods in different decision contexts.
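
To make the EAP estimate and its posterior-standard-deviation error concrete, here is a minimal sketch. It is not the study's code: the 2PL response model, the standard normal prior, the 81-point grid on [-4, 4], and the function names are all assumptions chosen for illustration.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """Return (posterior mean, posterior SD) of theta on an evenly spaced grid.

    responses: list of 0/1 scores; items: list of (a, b) pairs.
    An unnormalized standard normal prior is assumed.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + k * step for k in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # prior density up to a constant
        for x, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            w *= p if x == 1 else (1.0 - p)  # likelihood contribution
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

# Hypothetical three-item test, all items with a = 1 and b = 0.
items = [(1.0, 0.0)] * 3
eap_all_right, sd_all_right = eap_estimate([1, 1, 1], items)
eap_all_wrong, _ = eap_estimate([0, 0, 0], items)
prior_mean, prior_sd = eap_estimate([], [])
```

Unlike the MLE, the EAP stays finite for all-correct or all-incorrect response patterns, because the prior pulls the estimate toward its mean.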

11.
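Fan Jun (樊军). 《考试研究》 (Examinations Research), 2012, (4): 61-67
The sequential probability ratio test (SPRT) mode of computerized adaptive testing is a testing mode that lets ordinary teachers use network technology to evaluate students' language learning outcomes in small-scale testing such as classroom teaching. Its basic principle is to estimate the probabilities of correct and incorrect responses as the examinee is tested sequentially, and then to compare them against two mutually opposing hypotheses, "mastery" and "non-mastery," to reach a decision. On the one hand, it can compensate for the limited range of application of IRT-based testing modes; on the other hand, it can better help teachers assess students' language ability.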
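
The decision rule described here can be sketched with Wald's classical sequential probability ratio test. This is an illustration under stated assumptions, not the article's implementation: the success probabilities for masters (0.8) and non-masters (0.5), the error rates, and the function name are all hypothetical.

```python
import math

def sprt_decision(responses, p_master=0.8, p_nonmaster=0.5, alpha=0.05, beta=0.05):
    """Classify an examinee from sequential 0/1 responses using Wald's SPRT.

    Returns (decision, number of items used). All default values are
    illustrative assumptions, not values taken from the article.
    """
    upper = math.log((1.0 - beta) / alpha)   # cross above: decide "master"
    lower = math.log(beta / (1.0 - alpha))   # cross below: decide "non-master"
    llr = 0.0                                # running log likelihood ratio
    for n, x in enumerate(responses, start=1):
        if x == 1:
            llr += math.log(p_master / p_nonmaster)
        else:
            llr += math.log((1.0 - p_master) / (1.0 - p_nonmaster))
        if llr >= upper:
            return "master", n
        if llr <= lower:
            return "non-master", n
    return "undecided", len(responses)

all_right = sprt_decision([1] * 10)
all_wrong = sprt_decision([0] * 10)
mixed = sprt_decision([1, 0, 1, 0])
```

With these assumed values, a run of correct answers ends the test after 7 items and a run of incorrect answers after 4, while a short mixed pattern leaves the test undecided.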

12.
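Combining expert judgment with item response theory, this paper designs a concise and practical method for determining item difficulty in a computerized adaptive testing system. It also analyzes in detail the choice of starting and stopping points for the test, the item selection strategy, and the ability estimation method. Finally, it gives a step-by-step example of an adaptive test. The system can randomly select items according to examinees' differing ability levels, which shortens the test and improves testing efficiency compared with traditional online examination systems.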

13.
The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.

14.
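This paper introduces the basic theory of item response theory (IRT) and the implementation process of computerized adaptive testing (CAT). A web-based CAT system was developed in the Visual Studio .NET 2003 environment, with SQL as the back-end database and the three-parameter logistic model as the item response model.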
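
For readers unfamiliar with the three-parameter logistic (3PL) model named above, a minimal sketch of its response function follows. The function name is hypothetical, and the optional scaling constant D = 1.7 used in some formulations is omitted; the abstract does not say which form the system used.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

at_difficulty = p_3pl(0.0, 1.0, 0.0, 0.2)   # theta equal to b
very_low = p_3pl(-50.0, 1.0, 0.0, 0.2)      # approaches the guessing level c
```

When ability equals the item difficulty, the probability is halfway between c and 1, and for very low ability it approaches c.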

15.
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising that block may be unrelated to each other, or they may comprise a testlet (Wainer and Kiely, 1987). After the first block of items has been administered, adaptation takes place in the choice of the next block to be administered and subsequent blocks. The uMFS design integrates item exposure control, as well as content balancing and other test development needs, into the design of the CAT, instead of placing those activities in the online implementation. We show that it is possible to implement item exposure control, in a very thorough way, in the psychometric specifications of the item blocks.

16.
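Educational measurement technology based on cognitive diagnostic computerized adaptive testing (CD-CAT) can support students' personalized learning and helps teach students according to their aptitude. China has begun developing and using CD-CAT-based educational support systems, but it still lags behind other countries and regions. The future directions for educational measurement in China are to expand the ranks of educational measurement professionals, to strengthen innovative theoretical research on CD-CAT and its practical application, and to develop intelligent learning systems that are better suited to the individual and more open and flexible.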

17.
Two new methods for item exposure control were proposed. In the Progressive method, as the test progresses, the influence of a random component on item selection is reduced and the importance of item information is increasingly more prominent. In the Restricted Maximum Information method, no item is allowed to be exposed in more than a predetermined proportion of tests. Both methods were compared with six other item-selection methods (Maximum Information, One Parameter, McBride and Martin, Randomesque, Sympson and Hetter, and Random Item Selection) with regard to test precision and item exposure variables. Results showed that the Restricted method was useful to reduce maximum exposure rates and that the Progressive method reduced the number of unused items. Both did well regarding precision. Thus, a combined Progressive-Restricted method may be useful to control item exposure without a serious decrease in test precision.
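
The two ideas in this abstract can be sketched as follows. The abstract gives no formulas, so everything here is an assumption made for illustration: the weight (1 - s) * R + s * I, with R drawn uniformly between 0 and the largest candidate information and s taken as the proportion of the test already administered, is one common way to write a progressive rule; the eligibility filter is a simple reading of the restricted rule; the 2PL model, function names, and pool are hypothetical.

```python
import math
import random

def info_2pl(theta, a, b):
    """Fisher information of an item under an assumed 2PL model."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def restricted_eligible(exposure_counts, n_tests, r_max):
    """Restricted rule: keep items whose exposure rate so far is below r_max."""
    if n_tests == 0:
        return list(range(len(exposure_counts)))
    return [i for i, c in enumerate(exposure_counts) if c / n_tests < r_max]

def progressive_select(theta, pool, candidates, n_administered, test_length, rng):
    """Progressive rule: blend a random component with item information.

    s grows from 0 to 1 over the test, so early picks are mostly random
    and late picks are mostly information-driven.
    """
    s = n_administered / test_length
    infos = {i: info_2pl(theta, *pool[i]) for i in candidates}
    max_info = max(infos.values())
    weights = {i: (1.0 - s) * rng.uniform(0.0, max_info) + s * infos[i]
               for i in candidates}
    return max(candidates, key=weights.get)

# Hypothetical pool of (a, b) pairs and exposure counts after 10 tests.
pool = [(0.6, 0.0), (1.8, 0.1), (1.0, -1.0), (2.0, 2.5)]
eligible = restricted_eligible([5, 1, 0, 2], n_tests=10, r_max=0.4)
rng = random.Random(0)
early_pick = progressive_select(0.0, pool, eligible, 0, 10, rng)
late_pick = progressive_select(0.0, pool, [0, 1, 2, 3], 10, 10, rng)
```

Passing the restricted rule's eligible list as the candidate set of the progressive rule gives a rough picture of the combined Progressive-Restricted approach the abstract suggests.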

18.
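III. Estimation of $\theta$ in CAT. (1) MLE (maximum likelihood estimation). Suppose an examinee with ability level $\theta$ responds to $n$ items $X_1, X_2, \ldots, X_n$. An estimate of $\theta$ can be obtained by maximizing the likelihood function shown in Equation (8). Let $\hat{\theta}_n$ denote the resulting estimate of $\theta$. Clearly, $\hat{\theta}_n$ is also the maximum likelihood estimate for Equation (9). Under certain conditions, $\hat{\theta}_n$ is known to be asymptotically normal, with mean $\theta$ and variance approximately $I_n^{-1}(\hat{\theta}_n)$. Most current CAT designs update the estimate of $\theta$ recursively after the examinee answers each new item, and then select the next item by the maximum information method.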
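
Equations (8) and (9) are not reproduced in this excerpt. As a hedged reminder of what they most likely look like, the standard likelihood for dichotomous items is sketched below; the notation $P_j(\theta)$ for the probability of a correct response to item $j$ and $x_j \in \{0, 1\}$ for the observed response is an assumption, not taken from the source.

```latex
% Assumed standard form; the source's Equations (8) and (9) may differ in notation.
\begin{align}
L(\theta \mid x_1, \ldots, x_n)
  &= \prod_{j=1}^{n} P_j(\theta)^{x_j}\,\bigl[1 - P_j(\theta)\bigr]^{1 - x_j}, \\
\frac{\partial \log L}{\partial \theta}
  &= \sum_{j=1}^{n} \frac{\bigl[x_j - P_j(\theta)\bigr]\,P_j'(\theta)}
       {P_j(\theta)\,\bigl[1 - P_j(\theta)\bigr]} = 0
  \quad\text{at } \theta = \hat{\theta}_n, \\
I_n(\theta)
  &= \sum_{j=1}^{n} \frac{\bigl[P_j'(\theta)\bigr]^2}
       {P_j(\theta)\,\bigl[1 - P_j(\theta)\bigr]},
  \qquad
  \hat{\theta}_n \;\dot\sim\; N\!\bigl(\theta,\; I_n^{-1}(\hat{\theta}_n)\bigr).
\end{align}
```

Under this form, the test information is the sum of the item informations, so each newly administered item reduces the approximate variance of $\hat{\theta}_n$.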

19.
This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback‐Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson‐Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.

20.
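In recent years, advances in information technology have driven the rapid development of assessment by computerized adaptive testing; in addition, the availability of mobile technology offers new avenues for assessment. This article designs and develops an adaptive testing system for multiple types of terminal devices. The item selection engine was redesigned to address problems in existing algorithms, such as high exposure rates for some items, low item bank utilization, and content balancing. The system can support formative assessment, summative assessment, and self-assessment.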
