首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Simulations of computerized adaptive tests (CATs) were used to evaluate results yielded by four commonly used ability estimation methods: maximum likelihood estimation (MLE) and three Bayesian approaches—Owen's method, expected a posteriori (EAP), and maximum a posteriori. In line with the theoretical nature of the ability estimates and previous empirical research, the results showed clear distinctions between MLE and the Bayesian methods, with MLE yielding lower bias, higher standard errors, higher root mean square errors, lower fidelity, and lower administrative efficiency. Standard errors for MLE based on test information underestimated actual standard errors, whereas standard errors for the Bayesian methods based on posterior distribution standard deviations accurately estimated actual standard errors. Among the Bayesian methods, Owen's provided the worst overall results, and EAP provided the best. Using a variable starting rule in which examinees were initially classified into three broad/ability groups greatly reduced the bias for the Bayesian methods, but had little effect on the results for MLE. On the basis of these results, guidelines are offered for selecting appropriate CAT ability estimation methods in different decision contexts.  相似文献   

2.
The purpose of this study was to compare the effects of four item selection rules—(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)—with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage.  相似文献   

3.
Error indices (bias, standard error of estimation, and root mean squared error) obtained on different measurement scales under different test-termination rules in computerized adaptive testing (CAT) were examined. Four ability estimation methods (maximum likelihood estimation, weighted likelihood estimation, expected a posterior, and maximum a posterior), three measurement scales (θ, number-correct score, and ACT score), and three test-termination rules (fixed length, fixed standard error, and target information) were studied for a real and a generated item pool. The findings indicated that the amount and direction of bias, standard error of estimation, and root mean squared error obtained under different ability estimation methods were influenced both by scale transformations and by test-termination rules in a CAT environment. The implications of these effects for testing programs are discussed.  相似文献   

4.
How has Item Response Theory helped solve problems in the development and use of computer-adaptive tests? Do we need to balance item content with computer-adaptive tests? Could we use IRT to evaluate unusual responses to computer-delivered tests?  相似文献   

5.
It is observed that many sorts of difficulties may preclude the uneventful construction of tests by a computerized algorithm, such as those currently in favor in Computerized Adaptive Testing (CAT). In this essay we discuss a number of these problems, as well as some possible avenues of solution. We conclude with the development of the "testlet," a bundle of items that can be arranged either hierarchically or linearly, thus maintaining the efficiency of an adaptive test while keeping the quality control of test construction that is possible currently only with careful expert scrutiny. Performance on the separate testlets is aggregated to yield ability estimates.  相似文献   

6.
2PLM下CAT选题策略比较   总被引:1,自引:0,他引:1  
本文在两参数逻辑斯蒂克模型(2PLM)下,提出一种新的选题策略——平均测验难度匹配法(Avt—b),并对四种选题策略下EAP能力估计趋势进行比较研究。通过模拟研究显示,Avt—b方法在CAT前期能够较快地锁定能力范围,较准确地作出能力估计。本文对CAT测试阶段的能力误差范围进行确定,对于多级评分模型的CAT选题策略开发具有一定的借鉴意义。  相似文献   

7.
Two new methods for item exposure control were proposed. In the Progressive method, as the test progresses, the influence of a random component on item selection is reduced and the importance of item information is increasingly more prominent. In the Restricted Maximum Information method, no item is allowed to be exposed in more than a predetermined proportion of tests. Both methods were compared with six other item-selection methods (Maximum Information, One Parameter, McBride and Martin, Randomesque, Sympson and Hetter, and Random Item Selection) with regard to test precision and item exposure variables. Results showed that the Restricted method was useful to reduce maximum exposure rates and that the Progressive method reduced the number of unused items. Both did well regarding precision. Thus, a combined Progressive-Restricted method may be useful to control item exposure without a serious decrease in test precision.  相似文献   

8.
This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback‐Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson‐Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.  相似文献   

9.
Many standardized tests are now administered via computer rather than paper‐and‐pencil format. The computer‐based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second advantage is the ability to record not only the test taker's response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. Combining these two advantages, various methods were explored for utilizing response time data in selecting appropriate items for an individual test taker. Four strategies for incorporating response time data were evaluated, and the precision of the final test‐taker score was assessed by comparing it to a benchmark value that did not take response time information into account. While differences in measurement precision and testing times were expected, results showed that the strategies did not differ much with respect to measurement precision but that there were differences with regard to the total testing time.  相似文献   

10.
This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The Stocking and Lewis conditional multinomial procedure and, to a slightly lesser extent, the Davey and Parshall method seemed to be the most promising considering all of the factors that this study addressed.  相似文献   

11.
The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (rmax). Therefore, item exposure control methods which implement a specification of rmax (e.g., Sympson & Hetter, 1985) provide the most direct control at both the item and test levels.  相似文献   

12.
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long‐term quality control of CAT. This study proposed a new item selection method using the “efficiency balanced information” criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a‐values while eliminating the need for item pool stratification.  相似文献   

13.
Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed.  相似文献   

14.
During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. The traditional method of attaining the highest efficiency in ability estimation is to select items of maximum Fisher information at the currently estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies with simulated data addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with Sympson and Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order (i.e., less discriminating items first), as described in Chang and Ying's (1999) stratified method had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods, particularly for operational item pools when retired items cannot be totally replenished with similar highly discriminating items. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure, which can successfully even out the usage of all items.  相似文献   

15.
项目反应理论(Item Response Theory,IRT)是在克服经典测验理论局限性的基础上发展起来的,在单维性、局部独立性和单调性的前提假设下,更具有优越性。它以潜在特质理论为基础,采用项目特征曲线假设对其进行建模,产生了三个最基本的模型:正态肩型曲线模型、拉希模型和逻辑斯蒂模型。指出了IRT在实际应用中取得实质性进展的两个方面:计算机自适应测验和认知诊断。  相似文献   

16.
The development of cognitive diagnostic‐computerized adaptive testing (CD‐CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual‐objective CD‐CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen‐Shannon (JS) divergence, a symmetrized version of the Kullback‐Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage.  相似文献   

17.
Practitioners typically face situations in which examinees have not responded to all test items. This study investigated the effect on an examinee's ability estimate when an examinee is presented an item, has ample time to answer, but decides not to respond to the item. Three approaches to ability estimation (biweight estimation, expected a posteriori, and maximum likelihood estimation) were examined. A Monte Carlo study was performed and the effect of different levels of omissions on the simulee's ability estimates was determined. Results showed that the worst estimation occurred when omits were treated as incorrect. In contrast, substitution of 0.5 for omitted responses resulted in ability estimates that were almost as accurate as those using complete data. Implications for practitioners are discussed.  相似文献   

18.
该文介绍并比较了计算机化自适应测验(computerized adaptive testing,CAT)环境中的MLE、WLE、MAP、EAP等几种常用能力估计方法的发展演变以及各自的原理与特性,并对这些能力估计方法的发展脉络及其特性做了简要总结与评价,最后展望了未来CAT中能力估计的发展趋势。  相似文献   

19.
Regular use of questions previously made available to the public (i.e., disclosed items) may provide one way to meet the requirement for large numbers of questions in a continuous testing environment, that is, an environment in which testing is offered at test taker convenience throughout the year rather than on a few prespecified test dates. First it must be shown that such use has effects on test scores small enough to be acceptable. In this study simulations are used to explore the use of disclosed items under a worst-case scenario which assumes that disclosed items are always answered correctly. Some item pool and test designs were identified in which the use of disclosed items produces effects on test scores that may be viewed as negligible.  相似文献   

20.
《教育实用测度》2013,26(3):241-261
This simulation study compared two procedures to enable an adaptive test to select items in correspondence with a content blueprint. Trait level estimates obtained from testlet-based and constrained adaptive tests administered to 10,000 simulated examinees under two trait distributions and three item pool sizes were compared to the trait level estimates obtained from traditional adaptive tests in terms of mean absolute error, bias, and information. Results indicate that using constrained adaptive testing requires an increase of 5% to 11% in test length over the traditional adaptive test to reach the same error level and, using testlets requires an increase of 43% to 104% in test length over the traditional adaptive test. Given these results, the use of constrained computerized adaptive testing is recommended for situations in which an adaptive test must adhere to particular content specifications.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号