共查询到20条相似文献,搜索用时 15 毫秒
1.
Dual‐Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing 下载免费PDF全文
The development of cognitive diagnostic‐computerized adaptive testing (CD‐CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual‐objective CD‐CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen‐Shannon (JS) divergence, a symmetrized version of the Kullback‐Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage. 相似文献
2.
Chia-Wen Chen Wen-Chung Wang Ming Ming Chiu Sage Ro 《Journal of Educational Measurement》2020,57(2):343-369
The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across items). To address these issues, we introduce subpool partition strategies for item selection and within-person statement exposure control procedures. Simulations showed that the multinomial method reduces computation time while maintaining measurement precision. Both the freeze and revised Sympson-Hetter online (RSHO) methods controlled the statement exposure rate; RSHO sacrificed some measurement precision but increased pool use. Furthermore, preventing a statement's repetition on consecutive items neither hindered the effectiveness of the freeze or RSHO method nor reduced measurement precision. 相似文献
3.
4.
Two new methods for item exposure control were proposed. In the Progressive method, as the test progresses, the influence of a random component on item selection is reduced and the importance of item information is increasingly more prominent. In the Restricted Maximum Information method, no item is allowed to be exposed in more than a predetermined proportion of tests. Both methods were compared with six other item-selection methods (Maximum Information, One Parameter, McBride and Martin, Randomesque, Sympson and Hetter, and Random Item Selection) with regard to test precision and item exposure variables. Results showed that the Restricted method was useful to reduce maximum exposure rates and that the Progressive method reduced the number of unused items. Both did well regarding precision. Thus, a combined Progressive-Restricted method may be useful to control item exposure without a serious decrease in test precision. 相似文献
5.
2PLM下CAT选题策略比较 总被引:1,自引:0,他引:1
本文在两参数逻辑斯蒂克模型(2PLM)下,提出一种新的选题策略——平均测验难度匹配法(Avt—b),并对四种选题策略下EAP能力估计趋势进行比较研究。通过模拟研究显示,Avt—b方法在CAT前期能够较快地锁定能力范围,较准确地作出能力估计。本文对CAT测试阶段的能力误差范围进行确定,对于多级评分模型的CAT选题策略开发具有一定的借鉴意义。 相似文献
6.
Kyung T. Han 《Journal of Educational Measurement》2012,49(3):225-246
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long‐term quality control of CAT. This study proposed a new item selection method using the “efficiency balanced information” criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a‐values while eliminating the need for item pool stratification. 相似文献
7.
This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The Stocking and Lewis conditional multinomial procedure and, to a slightly lesser extent, the Davey and Parshall method seemed to be the most promising considering all of the factors that this study addressed. 相似文献
8.
Bernard P. Veldkamp 《Journal of Educational Measurement》2016,53(2):212-228
Many standardized tests are now administered via computer rather than paper‐and‐pencil format. The computer‐based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second advantage is the ability to record not only the test taker's response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. Combining these two advantages, various methods were explored for utilizing response time data in selecting appropriate items for an individual test taker. Four strategies for incorporating response time data were evaluated, and the precision of the final test‐taker score was assessed by comparing it to a benchmark value that did not take response time information into account. While differences in measurement precision and testing times were expected, results showed that the strategies did not differ much with respect to measurement precision but that there were differences with regard to the total testing time. 相似文献
9.
Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed. 相似文献
10.
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising that block may be unrelated to each other, or they may comprise a testlet (Wainer and Kiely, 1987) After the first block of items has been administered, adaptation takes place in the choice of the next block to be administered and subsequent blocks. The uMFS design integrates item exposure control, as well as content balancing and other test development needs, into the design of the CAT, instead of placing those activities in the online implementation. We show that it is possible to implement item exposure control, in a very thorough way, in the psychometric specifications of the item blocks. 相似文献
11.
During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. The traditional method of attaining the highest efficiency in ability estimation is to select items of maximum Fisher information at the currently estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies with simulated data addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with Sympson and Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order (i.e., less discriminating items first), as described in Chang and Ying's (1999) stratified method had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods, particularly for operational item pools when retired items cannot be totally replenished with similar highly discriminating items. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure, which can successfully even out the usage of all items. 相似文献
12.
Computerized Cognitive Diagnostic Adaptive Testing: Effect on Remedial Instruction as Empirical Validation 总被引:2,自引:0,他引:2
The purpose of this study is to show the usefulness of cognitive diagnoses for remedial instruction. Cognitive diagnoses were done by an adaptive testing system using the rule-space methodology, which was developed by K. K. Tatsuoka and her associates (K. K. Tatsuoka, 1983, 1990; K. K. Tatsuoka & M. M. atsuoka, 1987; M. M. Tatsuoka & K. K. Tatsuoka, 1989). The results of the study strongly indicate that knowing students'knowledge states prior to remediation is very effective and that the rule-space method can effectively diagnose students' knowledge states and can point out ways for remediating their errors quickly with minimum effort. It is also found that the design of instructional units for remediation can be effectively guided by the rule-space model, because the determination of all possible knowledge states in a domain of interest, given an incidence matrix, is based on a partially ordered tree structure of knowledge states, which is equivalent to item-score patterns determined logically from the incidence matrix. 相似文献
13.
项目反应理论(Item Response Theory,IRT)是在克服经典测验理论局限性的基础上发展起来的,在单维性、局部独立性和单调性的前提假设下,更具有优越性。它以潜在特质理论为基础,采用项目特征曲线假设对其进行建模,产生了三个最基本的模型:正态肩型曲线模型、拉希模型和逻辑斯蒂模型。指出了IRT在实际应用中取得实质性进展的两个方面:计算机自适应测验和认知诊断。 相似文献
14.
The purpose of this study was to compare the effects of four item selection rules—(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)—with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage. 相似文献
15.
It is observed that many sorts of difficulties may preclude the uneventful construction of tests by a computerized algorithm, such as those currently in favor in Computerized Adaptive Testing (CAT). In this essay we discuss a number of these problems, as well as some possible avenues of solution. We conclude with the development of the "testlet," a bundle of items that can be arranged either hierarchically or linearly, thus maintaining the efficiency of an adaptive test while keeping the quality control of test construction that is possible currently only with careful expert scrutiny. Performance on the separate testlets is aggregated to yield ability estimates. 相似文献
16.
How has Item Response Theory helped solve problems in the development and use of computer-adaptive tests? Do we need to balance item content with computer-adaptive tests? Could we use IRT to evaluate unusual responses to computer-delivered tests? 相似文献
17.
The Relationship Between Item Exposure and Test Overlap in Computerized Adaptive Testing 总被引:1,自引:0,他引:1
Shu-Ying Chen Robert D. Ankenmann Judith A. Spray 《Journal of Educational Measurement》2003,40(2):129-145
The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (rmax ). Therefore, item exposure control methods which implement a specification of rmax (e.g., Sympson & Hetter, 1985) provide the most direct control at both the item and test levels. 相似文献
18.
《教育实用测度》2013,26(3):241-261
This simulation study compared two procedures to enable an adaptive test to select items in correspondence with a content blueprint. Trait level estimates obtained from testlet-based and constrained adaptive tests administered to 10,000 simulated examinees under two trait distributions and three item pool sizes were compared to the trait level estimates obtained from traditional adaptive tests in terms of mean absolute error, bias, and information. Results indicate that using constrained adaptive testing requires an increase of 5% to 11% in test length over the traditional adaptive test to reach the same error level and, using testlets requires an increase of 43% to 104% in test length over the traditional adaptive test. Given these results, the use of constrained computerized adaptive testing is recommended for situations in which an adaptive test must adhere to particular content specifications. 相似文献
19.
Simulations of computerized adaptive tests (CATs) were used to evaluate results yielded by four commonly used ability estimation methods: maximum likelihood estimation (MLE) and three Bayesian approaches—Owen's method, expected a posteriori (EAP), and maximum a posteriori. In line with the theoretical nature of the ability estimates and previous empirical research, the results showed clear distinctions between MLE and the Bayesian methods, with MLE yielding lower bias, higher standard errors, higher root mean square errors, lower fidelity, and lower administrative efficiency. Standard errors for MLE based on test information underestimated actual standard errors, whereas standard errors for the Bayesian methods based on posterior distribution standard deviations accurately estimated actual standard errors. Among the Bayesian methods, Owen's provided the worst overall results, and EAP provided the best. Using a variable starting rule in which examinees were initially classified into three broad/ability groups greatly reduced the bias for the Bayesian methods, but had little effect on the results for MLE. On the basis of these results, guidelines are offered for selecting appropriate CAT ability estimation methods in different decision contexts. 相似文献
20.
In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings. 相似文献