Similar Literature
Found 20 similar documents (search time: 31 ms)
1.
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better, because different modes may fit different practical situations. This article proposes a hybrid adaptive framework that combines CAT and MST, inspired by an analysis of the history of both designs. The proposed procedure transitions from a group sequential design to a fully sequential design. This allows for the robustness of MST in early stages but also shares the advantages of CAT in later stages, with fine-tuning of the ability estimate once its neighborhood has been identified. Simulation results showed that hybrid designs following the proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range.

2.
Design and Development of an Internet-Based Adaptive Testing System
The current spread of the Internet provides a broad and convenient new avenue for learning. The authors propose a design for an Internet-based online adaptive testing system and create an adaptive testing environment. This paper describes the system's overall design framework and introduces the principles of adaptive testing and the approach used to implement them.

3.
Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT.

4.
APPLICATION OF COMPUTERIZED ADAPTIVE TESTING TO EDUCATIONAL PROBLEMS
Three applications of computerized adaptive testing (CAT) to help solve problems encountered in educational settings are described and discussed. Each of these applications makes use of item response theory to select test questions from an item pool to estimate a student's achievement level and its precision. These estimates may then be used in conjunction with certain testing strategies to facilitate certain educational decisions. The three applications considered are (a) adaptive mastery testing for determining whether or not a student has mastered a particular content area, (b) adaptive grading for assigning grades to students, and (c) adaptive self-referenced testing for estimating change in a student's achievement level. Differences between currently used classroom procedures and these CAT procedures are discussed. For the adaptive mastery testing procedure, evidence from a series of studies comparing conventional and adaptive testing procedures is presented showing that the adaptive procedure results in more accurate mastery classifications than do conventional mastery tests, while using fewer test questions.

5.
Research on Mathematical Models for Computerized Adaptive Testing
This paper discusses an educational assessment approach, computerized adaptive testing (CAT), analyzes the drawbacks of conventional testing, and argues that CAT, with its many advantages, is bound to replace conventional tests. It analyzes a mathematical model suited to implementing CAT, the logistic model, and uses the three-parameter logistic model to implement CAT at the level of the testing algorithm.
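For reference, the three-parameter logistic model mentioned above gives the probability that an examinee of ability theta answers item i correctly as (a standard form, with scaling constant D of about 1.7):

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-D a_i (\theta - b_i)}}
```

where a_i, b_i, and c_i are the item's discrimination, difficulty, and pseudo-guessing parameters; in a CAT the next item is chosen so that this curve is most informative near the current estimate of theta.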

6.
A computerized adaptive testing (CAT) algorithm that has the potential to increase the homogeneity of CAT's item-exposure rates without significantly sacrificing the precision of ability estimates was proposed and assessed in the shadow-test (van der Linden & Reese, 1998) CAT context. This CAT algorithm was formed by a combination of maximizing or minimizing varied target functions while assembling shadow tests. There were four target functions, used separately in the first, second, third, and fourth quarters of the CAT. The elements used in the four functions were associated with (a) a random number assigned to each item, (b) the absolute difference between an examinee's current ability estimate and an item difficulty, (c) the absolute difference between an examinee's current ability estimate and an optimum item difficulty, and (d) item information. The results indicated that this combined CAT fully utilized all the items in the pool, reduced the maximum exposure rates, and achieved more homogeneous exposure rates. Moreover, its precision in recovering ability estimates was similar to that of the maximum item-information method. The combined CAT method resulted in the best overall results compared with the other individual CAT item-selection methods. The findings from the combined CAT are encouraging. Future uses are discussed.
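A minimal sketch of how the four quarter-specific criteria described above could be scored for candidate items, leaving out the shadow-test assembly and any content or exposure constraints; the pool layout, the 3PL form, and the "optimum difficulty" offset formula are assumptions made for illustration, not the article's implementation.

```python
import numpy as np

D = 1.7  # logistic scaling constant

def three_pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

def select_next_item(theta_hat, pool, quarter, rng):
    """Score every available item under the quarter-specific criterion and return
    the index of the best one. `pool` is a hypothetical dict of parameter arrays
    plus a boolean mask of already-administered items."""
    a, b, c = pool["a"], pool["b"], pool["c"]
    if quarter == 1:
        # (a) effectively random selection
        scores = rng.random(len(a))
    elif quarter == 2:
        # (b) item difficulty closest to the current ability estimate
        scores = -np.abs(theta_hat - b)
    elif quarter == 3:
        # (c) closest "optimum difficulty": a 3PL item's information peaks at
        #     b + ln((1 + sqrt(1 + 8c)) / 2) / (D a)
        b_opt = b + np.log((1 + np.sqrt(1 + 8 * c)) / 2) / (D * a)
        scores = -np.abs(theta_hat - b_opt)
    else:
        # (d) maximum Fisher information at the current ability estimate
        p = three_pl(theta_hat, a, b, c)
        scores = (D * a) ** 2 * (p - c) ** 2 * (1 - p) / ((1 - c) ** 2 * p)
    scores = np.where(pool["administered"], -np.inf, scores)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
pool = {"a": rng.uniform(0.5, 2.0, 200), "b": rng.normal(size=200),
        "c": rng.uniform(0.05, 0.25, 200),
        "administered": np.zeros(200, dtype=bool)}
print(select_next_item(theta_hat=0.3, pool=pool, quarter=4, rng=rng))
```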

7.
This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for students with disabilities that have typically taken alternate assessments based on modified achievement standards (AA-MAS). A simulation study indicated that the abilities of AA-MAS students can be underestimated or overestimated by the mixed-item CAT, depending on students' location on the underlying ability scale. These findings held across grade levels and test lengths. The mixed-item CAT appeared to function well for non-AA-MAS students.

8.
Computerized adaptive testing in instructional settings
Item response theory (IRT) has most often been used in research on computerized adaptive testing (CAT). Depending on the model used, IRT requires between 200 and 1,000 examinees for estimating item parameters. Thus, it is not practical for instructional designers to develop their own CAT based on the IRT model. Frick improved Wald's sequential probability ratio test (SPRT) by combining it with normative expert systems reasoning, referred to as an EXSPRT-based CAT. While previous studies were based on re-enactments from historical test data, the present study is the first to examine how well these adaptive methods function in a real-time testing situation. Results indicate that the EXSPRT-I significantly reduced test lengths and was highly accurate in predicting mastery. EXSPRT is apparently a viable and practical alternative to IRT for assessing mastery of instructional objectives.
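The SPRT-style mastery decision underlying this line of work can be sketched as follows: responses are treated as Bernoulli trials with a higher correct-response probability under the mastery hypothesis than under the nonmastery hypothesis, and testing stops when the accumulated likelihood ratio crosses Wald's bounds. The cut probabilities and error rates below are illustrative, not the article's values.

```python
import math

def sprt_mastery(responses, p0=0.60, p1=0.85, alpha=0.05, beta=0.05):
    """Classify a stream of 0/1 item responses as master / nonmaster / undecided
    using Wald's sequential probability ratio test (illustrative parameters)."""
    upper = math.log((1 - beta) / alpha)   # decide "master" when log-LR exceeds this
    lower = math.log(beta / (1 - alpha))   # decide "nonmaster" when log-LR falls below this
    log_lr = 0.0
    for i, u in enumerate(responses, start=1):
        # log-likelihood ratio contribution of one response
        log_lr += u * math.log(p1 / p0) + (1 - u) * math.log((1 - p1) / (1 - p0))
        if log_lr >= upper:
            return "master", i
        if log_lr <= lower:
            return "nonmaster", i
    return "undecided", len(responses)

print(sprt_mastery([1, 1, 0, 1, 1, 1, 1, 1]))
```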

9.
Much has been written about the possibilities of computer adaptive testing (CAT), but little practical success has been reported. Most proposals are so complicated and fraught with expected problems that one gets the impression the idea is hopeless. In fact, CAT enabled by the application of probabilistic conjoint measurement works well, and the resulting data have shown that most of the expected problems do not necessarily appear. This chapter explains how probabilistic conjoint measurement enables CAT, reviews the results of some applications to medical certification problems in the U.S., and outlines how CAT leads to practical and useful computer-assisted instruction.

10.
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long-term quality control of CAT. This study proposed a new item selection method using the "efficiency balanced information" criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a-values while eliminating the need for item pool stratification.
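For context, the maximum Fisher information method that the new criterion is compared against selects, at the current ability estimate, the item with the largest information value; under the three-parameter logistic model (the abstract does not state which model form the study uses) this is:

```latex
I_i(\hat\theta) = \frac{D^2 a_i^2\,\bigl[P_i(\hat\theta) - c_i\bigr]^2\bigl[1 - P_i(\hat\theta)\bigr]}{\bigl(1 - c_i\bigr)^2\,P_i(\hat\theta)}
```

Because this quantity grows roughly with the square of the discrimination a_i, high-a items are selected far more often than low-a items, which is the exposure imbalance that stratification methods and the efficiency balanced information criterion aim to correct.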

11.
A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although local item dependence has been investigated both for polytomous items in fixed-form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT, despite its central importance to these applications. The current research uses a simulation study to investigate the extension of two widely used pairwise statistics, Yen's Q3 statistic and Pearson's X2 statistic, to this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).
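For reference, Yen's Q3 is the correlation, taken over examinees, of the model residuals of an item pair; for dichotomous items it is commonly written as below (for polytomous items the item response function is replaced by the expected item score). Large positive values flag a locally dependent pair.

```latex
Q_{3,ij} = \operatorname{corr}\!\left( u_i - P_i(\hat\theta),\; u_j - P_j(\hat\theta) \right)
```

Here u_i is the scored response to item i and P_i(\hat\theta) its model-implied probability at the examinee's ability estimate.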

12.
When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size, were found to result in different levels of comparability in test scores.
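A small illustration of the scoring change flagged in the last finding: under IRT pattern scoring, two examinees with the same number-correct score can receive different ability estimates, because which items were answered correctly matters. The item parameters below are made up, and EAP scoring under a 2PL model is used purely for illustration.

```python
import numpy as np

def p2pl(theta, a, b, D=1.7):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def eap(pattern, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate under a 2PL model with a standard normal prior."""
    prior = np.exp(-0.5 * grid**2)
    like = np.ones_like(grid)
    for u, ai, bi in zip(pattern, a, b):
        p = p2pl(grid, ai, bi)
        like *= p**u * (1 - p)**(1 - u)
    post = prior * like
    return float(np.sum(grid * post) / np.sum(post))

a = np.array([1.0, 1.2, 0.8, 1.5, 1.1])        # hypothetical discriminations
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])      # hypothetical difficulties
easy_right = [1, 1, 1, 0, 0]   # 3 correct: the three easiest items
hard_right = [0, 0, 1, 1, 1]   # 3 correct: the three hardest items
print(eap(easy_right, a, b), eap(hard_right, a, b))  # same number-correct, different EAPs
```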

13.
We investigated students' metacognitive experiences with regard to feelings of difficulty (FD), feelings of satisfaction (FS), and estimate of effort (EE), employing either computerized adaptive testing (CAT) or computerized fixed item testing (FIT). In an experimental approach, 174 students in grades 10 to 13 were tested with either a CAT or a FIT version of a matrices test. Data revealed that metacognitive experiences were not related to the resulting test scores for CAT: test takers who took the matrices test in an adaptive mode were paradoxically more satisfied with their performance the worse they had performed in terms of the resulting ability parameter. They also rated the test as easier the lower they had performed, but their estimates of effort were higher the better they had performed. For test takers who took the FIT version, completely different results emerged. In line with previous results, test takers were assumed to base these experiences on the subjectively estimated percentage of items solved. This moderated mediation hypothesis was in part confirmed, as the relation between the percentage of items solved and FD, FS, and EE was mediated by the estimated percentage of items solved. Results are discussed with reference to feedback acceptance, errant self-estimations, and test fairness with regard to a possible false regulation of effort in lower-ability groups when using CAT.

14.
A Simulation Study of the Robustness of the Rasch Model in Computerized Adaptive Testing
Using simulated data, this study examined the robustness of the Rasch model in computerized adaptive testing (CAT) by estimating ability with both the Rasch model and the Birnbaum model and comparing the two in terms of root mean square error (RMSE), average deviation (AD), and the correlation between ability estimates. The results show that the Rasch model still estimates examinees' ability levels fairly accurately even when item discriminations are unequal, indicating strong robustness.
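For reference, the two accuracy indices named above can be computed from simulated true and estimated abilities as sketched below; treating the average deviation as a mean absolute difference is an assumption, since the abstract does not give its formula.

```python
import numpy as np

def rmse(theta_true, theta_hat):
    """Root mean square error between true and estimated abilities."""
    d = np.asarray(theta_hat) - np.asarray(theta_true)
    return float(np.sqrt(np.mean(d**2)))

def avg_deviation(theta_true, theta_hat):
    """Average deviation, taken here as the mean absolute difference (an assumption)."""
    d = np.asarray(theta_hat) - np.asarray(theta_true)
    return float(np.mean(np.abs(d)))

theta_true = np.random.default_rng(0).normal(size=1000)
theta_hat = theta_true + np.random.default_rng(1).normal(scale=0.3, size=1000)
print(rmse(theta_true, theta_hat), avg_deviation(theta_true, theta_hat))
```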

15.
Large-scale assessments often use a computer adaptive test (CAT) for selection of items and for scoring respondents. Such tests often assume a parametric form for the relationship between item responses and the underlying construct. Although semi- and nonparametric response functions could be used, there is scant research on their performance in a CAT. In this work, we compare parametric response functions versus those estimated using kernel smoothing and a logistic function of a monotonic polynomial. Monotonic polynomial items can be used with traditional CAT item selection algorithms that use analytical derivatives. We compared these approaches in CAT simulations with a variety of item selection algorithms. Our simulations also varied the features of the calibration and item pool: sample size, the presence of missing data, and the percentage of nonstandard items. In general, the results support the use of semi- and nonparametric item response functions in a CAT.
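A minimal sketch of the kernel-smoothing idea referenced above: the item response function is estimated nonparametrically as a kernel-weighted (Nadaraya-Watson) average of observed 0/1 responses around each ability value. The Gaussian kernel, the bandwidth, and the use of provisional ability estimates are illustrative choices, not necessarily those of the article.

```python
import numpy as np

def kernel_icc(theta_grid, theta_hat, responses, bandwidth=0.3):
    """Nadaraya-Watson estimate of one item's response function: a Gaussian-kernel
    weighted mean of 0/1 responses at each grid point."""
    theta_hat = np.asarray(theta_hat)
    responses = np.asarray(responses, dtype=float)
    icc = np.empty(len(theta_grid))
    for g, t in enumerate(theta_grid):
        w = np.exp(-0.5 * ((theta_hat - t) / bandwidth) ** 2)
        icc[g] = np.sum(w * responses) / np.sum(w)
    return icc

# Example: examinees' provisional ability estimates and their answers to one item
rng = np.random.default_rng(0)
theta_hat = rng.normal(size=2000)
prob = 1 / (1 + np.exp(-1.2 * (theta_hat - 0.3)))   # data-generating curve (for the demo)
responses = rng.binomial(1, prob)
grid = np.linspace(-3, 3, 13)
print(np.round(kernel_icc(grid, theta_hat, responses), 2))
```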

16.
With the proliferation of computers in test delivery today, adaptive testing has become quite popular, especially when examinees must be classified into two categories (pass/fail, master/nonmaster). Several well-established organisations have provided standards and guidelines for the design and evaluation of educational and psychological testing. The purpose of this paper was not to repeat the guidelines and standards that exist in the literature but to identify and discuss the main evaluation parameters for a computer-adaptive test (CAT). A number of parameters should be taken into account when evaluating a CAT. Key parameters include utility, validity, reliability, satisfaction, usability, reporting, administration, security, and those associated with adaptivity, the item pool, and psychometric theory. These parameters are presented and discussed and form a proposed evaluation model, the Evaluation Model of Computer-Adaptive Testing.

17.
This paper examines the listening comprehension skills of visually impaired students (VIS) using computerised adaptive testing (CAT) and reader-assisted paper-pencil testing (raPPT), together with student views about them. An explanatory mixed-method design was used. The sample comprised 51 VIS in the 7th and 8th grades; 9 of these students were interviewed to determine their views about the tests. Results indicated that scores obtained from CAT were significantly lower than scores obtained from raPPT. Additionally, a positive and high correlation was found between CAT and raPPT scores, suggesting that CAT and raPPT produced similar ability estimates. Another finding was that CAT made more reliable estimates and was completed in a shorter time using fewer items. In the qualitative part, student views were gathered through interviews, and content analysis revealed three themes: technical features, test features, and psychological effects. In general, students reported positive views about CAT. VIS preferred CAT because of its listening/control options, shorter test durations, clarity of reading, fairness of the test, and elimination of dependency on a reader. The study provides implications for test developers and test users to consider CAT as a multi-accommodation for VIS given its advantages.

18.
Research on how item review affects examinees psychologically in CAT supports the wider adoption of CAT. J. Oleat and colleagues conducted a comparative study of the effects of review on examinees' performance and state anxiety in a computerized adaptive test (CAT) and a fixed-item test (FIT). The experimental results showed that, in an environment that allowed review, examinees' correct responses and estimated ability levels increased significantly while state anxiety decreased. This article reviews the main findings of J. Oleat and colleagues in detail and further explores the importance and feasibility of adding a review option to CAT.

19.
Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of psychometric characteristics and content specifications. This study introduces the Honeycomb Pool Assembly (HPA) framework, an innovative solution for the construction of multiple parallel item pools for CAT that maximizes item utilization in the item bank. The HPA framework comprises two stages, cell assembly and pool assembly, and uses a mixed integer programming modeling approach. An empirical study demonstrated HPA's effectiveness in creating a large number of parallel pools using a real-world high-stakes CAT assessment item bank. The HPA framework offers several advantages, including (a) simultaneous creation of multiple parallel pools, (b) simplification of item pool maintenance, and (c) flexibility in establishing statistical and operational constraints. Moreover, it can help testing organizations efficiently manage and monitor the health of their item banks. Thus, the HPA framework is expected to be a valuable tool for testing professionals and organizations to address test security challenges and maintain the integrity of high-stakes CAT assessments.
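The honeycomb-specific details of HPA are not given in the abstract, but the general mixed integer programming idea it builds on can be sketched as follows: binary variables assign each bank item to at most one pool while every pool satisfies size (and, in practice, content and information) constraints. The pool sizes, information values, objective, and the use of the PuLP/CBC solver below are illustrative assumptions, not the article's model.

```python
import pulp

def assemble_parallel_pools(info, n_pools, per_pool_size):
    """Toy MIP: split an item bank into disjoint pools of equal size, maximizing the
    Fisher information (at a single theta point) of the weakest pool so the pools
    come out roughly parallel. `info` maps item id -> information (illustrative)."""
    prob = pulp.LpProblem("parallel_pool_assembly", pulp.LpMaximize)
    x = pulp.LpVariable.dicts(
        "x", [(i, p) for i in info for p in range(n_pools)], cat="Binary")
    t = pulp.LpVariable("min_pool_info", lowBound=0)
    prob += t                                            # objective: max-min pool information
    for i in info:                                       # each item used in at most one pool
        prob += pulp.lpSum(x[i, p] for p in range(n_pools)) <= 1
    for p in range(n_pools):                             # fixed pool size; information >= t
        prob += pulp.lpSum(x[i, p] for i in info) == per_pool_size
        prob += pulp.lpSum(info[i] * x[i, p] for i in info) >= t
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {p: [i for i in info if pulp.value(x[i, p]) > 0.5] for p in range(n_pools)}

bank = {f"item{k}": 0.2 + 0.05 * (k % 10) for k in range(60)}   # made-up information values
pools = assemble_parallel_pools(bank, n_pools=3, per_pool_size=15)
print({p: round(sum(bank[i] for i in ids), 2) for p, ids in pools.items()})
```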

20.
Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data. In this study, empirical data from a certification test were used. Alternatives are discussed to generate norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications using statistical process control methods to detect misfitting item score patterns are discussed.
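One statistical-process-control device used for person fit in CAT is a CUSUM chart on standardized item-score residuals, accumulated as items are administered and compared against a decision bound. The sketch below is a generic version; the reference value k, the bound h, and the exact statistic and norming used in the article are not given in the abstract.

```python
import numpy as np

def cusum_person_fit(responses, probs, k=0.1, h=1.0):
    """Upper and lower CUSUM statistics on standardized residuals (u - P) / sqrt(P(1 - P)).
    Flags the pattern as misfitting when either chart crosses the bound h."""
    c_plus, c_minus = 0.0, 0.0
    for u, p in zip(responses, probs):
        z = (u - p) / np.sqrt(p * (1 - p))   # standardized residual for one item
        c_plus = max(0.0, c_plus + z - k)
        c_minus = min(0.0, c_minus + z + k)
        if c_plus > h or c_minus < -h:
            return True                      # misfit signalled
    return False

# responses to administered items and the model-implied correct-response probabilities
print(cusum_person_fit([1, 0, 0, 0, 1, 0], [0.8, 0.75, 0.7, 0.72, 0.68, 0.74]))
```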
