1.

Application of Computerized Adaptive Testing to Educational Problems (total citations: 1; self-citations: 0; citations by others: 1)

David J. Weiss, G. Gage Kingsbury 《Journal of Educational Measurement》1984,21(4):361-375

Three applications of computerized adaptive testing (CAT) to help solve problems encountered in educational settings are described and discussed. Each of these applications makes use of item response theory to select test questions from an item pool to estimate a student's achievement level and its precision. These estimates may then be used in conjunction with certain testing strategies to facilitate certain educational decisions. The three applications considered are (a) adaptive mastery testing for determining whether or not a student has mastered a particular content area, (b) adaptive grading for assigning grades to students, and (c) adaptive self-referenced testing for estimating change in a student's achievement level. Differences between currently used classroom procedures and these CAT procedures are discussed. For the adaptive mastery testing procedure, evidence from a series of studies comparing conventional and adaptive testing procedures is presented showing that the adaptive procedure results in more accurate mastery classifications than do conventional mastery tests, while using fewer test questions.
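The mastery decision in application (a) can be sketched as a confidence-interval rule: keep testing until an interval around the current ability estimate falls entirely above or below the mastery cutoff. This is a minimal sketch, assuming a normal-approximation interval and a cutoff `theta_cut` on the ability scale; the function name, the 1.96 multiplier, and the three-way outcome are illustrative choices, and the authors' exact procedure may differ.

```python
from enum import Enum


class Decision(Enum):
    MASTER = "master"
    NONMASTER = "nonmaster"
    CONTINUE = "continue"


def mastery_decision(theta_hat: float, se: float, theta_cut: float,
                     z: float = 1.96) -> Decision:
    """Compare a normal-approximation confidence interval with a mastery cutoff.

    theta_hat: current ability estimate
    se: standard error of theta_hat
    theta_cut: hypothetical mastery cutoff on the same ability scale
    z: normal quantile for the interval (1.96 gives roughly 95% coverage)
    """
    lower = theta_hat - z * se
    upper = theta_hat + z * se
    if lower > theta_cut:
        return Decision.MASTER      # the whole interval lies above the cutoff
    if upper < theta_cut:
        return Decision.NONMASTER   # the whole interval lies below the cutoff
    return Decision.CONTINUE        # the interval straddles the cutoff: administer another item


print(mastery_decision(0.9, 0.3, 0.0))  # interval (0.312, 1.488)
print(mastery_decision(0.2, 0.3, 0.0))  # interval (-0.388, 0.788)
```

Because each additional item usually shrinks `se`, an examinee whose ability is far from the cutoff can be classified after only a few items, which is one way an adaptive mastery test can use fewer questions than a fixed-length one.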

2.

Procedures for Selecting Items for Computerized Adaptive Tests

《Applied Measurement in Education (教育实用测度)》2013,26(4):359-375

Many procedures have been developed for selecting the "best" items for a computerized adaptive test. There is a trend toward the use of adaptive testing in applied settings such as licensure tests, program entrance tests, and educational tests. It is useful to consider procedures for item selection and the special needs of applied testing settings to facilitate test design. The current study reviews several classical approaches and alternative approaches to item selection and discusses their relative merit. This study also describes procedures for constrained computerized adaptive testing (C-CAT) that may be added to classical item selection approaches to allow them to be used for applied testing, while maintaining the high measurement precision and short test length that made adaptive testing attractive to practitioners initially.
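Maximum-information selection is a standard example of the classical approaches this abstract refers to: at the current ability estimate, administer the unused item with the largest Fisher information. A minimal sketch under a two-parameter logistic (2PL) model; the item parameters and the scaling constant `D = 1.7` are illustrative assumptions, not values from the article.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    a: float  # discrimination
    b: float  # difficulty


D = 1.7  # conventional scaling constant for the logistic approximation to the normal ogive


def p_correct(theta: float, item: Item) -> float:
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * item.a * (theta - item.b)))


def fisher_information(theta: float, item: Item) -> float:
    """2PL item information: (D*a)^2 * P * (1 - P)."""
    p = p_correct(theta, item)
    return (D * item.a) ** 2 * p * (1.0 - p)


def select_max_info(theta: float, pool: list[Item], administered: set[str]) -> Item:
    """Return the not-yet-administered item with maximum information at theta."""
    candidates = [it for it in pool if it.item_id not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    return max(candidates, key=lambda it: fisher_information(theta, it))


pool = [Item("i1", 1.0, -1.0), Item("i2", 1.2, 0.1), Item("i3", 0.8, 1.5)]
print(select_max_info(0.0, pool, administered=set()).item_id)   # i2
print(select_max_info(0.0, pool, administered={"i2"}).item_id)  # i1
```

Constrained procedures such as C-CAT typically wrap a rule like this one, restricting the candidate set (by content area, exposure, and so on) before the maximum is taken.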

3.

Computerized Adaptive and Fixed-Item Testing of Music Listening Skill: A Comparison of Efficiency, Precision, and Concurrent Validity

Walter P. Vispoel, Tianyou Wang, Timothy Bleiler 《Journal of Educational Measurement》1997,34(1):43-63

We evaluated the efficiency, precision, and concurrent validity of results obtained from adaptive and fixed-item music listening tests in three studies: (a) a computer simulation study in which each of 2,200 simulees completed a computerized adaptive tonal memory test, a computerized fixed-item tonal memory test constructed from items in the adaptive test pool, and two standardized group-administered tonal memory tests; (b) a live testing study in which each of 204 examinees took the computerized adaptive test and the standardized tests; and (c) a live testing study in which randomly equivalent groups took either the computerized adaptive test (n = 86) or the computerized fixed-item test (n = 86). The adaptive music test required 50% to 93% fewer items to match the reliability and concurrent validity of the fixed-item tests, and it yielded higher levels of reliability and concurrent validity than the fixed-item tests when test length was held constant. These findings suggest that computerized adaptive tests, which typically have been limited to visually produced items, may also be well suited for measuring skills that require aurally produced items.

4.

Quality Control for Scoring Tests Administered in Continuous Mode: An NCME Instructional Module

Michal Baumer 《Educational Measurement》2017,36(1):58-68

Quality control (QC) in testing is paramount. QC procedures for tests can be divided into two types. The first type, one that has been well researched, is QC for tests administered to large population groups on few administration dates using a small set of test forms (e.g., large‐scale assessment). The second type is QC for tests, usually computerized, that are administered to small population groups on many administration dates using a wide array of test forms (CMT—continuous mode tests). Since the world of testing is headed in this direction, developing QC for CMT is crucial. In the current ITEMS module we discuss errors that might occur at the different stages of the CMT process, as well as the recommended QC procedure to reduce the incidence of each error. An illustration from a recent study is provided, and a computerized system that applies these procedures is presented. Instructions on how to develop one's own QC procedure are also included.

5.

An Empirical Study of Computerized Adaptive Test Administration Conditions

Mary E. Lunz, Betty A. Bergstrom 《Journal of Educational Measurement》1994,31(3):251-263

This empirical study was designed to determine the impact of computerized adaptive test (CAT) administration formats on student performance. Students in medical technology programs took a paper-and-pencil test and an individualized, computerized adaptive test. Students were randomly assigned to adaptive test administration formats to ascertain the effect on student performance of altering: (a) the difficulty of the first item, (b) the targeted level of test difficulty, (c) minimum test length, and (d) the opportunity to control the test. Computerized adaptive test data were analyzed with ANCOVA. The paper-and-pencil test was used as a covariate to equalize ability variance among cells. The only significant main effect was for opportunity to control the test. There were no significant interactions among test administration formats. This study provides evidence concerning adjusting traditional computerized adaptive testing to more familiar testing modalities.

6.

A Study of the Equivalence of Computerized Linear Tests and Computerized Adaptive Tests

Li Xinyu (李心钰), Lu Hong (陆宏) 《Modern Educational Technology (现代教育技术)》2022,(1):85-93

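Computer-based testing has gradually become widespread, but different computer-based test formats measuring the same task may produce biased test results, leading to unfairness in educational measurement and evaluation. Drawing on item response theory, this article examines whether computerized linear testing and computerized adaptive testing are equivalent in test efficiency, in the statistical properties of their results, and in their effects on examinees' individual psychological traits, and reports an empirical study conducted with pre-service teachers in a "Modern Educational Technology" course. The results show that examinees' scores on the two tests are comparable. The computerized adaptive test had higher test efficiency and reliability, but whether immediate feedback was given had a larger effect on examinees' test anxiety; the computerized linear test had more reasonable content validity, and whether immediate feedback was given had a smaller effect on examinees' test anxiety. The study offers a scientific analysis of whether the choice of test format in instructional evaluation is fair and reasonable, and provides a theoretical reference for test administrators choosing a test format suited to a particular testing scenario.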

7.

Constructing a Computerized Adaptive Test for University Applicants With Disabilities

《Applied Measurement in Education (教育实用测度)》2013,26(4):381-405

In recent years, there has been a large increase in the number of university applicants requesting special accommodations for university entrance exams. The Israeli National Institute for Testing and Evaluation (NITE) administers a Psychometric Entrance Test (comparable to the Scholastic Assessment Test in the United States) to assist universities in Israel in selecting undergraduates. Because universities in Israel do not permit flagging of candidates receiving special testing accommodations, such scores are treated as identical to scores attained under regular testing conditions. The increase in the number of students receiving testing accommodations and the prohibition of flagging have brought into focus certain psychometric issues pertaining to the fairness of testing students with disabilities and the comparability of special and standard testing conditions. To address these issues, NITE has developed a computerized adaptive psychometric test for administration to examinees with disabilities. This article discusses the process of developing the computerized test and ensuring its comparability to the paper-and-pencil test. This article also presents data on the operational computerized test.

8.

Differences Between Self-Adapted and Computerized Adaptive Tests: A Meta-Analysis

Angela K. Pitkin, Walter P. Vispoel 《Journal of Educational Measurement》2001,38(3):235-247

Self-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error, the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. We speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings.

9.

An Overview of Computerized Adaptive Sequential Testing (total citations: 1; self-citations: 2; citations by others: 1)

Guan Dandan (关丹丹), Liu Qingsi (刘庆思) 《China Examinations (中国考试)》2011,(1)

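Computerized adaptive sequential testing (CAST) is both a test administration procedure and a methodology for test design. Using a three-stage CAST with a 1-3-3 design as an example, this article introduces the basic framework of CAST, its test assembly strategies and steps, and how CAST is administered. CAST can be as efficient as computerized adaptive testing (CAT) while also satisfying a test's content requirements, bringing a new philosophy to the realization of high-quality computerized testing.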
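The routing in a 1-3-3 design can be sketched as follows: every examinee takes the single stage-1 module, and the score accumulated so far routes the examinee to an easy, medium, or hard module at stages 2 and 3. The cut scores and module labels below are hypothetical, and routing on raw cumulative number-correct is a simplifying assumption; operational designs usually place scores from modules of different difficulty on a common IRT scale before routing, which this sketch ignores.

```python
# Hypothetical cumulative number-correct cut scores for each routing point:
# below the first cut -> "easy", below the second -> "medium", otherwise "hard".
ROUTING_CUTS = {
    2: (8, 14),    # applied after the stage-1 module
    3: (20, 30),   # applied after the stage-1 and stage-2 modules
}


def route(stage: int, cumulative_score: int) -> str:
    """Difficulty level of the module to administer at `stage` (2 or 3)."""
    low_cut, high_cut = ROUTING_CUTS[stage]
    if cumulative_score < low_cut:
        return "easy"
    if cumulative_score < high_cut:
        return "medium"
    return "hard"


def module_path(stage1_score: int, stage2_score: int) -> list[str]:
    """Modules taken in a 1-3-3 design, given number-correct scores on stages 1 and 2."""
    stage2 = route(2, stage1_score)
    stage3 = route(3, stage1_score + stage2_score)
    return ["stage1", f"stage2-{stage2}", f"stage3-{stage3}"]


print(module_path(15, 12))  # ['stage1', 'stage2-hard', 'stage3-medium']
```

Because all modules are assembled before testing, content requirements can be checked module by module in advance, which is how a design like this can meet content specifications while still adapting.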

10.

A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests

《Applied Measurement in Education (教育实用测度)》2013,26(3):241-261

This simulation study compared two procedures to enable an adaptive test to select items in correspondence with a content blueprint. Trait level estimates obtained from testlet-based and constrained adaptive tests administered to 10,000 simulated examinees under two trait distributions and three item pool sizes were compared to the trait level estimates obtained from traditional adaptive tests in terms of mean absolute error, bias, and information. Results indicate that using constrained adaptive testing requires an increase of 5% to 11% in test length over the traditional adaptive test to reach the same error level, and using testlets requires an increase of 43% to 104% in test length over the traditional adaptive test. Given these results, the use of constrained computerized adaptive testing is recommended for situations in which an adaptive test must adhere to particular content specifications.
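The abstract does not spell out the constrained procedure. One widely used way to make item selection follow a content blueprint, sketched here as an assumption rather than as the study's exact method, is to pick the content area whose administered share lags furthest behind its target proportion and then choose the most informative unused item within that area. The blueprint, item parameters, and function names are hypothetical, and a 2PL model is assumed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    area: str   # content area in the blueprint
    a: float    # 2PL discrimination
    b: float    # 2PL difficulty


def info_2pl(theta: float, item: Item, d: float = 1.7) -> float:
    """2PL Fisher information at theta."""
    p = 1.0 / (1.0 + math.exp(-d * item.a * (theta - item.b)))
    return (d * item.a) ** 2 * p * (1.0 - p)


def select_content_balanced(theta: float, pool: list[Item],
                            administered: list[Item],
                            blueprint: dict[str, float]) -> Item:
    """Most informative unused item from the most under-represented content area.

    blueprint maps each content area to its target proportion of the test.
    """
    n = len(administered)
    given = {it.item_id for it in administered}
    counts = {area: 0 for area in blueprint}
    for it in administered:
        counts[it.area] += 1

    def shortfall(area: str) -> float:
        # target share minus share achieved so far (achieved share is 0 before the first item)
        achieved = counts[area] / n if n else 0.0
        return blueprint[area] - achieved

    # Visit areas from largest to smallest shortfall; skip areas with no unused items.
    for area in sorted(blueprint, key=shortfall, reverse=True):
        candidates = [it for it in pool if it.area == area and it.item_id not in given]
        if candidates:
            return max(candidates, key=lambda it: info_2pl(theta, it))
    raise ValueError("item pool exhausted")


blueprint = {"algebra": 0.6, "geometry": 0.4}
pool = [Item("g1", "geometry", 1.5, 0.0), Item("a1", "algebra", 0.7, 0.0),
        Item("a2", "algebra", 1.0, 2.0)]
first = select_content_balanced(0.0, pool, [], blueprint)
second = select_content_balanced(0.0, pool, [first], blueprint)
print(first.item_id, second.item_id)  # a1 g1
```

Here g1 is the most informative item at θ = 0, yet a1 is given first because the blueprint calls for more algebra. Information given up in this way is the kind of cost that shows up as the extra test length the study reports for constrained tests.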

11.

Research on a Computerized Adaptive Testing System

Lü Lan (吕岚) 《Journal of Jincheng Vocational and Technical College (晋城职业技术学院学报)》2013,6(4):56-59

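Combining expert-judgment-based difficulty determination with item response theory, this paper designs a simple, practical method for determining item difficulty in a computerized adaptive testing system, and focuses on the system's choice of starting and ending points, its item selection strategy, and its ability estimation method. It concludes with a step-by-step example of an adaptive test. The system can select items at random according to the differing ability levels of examinees, which shortens the test and improves testing efficiency compared with traditional online testing systems.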

12.

Threats to Score Comparability With Applications to Performance Assessments and Computerized Adaptive Tests

《Educational Assessment》2013,18(2):73-96

This article develops a conceptual framework that addresses score comparability. The intent of the framework is to help identify and organize threats to comparability in a particular assessment situation. Aspects of the testing situations that might threaten score comparability are delineated, procedures for evaluating the degree of score comparability are described, and suggestions are made about how to minimize the effects of potential threats. The situations considered are restricted to those in which test developers intend to (a) be able to use scores on 2 or more tests interchangeably, (b) collect data that allow for the conversion of scores on each of the tests to a common scale, and (c) use the scores to make decisions about individuals. Comparability of scores on alternate forms of performance assessments, adaptive and paper-and-pencil tests, and alternate pools used for computerized adaptive tests are considered within the framework. Aspects of these testing situations that might threaten score comparability and procedures for evaluating the degree of score comparability are described. Suggestions are made about how to minimize the effects of potential threats to comparability.

13.

Evaluating Content Alignment in Computerized Adaptive Testing

Steven L. Wise, G. Gage Kingsbury, Norman L. Webb 《Educational Measurement》2015,34(4):41-48

The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process.

14.

Flawed Items in Computerized Adaptive Testing

Maria T. Potenza, Martha L. Stocking (Educational Testing Service) 《Journal of Educational Measurement》1997,34(1):79-96

A multiple-choice test item is identified as flawed if it has no single best answer. In spite of extensive quality control procedures, the administration of flawed items to test takers is inevitable. A limited set of common strategies for dealing with flawed items in conventional testing, grounded in the principle of fairness to examinees, is reexamined in the context of adaptive testing. An additional strategy, available for adaptive testing, of retesting from a pool cleansed of flawed items, is compared to the existing strategies. Retesting was found to offer no practical improvement over current strategies.

15.

Outlier Detection in High-Stakes Certification Testing

Rob R. Meijer 《Journal of Educational Measurement》2002,39(3):219-233

Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data. In this study, empirical data from a certification test were used. Alternatives are discussed to generate norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications using statistical process control methods to detect misfitting item score patterns are discussed.
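A statistical-process-control statistic often used for person fit in CAT is a cumulative sum (CUSUM) of item residuals: after each item, add the difference between the observed score and the model probability, keep separate upper and lower sums, and flag the pattern when either crosses a bound. This is a minimal sketch, assuming residuals scaled by test length and hypothetical bounds of ±0.1; as the abstract stresses, operational bounds would come from norms rather than fixed constants, and the article's exact statistic may differ.

```python
def cusum_person_fit(responses: list[int], probs: list[float],
                     upper: float = 0.1, lower: float = -0.1):
    """CUSUM person-fit check over a CAT item score pattern.

    responses: observed item scores u_k (0 or 1) in administration order
    probs: model probabilities P_k of a correct response, each evaluated at the
           ability estimate available when item k was selected
    upper, lower: hypothetical bounds; in practice they would be set from norms
    Returns (flagged, upper_path, lower_path).
    """
    if len(responses) != len(probs):
        raise ValueError("responses and probs must have the same length")
    n = len(responses)
    c_plus, c_minus = 0.0, 0.0
    plus_path, minus_path = [], []
    flagged = False
    for u, p in zip(responses, probs):
        t = (u - p) / n                  # residual scaled by test length
        c_plus = max(0.0, c_plus + t)    # builds up on runs of unexpectedly correct answers
        c_minus = min(0.0, c_minus + t)  # builds up on runs of unexpectedly incorrect answers
        plus_path.append(c_plus)
        minus_path.append(c_minus)
        if c_plus > upper or c_minus < lower:
            flagged = True
    return flagged, plus_path, minus_path


# In a CAT, P_k tends to stay near 0.5, so a long run of correct answers is unexpected.
print(cusum_person_fit([1] * 10, [0.5] * 10)[0])    # True
print(cusum_person_fit([1, 0] * 5, [0.5] * 10)[0])  # False
```

Keeping the two sums separate is what lets the paths distinguish different types of misfit, for example a run of unexpected successes (possible item preknowledge) versus a run of unexpected failures.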

16.

On the Issue of Item Selection in Computerized Adaptive Testing With Response Times

Bernard P. Veldkamp 《Journal of Educational Measurement》2016,53(2):212-228

Many standardized tests are now administered via computer rather than paper‐and‐pencil format. The computer‐based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second advantage is the ability to record not only the test taker's response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. Combining these two advantages, this study explored various methods for utilizing response time data in selecting appropriate items for an individual test taker. Four strategies for incorporating response time data were evaluated, and the precision of the final test‐taker score was assessed by comparing it to a benchmark value that did not take response time information into account. While differences in measurement precision and testing times were expected, results showed that the strategies did not differ much with respect to measurement precision but that there were differences with regard to the total testing time.
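The abstract does not name the four strategies. One simple way to bring response times into item selection, shown here as an assumption and not as any of the article's strategies, is to rank items by Fisher information per unit of expected response time. Expected times come from a lognormal response-time model in which log T is normal with mean β_i − τ (item time intensity minus examinee speed) and standard deviation 1/α_i, so E[T] = exp(β_i − τ + 1/(2α_i²)). A 2PL model is assumed for information, and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedItem:
    item_id: str
    a: float      # 2PL discrimination
    b: float      # 2PL difficulty
    alpha: float  # response-time discrimination (precision of log time)
    beta: float   # time intensity on the log-seconds scale


def info_2pl(theta: float, item: TimedItem, d: float = 1.7) -> float:
    """2PL Fisher information at theta."""
    p = 1.0 / (1.0 + math.exp(-d * item.a * (theta - item.b)))
    return (d * item.a) ** 2 * p * (1.0 - p)


def expected_time(tau: float, item: TimedItem) -> float:
    """Mean of a lognormal whose log has mean beta - tau and SD 1/alpha."""
    return math.exp(item.beta - tau + 1.0 / (2.0 * item.alpha ** 2))


def select_info_per_second(theta: float, tau: float, pool: list[TimedItem],
                           administered: set[str]) -> TimedItem:
    """Unused item with the largest information per expected second."""
    candidates = [it for it in pool if it.item_id not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    return max(candidates, key=lambda it: info_2pl(theta, it) / expected_time(tau, it))


fast = TimedItem("fast", 1.0, 0.0, 2.0, 3.5)
slow = TimedItem("slow", 1.0, 0.0, 2.0, 4.5)
print(select_info_per_second(0.0, 0.0, [slow, fast], set()).item_id)  # fast
```

The two items are equally informative at θ = 0, so plain maximum-information selection could not separate them; dividing by expected time prefers the quicker item. A criterion like this mainly changes total testing time, which is consistent in spirit with where the abstract reports the strategies differing.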

17.

Design of a Computerized Adaptive Testing System

Lin Bangguo (林邦国) 《Journal of Hubei Radio & Television University (湖北广播电视大学学报)》2009,29(4):151-152

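Computerized adaptive testing is the product of combining item response theory with computer technology. Drawing on item response theory, this paper discusses in some depth the design of key modules of an adaptive testing system, such as ability estimation, item selection strategy, and termination rules, and proposes a model framework for a J2EE-based implementation.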
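The ability-estimation and termination modules discussed here can be sketched with expected a posteriori (EAP) estimation on a quadrature grid plus a stopping rule on the posterior standard deviation. This is a minimal sketch, assuming a 2PL model, a standard normal prior, a 0.3 standard-error target, and a 30-item cap; these are illustrative choices, and neither the article's own settings nor its J2EE framework are reproduced.

```python
import math


def p_2pl(theta: float, a: float, b: float, d: float = 1.7) -> float:
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


def eap_estimate(responses: list[int], items: list[tuple[float, float]],
                 n_points: int = 81, lo: float = -4.0, hi: float = 4.0):
    """EAP ability estimate and posterior SD on an equally spaced grid.

    responses: 0/1 scores; items: (a, b) pairs in the same order.
    Uses a standard normal prior.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + k * step for k in range(n_points)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)   # unnormalized N(0, 1) prior
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            w *= p if u == 1 else 1.0 - p    # multiply in each item's likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def should_stop(posterior_sd: float, n_given: int,
                se_target: float = 0.3, max_items: int = 30) -> bool:
    """Stop when the estimate is precise enough or the length cap is reached."""
    return posterior_sd <= se_target or n_given >= max_items


theta_hat, sd = eap_estimate([1, 1, 0], [(1.0, -0.5), (1.2, 0.0), (0.9, 0.8)])
print(round(theta_hat, 2), round(sd, 2), should_stop(sd, n_given=3))
```

EAP is used here because, unlike maximum likelihood, it gives a finite estimate even when every answer so far is correct (or every answer incorrect), which matters at the start of an adaptive test.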

18.

Detection of Test Collusion via Kullback–Leibler Divergence

Dmitry I. Belov 《Journal of Educational Measurement》2013,50(2):141-163

The development of statistical methods for detecting test collusion is a new research direction in the area of test security. Test collusion may be described as large‐scale sharing of test materials, including answers to test items. Current methods of detecting test collusion are based on statistics also used in answer‐copying detection. Therefore, in computerized adaptive testing (CAT) these methods lose power because the actual test varies across examinees. This article addresses that problem by introducing a new approach that works in two stages: in Stage 1, test centers with an unusual distribution of a person‐fit statistic are identified via Kullback–Leibler divergence; in Stage 2, examinees from identified test centers are analyzed further using the person‐fit statistic, where the critical value is computed without data from the identified test centers. The approach is extremely flexible. One can employ any existing person‐fit statistic. The approach can be applied to all major testing programs: paper‐and‐pencil testing (P&P), computer‐based testing (CBT), multiple‐stage testing (MST), and CAT. Also, the definition of test center is not limited by the geographic location (room, class, college) and can be extended to support various relations between examinees (from the same undergraduate college, from the same test‐prep center, from the same group at a social network). The suggested approach was found to be effective in CAT for detecting groups of examinees with item pre‐knowledge, meaning those with access (possibly unknown to us) to one or more subsets of items prior to the exam.
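Stage 1 compares each test center's distribution of a person-fit statistic with a reference distribution using Kullback–Leibler divergence. A minimal sketch, assuming both distributions are summarized as histograms over the same bins, a small pseudo-count so empty bins do not make the divergence infinite, and the direction D(center ‖ reference); the bin edges, pseudo-count, and sample values are hypothetical, and the article's estimator and critical values may differ.

```python
import math
from collections import Counter


def histogram(values: list[float], edges: list[float],
              pseudo_count: float = 0.5) -> list[float]:
    """Smoothed relative frequencies of `values` over bins defined by sorted `edges`.

    Values below edges[0] fall into the first bin and values at or above
    edges[-2] into the last, so every value is counted.
    """
    n_bins = len(edges) - 1
    counts = Counter()
    for v in values:
        k = 0
        while k < n_bins - 1 and v >= edges[k + 1]:
            k += 1
        counts[k] += 1
    smoothed = [counts[k] + pseudo_count for k in range(n_bins)]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D(p || q) = sum_k p_k * log(p_k / q_k), in nats."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


edges = [-3.0, -1.0, 0.0, 1.0, 3.0]
# Hypothetical person-fit values: a reference sample and one suspicious center.
reference = [-1.5, -0.5, -0.2, 0.1, 0.3, 0.6, 1.2, -0.8, 0.4, -0.1]
center = [-2.5, -2.0, -1.8, -1.2, -1.1]
ref_hist = histogram(reference, edges)
kl = kl_divergence(histogram(center, edges), ref_hist)
print(round(kl, 3))
```

Centers whose divergence is unusually large would then move on to Stage 2, where individual examinees are tested with the person-fit statistic against a critical value computed without those centers' data.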

19.

Longitudinal Multistage Testing

Steffi Pohl 《Journal of Educational Measurement》2013,50(4):447-468

This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large‐scale studies. In lMST designs, test forms of different difficulty levels are used, and pretest results determine the routing to these test forms. Since lMST allows for testing in paper‐and‐pencil mode, lMST may represent an alternative to conventional testing (CT) in assessments for which other adaptive testing designs are not applicable. In this article the performance of lMST is compared to CT in terms of test targeting as well as bias and efficiency of ability and change estimates. A simulation study investigated the effects of the stability of ability across waves, the difficulty level of the different test forms, and the number of link items between the test forms.

20.

Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes

Rae Yeong Kim, Yun Joo Yoo 《Journal of Educational Measurement》2023,60(1):126-147

In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when the test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH), a multistage testing design for CDMs. In CD-MST-PH, multiple testlets can be constructed based on separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or the on-the-fly approach. Moreover, testlets are offered sequentially and adaptively, thus improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. The results of a simulation study show that CD-MST-PH can improve test accuracy and efficiency relative to the conventional test without adaptive stages.