20 similar articles found
1.
Steffi Pohl 《Journal of Educational Measurement》2013,50(4):447-468
This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large‐scale studies. In lMST designs, test forms of different difficulty levels are used, and scores on a pretest determine the routing to these test forms. Since lMST allows for testing in paper-and-pencil mode, it may represent an alternative to conventional testing (CT) in assessments for which other adaptive testing designs are not applicable. In this article the performance of lMST is compared to CT in terms of test targeting as well as bias and efficiency of ability and change estimates. Using a simulation study, the effects of the stability of ability across waves, the difficulty level of the different test forms, and the number of link items between the test forms were investigated.
2.
3.
Making Diagnostic Inferences About Cognitive Attributes Using the Rule-Space Model and Attribute Hierarchy Method
Mark J. Gierl 《Journal of Educational Measurement》2007,44(4):325-340
The purpose of this paper is to describe the logic and identify key assumptions associated with making cognitive inferences using two attribute-based psychometric methods. The first method is Kikumi Tatsuoka's rule-space model. This model provides a strong point of reference for studying the nature of diagnostic inferences because it is important in the evolution of skills diagnostic testing and it is well documented. The second method is a new procedure called the attribute hierarchy method that was developed from the rule-space approach. Although the attribute hierarchy method shares many commonalities with rule space, it represents an extension by including an attribute hierarchy that serves as an explicit cognitive model of task performance designed to link psychometric practices with contemporary cognitive theories. In this paper, we describe and compare these two attribute-based psychometric methods and identify new directions for research and practice in skills diagnostic testing.
4.
This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback‐Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson‐Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.
5.
6.
Wim J. van der Linden Krista Breithaupt Siang Chee Chuah Yanwei Zhang 《Journal of Educational Measurement》2007,44(2):117-130
A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed between subtests and test takers and detecting differential speededness. An empirical data set for a multistage test in the computerized CPA Exam was used to demonstrate the procedures. Although the more difficult subtests appeared to have items that were more time intensive than the easier subtests, an analysis of the residual response times did not reveal any significant differential speededness because the time limit appeared to be appropriate. In a separate analysis, within each of the subtests, we found minor but consistent patterns of residual times that are believed to be due to a warm-up effect, that is, use of more time on the initial items than they actually need.
7.
Mary Roduta Roberts Cecilia B. Alves Man-Wai Chu Margaret Thompson Louise M. Bahry Andrea Gotzmann 《Applied Measurement in Education》2013,26(3):173-195
The purpose of this study was to evaluate the adequacy of three cognitive models, one developed by content experts and two generated from student verbal reports, for explaining examinee performance on a grade 3 diagnostic mathematics test. For this study, the items were developed to directly measure the attributes in the cognitive model. The performance of each cognitive model was evaluated by examining its fit to different data samples (verbal report, total, high-, moderate-, and low-ability) using the Hierarchy Consistency Index (Cui & Leighton, 2009), a model-data fit index. This study utilized cognitive diagnostic assessments developed under the framework of construct-centered test design and analyzed using the Attribute Hierarchy Method (Gierl, Wang, & Zhou, 2008; Leighton, Gierl, & Hunka, 2004). Both the expert-based and the student-based cognitive models provided excellent fit to the verbal report and high-ability samples, but moderate to poor fit to the total, moderate-, and low-ability samples. Implications for cognitive model development for cognitive diagnostic assessment are discussed.
8.
Computerized Cognitive Diagnostic Adaptive Testing: Effect on Remedial Instruction as Empirical Validation
The purpose of this study is to show the usefulness of cognitive diagnoses for remedial instruction. Cognitive diagnoses were done by an adaptive testing system using the rule-space methodology, which was developed by K. K. Tatsuoka and her associates (K. K. Tatsuoka, 1983, 1990; K. K. Tatsuoka & M. M. Tatsuoka, 1987; M. M. Tatsuoka & K. K. Tatsuoka, 1989). The results of the study strongly indicate that knowing students' knowledge states prior to remediation is very effective and that the rule-space method can effectively diagnose students' knowledge states and can point out ways for remediating their errors quickly with minimum effort. It is also found that the design of instructional units for remediation can be effectively guided by the rule-space model, because the determination of all possible knowledge states in a domain of interest, given an incidence matrix, is based on a partially ordered tree structure of knowledge states, which is equivalent to item-score patterns determined logically from the incidence matrix.
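As a rough illustration of how item-score patterns can be derived logically from an incidence matrix, the sketch below enumerates knowledge states and maps each one to an ideal response pattern. The matrix `Q`, the conjunctive scoring rule (an item is correct only if every required attribute is mastered), and the helper names are assumptions made for illustration; they are not taken from the study, and the error handling of the full rule-space method (slips, guesses, classification) is not shown.

```python
from itertools import product

# Hypothetical incidence matrix: rows are items, columns are attributes.
# Q[i][k] == 1 means item i requires attribute k.
Q = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]

def ideal_pattern(state, q_matrix):
    """Ideal item-score pattern for a knowledge state under a conjunctive rule:
    an item is scored 1 only if every attribute it requires is mastered."""
    return tuple(
        int(all(mastered >= required for mastered, required in zip(state, row)))
        for row in q_matrix
    )

def is_below(a, b):
    """Partial order on binary vectors: a <= b componentwise."""
    return all(x <= y for x, y in zip(a, b))

# Enumerate all 2^K attribute mastery states and their ideal patterns.
states = list(product([0, 1], repeat=len(Q[0])))
patterns = {state: ideal_pattern(state, Q) for state in states}

for state, pattern in patterns.items():
    print(state, "->", pattern)
```

Under these assumptions the mapping preserves the partial order: a state that masters more attributes never yields a lower ideal score on any item.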
9.
Dual‐Objective Item Selection Criteria in Cognitive Diagnostic Computerized Adaptive Testing
The development of cognitive diagnostic‐computerized adaptive testing (CD‐CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual‐objective CD‐CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen‐Shannon (JS) divergence, a symmetrized version of the Kullback‐Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage.
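To make the phrase "a symmetrized version of the Kullback‐Leibler divergence" concrete, here is a minimal sketch of the textbook two-distribution JS divergence for discrete distributions. It is not the item selection index proposed in the study; the function names and the example distributions `p` and `q` are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions.
    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL divergence of p and q
    from their midpoint distribution m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical example distributions over three outcomes.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

print(kl_divergence(p, q), kl_divergence(q, p))  # KL depends on argument order
print(js_divergence(p, q), js_divergence(q, p))  # JS is the same in either order
```

Because both distributions are compared against the same midpoint, the JS divergence is symmetric and bounded above by ln 2 (in nats), which is one reason it is convenient when values must be compared on a common scale.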
10.
Amy Hendrickson 《Educational Measurement》2007,26(2):44-52
Multistage tests are those in which sets of items are administered adaptively and are scored as a unit. These tests have all of the advantages of adaptive testing, with more efficient and precise measurement across the proficiency scale as well as time savings, without many of the disadvantages of an item-level adaptive test. As a seemingly balanced compromise between linear paper-and-pencil and item-level adaptive tests, development and use of multistage tests is increasing. This module describes multistage tests, including two-stage and testlet-based tests, and discusses the relative advantages and disadvantages of multistage testing as well as considerations and steps in creating such tests.
11.
Sang Aijiang (桑爱江) 《Journal of Heilongjiang College of Education》2011,30(7):145-146
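Through a classification study of how the "article + noun (adjective)" construction expresses generic reference, this paper examines in depth the role of the article in expressing a class-denoting function within noun phrases. The generic-reference features of the article are closely tied to its basic properties. In the "the + adjective" construction in particular, the article nominalizes the adjective and gives it class-denoting semantic features.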
12.
Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait (θ) estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to calibrate items using the incomplete data from MST design. Further complication arises when there are multiple correlated subscales per test, and when items from different subscales need to be calibrated according to their respective score reporting metric. The current calibration-per-subscale method produced biased item parameters, and there is no available method for resolving the challenge. Deriving from the missing data principle, we showed that when all items are calibrated together, Rubin's ignorability assumption is satisfied, such that the traditional single-group calibration is sufficient. When calibrating items per subscale, we proposed a simple modification to the current calibration-per-subscale method that helps reinstate the missing-at-random assumption and therefore corrects for the estimation bias that is otherwise present. Three mainstream calibration methods are discussed in the context of MST: marginal maximum likelihood estimation, the expectation maximization method, and fixed parameter calibration. An extensive simulation study is conducted and a real data example from NAEP is analyzed to provide convincing empirical evidence.
13.
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising that block may be unrelated to each other, or they may comprise a testlet (Wainer and Kiely, 1987). After the first block of items has been administered, adaptation takes place in the choice of the next block to be administered and subsequent blocks. The uMFS design integrates item exposure control, as well as content balancing and other test development needs, into the design of the CAT, instead of placing those activities in the online implementation. We show that it is possible to implement item exposure control, in a very thorough way, in the psychometric specifications of the item blocks.
14.
15.
Robert J. Sternberg 《Educational Psychology Review》2018,30(3):857-884
This article reviews four interrelated approaches to reducing an inequitable gap in cognitive and educational test scores between individuals of a dominant culture and individuals of other cultures or subcultures. These approaches include (a) use of broader measures, (b) performance- and project-based assessments, (c) direct measurement of knowledge and skills relevant to environmental adaptation, and (d) dynamic assessment. It is concluded that when appropriate assessment is done that recognizes students’ diverse cultural and social backgrounds, equity can increase, predictive validity of cognitive and educational tests can increase, and at the same time, racial/ethnic/culture differences can decrease.
16.
Hung‐Yu Huang 《Journal of Educational Measurement》2017,54(4):440-480
Cognitive diagnosis models (CDMs) have been developed to evaluate the mastery status of individuals with respect to a set of defined attributes or skills that are measured through testing. When individuals are repeatedly administered a cognitive diagnosis test, a new class of multilevel CDMs is required to assess the changes in their attributes and simultaneously estimate the model parameters from the different measurements. In this study, the most general CDM of the generalized deterministic input, noisy “and” gate (G‐DINA) model was extended to a multilevel higher order CDM by embedding a multilevel structure into higher order latent traits. A series of simulations based on diverse factors was conducted to assess the quality of the parameter estimation. The results demonstrate that the model parameters can be recovered fairly well and attribute mastery can be precisely estimated if the sample size is large and the test is sufficiently long. The range of the location parameters had opposing effects on the recovery of the item and person parameters. Ignoring the multilevel structure in the data by fitting a single‐level G‐DINA model decreased the attribute classification accuracy and the precision of latent trait estimation. The number of measurement occasions had a substantial impact on latent trait estimation. Satisfactory model and person parameter recoveries could be achieved even when assumptions of the measurement invariance of the model parameters over time were violated. A longitudinal basic ability assessment is outlined to demonstrate the application of the new models.
17.
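Based on the current state of diagnostic testing in English, this paper discusses the importance of diagnostic testing, its test types, test formats, construction principles, and operational steps, with the aim of supporting timely intervention in students' English learning process.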
18.
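Cognitive diagnostic tests can reveal test takers' knowledge structures and their mastery of individual subskills, and provide them with detailed feedback. This paper briefly introduces the principles and steps of cognitive diagnosis, summarizes the progress made in cognitive diagnostic research in English language testing in China and abroad, and points out that the problems remaining in this field far outweigh the achievements to date. Future research needs to design cognitive diagnostic tests in the strict sense, explore multiple methods for validating the Q-matrix, and conduct empirical studies on how diagnostic results can promote learning.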
19.
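Cognitive diagnostic theory is a new generation of measurement theory built on item response theory, with broad prospects for application in educational measurement practice. Research on diagnostic theory centers on three areas: the proposal of diagnostic models, the evaluation of their diagnostic performance, and the reporting of their diagnostic results. Progress in these three areas has deepened the theoretical development of diagnostic models and broadened their range of application, but further research is needed on the external validity of models, group-level diagnostic results, model selection and comparison, diagnostic models for polytomous items, and equating across different diagnostic tests.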