1.

The Bookmark Standard-Setting Method: A Literature Review (total citations: 1; self-citations: 0; citations by others: 1)

Ana Karantonis, Stephen G. Sireci 《Educational Measurement》2006,25(1):4-12

The Bookmark method for setting standards on educational tests is currently one of the most popular standard-setting methods. However, research to support the method is scarce. In this report, we review the published and unpublished literature on this method as well as some seminal work in the area of evaluating standard-setting studies. Our review highlights both strengths and limitations of the method. Strengths include its wide acceptance and panelist confidence in the method. Limitations include a potential bias to produce lower-than-intended standards and problems in selecting the most appropriate response probability value for ordering the items presented to panelists. It is clear that more research on this method is needed to support its wide use. Several areas for future research to better understand the validity of the Bookmark method for setting standards on educational tests are presented.
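
To make the item-ordering step concrete, here is a minimal Python sketch of how an ordered item booklet and a bookmark cut score might be computed. It assumes a two-parameter logistic (2PL) IRT model without the 1.7 scaling constant, hypothetical item parameters, and one common convention in which the cut is the response-probability (RP) location of the last item before the bookmark. Operational programs differ on these details. Because items have unequal discriminations, changing the RP value can reorder the booklet itself, which is one way the RP-selection problem noted above shows up.

```python
import math

# Hypothetical item record: (label, discrimination a, difficulty b) under a 2PL model.
Item = tuple[str, float, float]


def rp_location(a: float, b: float, rp: float) -> float:
    """Theta at which a 2PL item is answered correctly with probability rp."""
    # Inverts rp = 1 / (1 + exp(-a * (theta - b))); no 1.7 scaling constant.
    return b + math.log(rp / (1.0 - rp)) / a


def ordered_item_booklet(items: list[Item], rp: float) -> list[Item]:
    """Order items from lowest to highest RP location (easiest first)."""
    return sorted(items, key=lambda item: rp_location(item[1], item[2], rp))


def bookmark_cut(items: list[Item], last_mastered_page: int, rp: float) -> float:
    """Theta cut for a bookmark placed after page `last_mastered_page` (1-based).

    Assumed convention: the cut is the RP location of the last item a
    borderline examinee is judged to answer correctly with probability >= rp.
    """
    booklet = ordered_item_booklet(items, rp)
    _, a, b = booklet[last_mastered_page - 1]
    return rp_location(a, b, rp)


# Invented item parameters, for illustration only.
items = [("i1", 1.0, -1.0), ("i2", 1.2, 0.0), ("i3", 0.8, 0.5), ("i4", 1.5, 1.0)]
for rp in (0.67, 0.80):
    order = [label for label, _, _ in ordered_item_booklet(items, rp)]
    print(f"RP {rp:.2f}: {order}")  # i3 and i4 swap places at RP .80
print(f"cut after page 2 at RP .67: {bookmark_cut(items, 2, 0.67):.3f}")
```

The low-discrimination item i3 moves furthest when RP rises, because its location shift is divided by a small a.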

2.

Standard Setting and Grade Classification

Xiang Guanchun 《Adult Education》2013,33(1):14-20

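Standard setting is an important issue in educational measurement. It has wide-ranging implications, is highly contested, and is very difficult to resolve. In response, a large body of theoretical research and practical application on standard setting has emerged outside China, whereas research on the topic in China remains comparatively limited. This article summarizes the general steps followed when setting standards with the various methods. It also introduces several classic standard-setting methods and describes how Cambridge Assessment applies standard setting when dividing results into grades. The aim is to inform standard-setting practice in China and to improve the reliability of examinations.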

3.

A Comparative Study of Standard-Setting Methods

《Applied Measurement in Education》2013,26(2):121-141

The borderline-group method and the contrasting-groups method were compared with Nedelsky's method at four schools, with Angoff's method at another four schools, and with each other at all eight schools, using tests of basic skills in reading and mathematics. The borderline-group and contrasting-groups methods produced similar results when approximately equal numbers of students were classified as masters and nonmasters. The contrasting-groups passing score was lower than the borderline-group passing score when masters greatly outnumbered nonmasters and higher when nonmasters outnumbered masters. Results involving the Nedelsky and Angoff methods were not consistent across schools. Passing scores tended to be higher at schools where students were more able.
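
Here is a minimal Python sketch of the two examinee-centered methods compared above, using invented scores. The borderline-group cut is the median score of examinees judged borderline. The contrasting-groups cut is operationalized here as the integer score that minimizes total misclassifications of teacher-judged masters and nonmasters. That rule is an assumption for illustration; the study may have used smoothed score distributions or a different criterion.

```python
from statistics import median


def borderline_group_cut(borderline_scores: list[int]) -> float:
    """Borderline-group method: median test score of examinees judged borderline."""
    return median(borderline_scores)


def contrasting_groups_cut(masters: list[int], nonmasters: list[int]) -> int:
    """Contrasting-groups method, operationalized (as an assumption) as the
    integer cut that minimizes total misclassifications; ties go to the lowest
    cut. A score >= cut passes."""
    scores = masters + nonmasters
    candidates = range(min(scores), max(scores) + 2)  # +2 also allows "everyone fails"

    def misclassified(cut: int) -> int:
        failed_masters = sum(s < cut for s in masters)
        passed_nonmasters = sum(s >= cut for s in nonmasters)
        return failed_masters + passed_nonmasters

    return min(candidates, key=lambda cut: (misclassified(cut), cut))


# Invented scores on a 20-item test, for illustration only.
masters = [14, 16, 17, 18, 19, 20]
nonmasters = [8, 10, 11, 12, 15]
borderline = [12, 13, 13, 14, 15]
print(borderline_group_cut(borderline))
print(contrasting_groups_cut(masters, nonmasters))
```

Because this rule counts errors in each group, the larger group carries more weight in where the cut lands. That is consistent in direction with the imbalance effect the abstract reports.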

4.

Determining Sufficient Measurement Opportunities When Using Multiple Cut Scores

Rebecca L. Norman, Chad W. Buckendahl 《Educational Measurement》2008,27(1):37-46

Many educational testing programs report examinee performance at more than two levels of proficiency. Whether these assessments have the capacity to support these multiple inferences, though, is a topic that has not been widely discussed. This study proposes a method for evaluating the minimum number of measurement opportunities for reporting students' performance at multiple achievement levels and describes an application of the method for reading and mathematics assessments that are used by some school districts in Nebraska. Analyses were based on judgments collected from 110 teachers about characteristics of items and tasks from multiple assessments in reading and mathematics at grades 4 and 8, and in high school. Results suggested that there were generally enough items on the mathematics assessments to classify students into two or three performance levels, but rarely enough to make the four classifications that the state reported. Items on the reading assessments were generally distributed across the proficiency levels and tended to allow reporting for all four classification levels. These findings have implications for both practitioners and policymakers in how scores are interpreted.

5.

Rejoinder: Evaluating Standard Setting Methods Using Error Models Proposed by Schulz

Mark D. Reckase 《Educational Measurement》2006,25(3):14-17

6.

Commentary: A Response to Reckase's Conceptual Framework and Examples for Evaluating Standard Setting Methods (total citations: 1; self-citations: 0; citations by others: 1)

E. Matthew Schulz 《Educational Measurement》2006,25(3):4-13

A look at real data shows that Reckase's psychometric theory for standard setting is not applicable to bookmark and that his simulations cannot explain actual differences between methods. It is suggested that exclusively test-centered, criterion-referenced approaches are too idealized and that a psychophysics paradigm and a theory of group behavior could be more useful in thinking about the standard setting process. In this view, item mapping methods such as bookmark are reasonable adaptations to fundamental limitations in human judgments of item difficulty. They make item ratings unnecessary and have unique potential for integrating external validity data and student performance data more fully into the standard setting process.

7.

Resitting or compensating a failed examination: does it affect subsequent results?

Ivo Arnold 《Assessment & Evaluation in Higher Education》2017,42(7):1103-1117

Institutions of higher education commonly employ a conjunctive standard setting strategy, which requires students to resit failed examinations until they pass all tests. An alternative strategy allows students to compensate a failing grade with other test results. This paper uses regression discontinuity design to compare the effect of first-year resits and compensations on second-year study results. We select students with a similar level of knowledge in a first-year introductory course and estimate the treatment effect of a resit on the result for a second-year intermediate course in the same subject. We find that the treatment effect is positive, but insignificantly different from zero. Additional results show that students’ overall second-year performance is insignificantly related to the number of compensated failing grades in their first year. The number of attempts that students need to complete their first year does not have a significant positive effect on second-year performance. We conclude that the evidence for a positive effect of resits on learning is weak at best.
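
The regression discontinuity idea can be sketched as follows. Fit a straight line to the later outcome on each side of the pass mark, within a bandwidth, and take the gap between the two fitted values at the cutoff. This is a generic sharp-RD estimator with a rectangular kernel, written with the Python standard library only. The bandwidth, the function names, and the synthetic data are hypothetical, and the paper's actual specification (controls, bandwidth choice, inference) is not reproduced here.

```python
def ols_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_effect(running: list[float], outcome: list[float],
                    cutoff: float, bandwidth: float) -> float:
    """Sharp RD estimate with a rectangular kernel (hypothetical helper).

    Treatment (e.g. a resit) is assumed to apply when running < cutoff.
    The running variable is centred at the cutoff, so each side's fitted
    intercept is its predicted outcome at the cutoff; the effect is the gap.
    """
    below = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff - bandwidth <= r < cutoff]
    above = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    treated_at_cutoff, _ = ols_line([x for x, _ in below], [y for _, y in below])
    untreated_at_cutoff, _ = ols_line([x for x, _ in above], [y for _, y in above])
    return treated_at_cutoff - untreated_at_cutoff


# Synthetic check (not study data): first-year score 40-60, pass mark 55,
# a built-in resit effect of +3 on the second-year result.
scores = list(range(40, 61))
second_year = [60 + 0.5 * (s - 55) + (3 if s < 55 else 0) for s in scores]
print(sharp_rd_effect(scores, second_year, cutoff=55, bandwidth=5))
```

On noiseless synthetic data the estimator recovers the built-in jump. With real data, the estimate also needs a standard error, which is what "insignificantly different from zero" in the abstract refers to.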

8.

The Choice of Response Probability in Bookmark Standard Setting: An Experimental Study

Peter Baldwin, Melissa J. Margolis, Brian E. Clauser, Janet Mee, Marcia Winward 《Educational Measurement》2020,39(1):37-44

Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the resulting judgments. This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores. The results showed substantially different cut scores for the two conditions; this calls into question whether content experts can produce the type of internally consistent judgments that are required using the bookmark procedure.
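
The item locations explain why internal consistency requires panelists to move the bookmark between the two conditions. Assuming a two-parameter logistic model (the abstract does not state the IRT model), item i's location at a given RP value, and the shift between the .67 and .80 conditions, are:

```latex
\begin{aligned}
P_i(\theta) &= \frac{1}{1 + \exp[-a_i(\theta - b_i)]}, \\
\theta_i^{(\mathrm{RP})} &= b_i + \frac{1}{a_i}\ln\frac{\mathrm{RP}}{1 - \mathrm{RP}}, \\
\theta_i^{(.80)} - \theta_i^{(.67)} &= \frac{1}{a_i}\left(\ln\frac{.80}{.20} - \ln\frac{.67}{.33}\right)
  \approx \frac{1.386 - 0.708}{a_i} \approx \frac{0.678}{a_i}.
\end{aligned}
```

For Rasch items (a_i = 1), every location rises by about 0.68 logits. The booklet order is unchanged, but the same bookmark page now maps to a higher cut. When discriminations differ, the shifts differ across items and the order itself can change. An internally consistent panelist would therefore need to place the bookmark earlier under RP .80 to reach the same cut. One reading of the reported result is that panelists did not fully offset this shift.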

9.

A Conceptual Framework for a Psychometric Theory for Standard Setting with Examples of Its Use for Evaluating the Functioning of Two Standard Setting Methods

Mark D. Reckase 《Educational Measurement》2006,25(2):4-18

A conceptual framework is proposed for a psychometric theory of standard setting. The framework suggests that participants in a standard setting process (panelists) develop an internal, intended standard as a result of training and the participant's background. The goal of a standard setting process is to convert panelists' intended standards to points on a test's score scale. Psychometrics is involved in this process because the points on the score scale are estimated from ratings provided by participants. The conceptual framework is used to derive three criteria for evaluating standard setting processes. The use of these criteria is demonstrated by applying them to variations of bookmark and modified Angoff standard setting methods.

10.

How accurate are examiners’ holistic judgements of script quality?

Tim Gill, Tom Bramley 《Assessment in Education: Principles, Policy & Practice》2013,20(3):308-324

11.

Melissa J. Margolis, Brian E. Clauser 《Educational Measurement》2014,33(1):15-22

This research evaluated the impact of a common modification to Angoff standard‐setting exercises: the provision of examinee performance data. Data from 18 independent standard‐setting panels across three different medical licensing examinations were examined to investigate whether and how the provision of performance information impacted judgments and the resulting cut scores. Results varied by panel but in general indicated that both the variability among the panelists and the resulting cut scores were affected by the data. After the review of performance data, panelist variability generally decreased. In addition, for all panels and examinations pre‐ and post‐data cut scores were significantly different. Investigation of the practical significance of the findings indicated that nontrivial fail rate changes were associated with the cut score changes for a majority of standard‐setting exercises. This study is the first to provide a large‐scale, systematic evaluation of the impact of a common standard setting practice, and the results can provide practitioners with insight into how the practice influences panelist variability and resulting cut scores.
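
Here is a minimal Python sketch of the Angoff arithmetic that this modification acts on. Each panelist estimates, for every item, the probability that a minimally competent examinee answers it correctly. A panelist's implied raw cut is the sum of those estimates, and the panel cut is taken here as their mean (some programs use the median). The ratings below are invented to show the arithmetic, including the between-panelist spread the study tracks. They are not data from the study.

```python
from statistics import mean, pstdev


def angoff_summary(ratings: list[list[float]]) -> tuple[float, list[float], float]:
    """Modified Angoff arithmetic (illustrative helper).

    ratings[p][i] is panelist p's estimate of the probability that a minimally
    competent examinee answers item i correctly. Returns the panel cut (mean of
    panelist cuts), each panelist's implied raw cut, and the population SD of
    those cuts as a simple measure of panelist variability.
    """
    panelist_cuts = [sum(row) for row in ratings]
    return mean(panelist_cuts), panelist_cuts, pstdev(panelist_cuts)


# Invented ratings for 3 panelists x 4 items, before and after seeing performance data.
round_1 = [[0.6, 0.7, 0.5, 0.8], [0.9, 0.8, 0.7, 0.9], [0.4, 0.6, 0.5, 0.6]]
round_2 = [[0.6, 0.6, 0.5, 0.7], [0.7, 0.7, 0.6, 0.7], [0.6, 0.6, 0.5, 0.7]]
for label, ratings in (("before data", round_1), ("after data", round_2)):
    cut, _, spread = angoff_summary(ratings)
    print(f"{label}: cut = {cut:.2f} of 4 raw points, panelist SD = {spread:.2f}")
```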

12.

Daniel Lewis, Robert Cook 《Educational Measurement》2020,39(1):8-21

13.

Standard Setting: Steps, Methods, and Evaluation Indicators (total citations: 1; self-citations: 0; citations by others: 1)

Li Zhen, Xin Tao, Chen Ping 《Examinations Research》2010,(2):83-95

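Standard setting is the process of establishing standards, that is, of placing one or more cut scores in a test's score distribution to divide it into two or more categories. Through standard setting, examinees can be classified as "pass" and "fail", or into a larger number of ordered performance categories. Standard setting is an important component of criterion-referenced testing and can also give test decision-makers evidence about a test's validity. It is currently a research question attracting considerable attention in the measurement field. This article first reviews the origins and development of standard setting. It then describes in detail the basic steps of standard setting, several major standard-setting methods, and indicators for evaluating the standard-setting process. Finally, it briefly discusses why standard setting needs to be applied to the various examinations in China.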

14.

Brian C. Leventhal, Irina Grabovsky 《Educational Measurement》2020,39(1):30-36

Standard setting is arguably one of the most subjective techniques in test development and psychometrics. The decisions made when scores are compared to standards, however, are arguably the most consequential outcomes of testing. Providing licensure to practice in a profession has high-stakes consequences for the public. Denying graduation or forcing remediation has high-impact consequences for students. Unfortunately, tests that classify individuals are subject to false positive and false negative misclassifications. When determining a standard, standard setting panelists implicitly consider the negative consequences of the decisions made from test use. We propose the conscious weight method and subconscious weight method to bring more objectivity to the standard setting process. To do this, these methods quantify the relative harm of the negative consequences of false positive and false negative misclassification.
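
One generic way to formalize "relative harm" is a weighted misclassification loss minimized over candidate cut scores c. This is a standard decision-theoretic sketch, not the conscious or subconscious weight methods themselves, whose details the abstract does not give. Here FP(c) and FN(c) denote the number (or rate) of false positives and false negatives at cut c, and w_FP and w_FN are assumed harm weights.

```latex
c^{*} = \arg\min_{c}\;\bigl[\, w_{FP}\,\mathrm{FP}(c) + w_{FN}\,\mathrm{FN}(c) \,\bigr]
```

Only the ratio w_FP / w_FN affects c*, because multiplying both weights by the same positive constant leaves the minimizer unchanged. With equal weights, the rule reduces to minimizing total misclassifications. Because FP(c) can only fall and FN(c) can only rise as c increases, raising w_FP relative to w_FN can move the optimal cut up or leave it unchanged, but never pull it down.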

15.

Consistency of Standard Setting in an Augmented State Testing System

Robert W. Lissitz, Hua Wei 《Educational Measurement》2008,27(2):46-55

In this article we address the issue of consistency in standard setting in the context of an augmented state testing program. Information gained from the external norm-referenced test (NRT) scores is used to help make an informed decision on the determination of cut scores on the state test. The consistency of cut scores on the criterion-referenced state test (CRT) across grades is maintained by forcing a consistency model based on the NRT scores and translating that information back to the CRT scores. The inconsistency of standards and the application of this model are illustrated using data from the Maryland MSA large state testing program involving cut points for basic, proficient and advanced in mathematics and reading across years and across grades. The model is discussed in some detail and shown to be a promising approach, although not without assumptions that must be made and issues that might be raised.

16.

2006 Presidential Address: Errors and Omissions: Some Illustrations From Unpublished Research

James C. Impara 《Educational Measurement》2007,26(1):3-8

17.

The Dependence of Growth‐Model Results on Proficiency Cut Scores

Andrew D. Ho, Daniel M. Lewis, Jason L. MacGregor Farris 《Educational Measurement》2009,28(4):15-26

States participating in the Growth Model Pilot Program reference individual student growth against “proficiency” cut scores that conform with the original No Child Left Behind Act (NCLB). Although achievement results from conventional NCLB models are also cut‐score dependent, the functional relationships between cut‐score location and growth results are more complex and are not currently well described. We apply cut‐score scenarios to longitudinal data to demonstrate the dependence of state‐ and school‐level growth results on cut‐score choice. This dependence is examined along three dimensions: 1) rigor, as states set cut scores largely at their discretion, 2) across‐grade articulation, as the rigor of proficiency standards may vary across grades, and 3) the time horizon chosen for growth to proficiency. Results show that the selection of plausible alternative cut scores within a growth model can change the percentage of students “on track to proficiency” by more than 20 percentage points and reverse accountability decisions for more than 40% of schools. We contribute a framework for predicting these dependencies, and we argue that the cut‐score dependence of large‐scale growth statistics must be made transparent, particularly for comparisons of growth results across states.
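
Here is a toy Python sketch of the cut-score dependence described above. It projects each student's most recent annual gain forward a fixed number of years and counts how many reach the proficiency cut. The linear projection rule, the scale scores, and the cut values are all hypothetical and are not the growth models or data used in the study. The point is only that the share "on track" can move sharply as the cut moves.

```python
def share_on_track(histories: list[list[float]], cut: float, years_ahead: int) -> float:
    """Share of students whose projected score reaches the proficiency cut.

    Hypothetical projection rule: extend each student's most recent annual
    gain linearly for `years_ahead` more years.
    """
    on_track = 0
    for history in histories:
        gain = history[-1] - history[-2]
        projected = history[-1] + years_ahead * gain
        on_track += projected >= cut  # bool counts as 0 or 1
    return on_track / len(histories)


# Invented two-year scale-score histories for five students.
histories = [[300, 320], [310, 315], [280, 310], [330, 335], [290, 300]]
for cut in (320, 340, 360):
    print(cut, share_on_track(histories, cut, years_ahead=2))
```

Here, moving the cut by 40 scale points changes the share on track from every student to two in five, even though no student's scores change.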

18.

Integrated, Comprehensive Alignment as a Foundation for Measuring Student Progress

Joseph Martineau, Pamela Paek, John Keene, Thomas Hirsch 《Educational Measurement》2007,26(1):28-35

19.

Gregory J. Cizek, Michael B. Bunch, Heather Koons 《Educational Measurement》2004,23(4):31-31

This module describes some common standard-setting procedures used to derive performance levels for achievement tests in education, licensure, and certification. Upon completing the module, readers will be able to: describe what standard setting is; understand why standard setting is necessary; recognize some of the purposes of standard setting; calculate cut scores using various methods; and identify elements to be considered when evaluating standard-setting procedures. A self-test and annotated bibliography are provided at the end of the module. Teaching aids to accompany the module are available through NCME.

20.

Boaz Shulruf, Phillippa Poole, Philip Jones, Tim Wilkinson 《Assessment & Evaluation in Higher Education》2015,40(3):420-438

A new probability-based standard setting technique, the Objective Borderline Method (OBM), was introduced recently. This was based on a mathematical model of how test scores relate to student ability. The present study refined the model and tested it using 2500 simulated data-sets. The OBM was feasible to use. On average, the OBM performed well with specificity .88, sensitivity .51, false positive rate 3.4% and false negative rate 26%. These indices were insensitive to the borderline score range. This probability-based standard setting may be a useful addition to the range of standard setting methods available.
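
The indices reported above come from a 2×2 table of true status against the pass/fail decision. The abstract does not state its denominators. The reported false positive rate (3.4%) is not 1 − specificity (.12), so the paper presumably computes it differently, for example relative to all examinees. The minimal Python sketch below computes common definitions under that assumption, with "positive" meaning competent and passed. The paper's own conventions may differ, and the counts in the example are invented.

```python
def classification_indices(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Common 2x2 indices, taking "positive" to mean truly competent / passed.

    tp: competent and passed      fn: competent but failed
    fp: not competent but passed  tn: not competent and failed
    """
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),       # competent examinees who pass
        "specificity": tn / (tn + fp),       # non-competent examinees who fail
        "false_positive_share": fp / total,  # passed in error, as a share of everyone (assumed)
        "false_negative_share": fn / total,  # failed in error, as a share of everyone (assumed)
    }


print(classification_indices(tp=40, fn=10, fp=5, tn=45))
```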