Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Some writers in the measurement literature have been skeptical of the meaningfulness of achievement standards and described the standard-setting process as blatantly arbitrary. We argue that standard setting is more appropriately conceived of as a measurement process similar to student assessment. The construct being measured is the panelists' representation of student performance at the threshold of an achievement level. In the first section of this paper, we argue that standard setting is an example of stimulus-centered measurement. In the second section, we elaborate on this idea by comparing some popular standard-setting methods to the stimulus-centered scaling methods known as psychophysical scaling. In the third section, we use the lens of standard setting as a measurement process to take a fresh look at the two criticisms of standard setting: the role of judgment and the variability of results. In the fourth section, we offer a vision of standard-setting research and practice as grounded in the theory and practice of educational measurement.

2.
Who should make judgments about test standards? Who is an expert? How many judges should be used in a standard-setting study? What is the relationship between the number of judges and the standard error of the test?

3.
What factors influence judges when they set standards? How do judges, test questions, and the standard-setting process interact? How can we improve intrajudge consistency?

4.
《Applied Measurement in Education》2013,26(2):121-141
The borderline-group method and the contrasting-groups method were compared with Nedelsky's method at four schools, with Angoff's method at another four schools, and with each other at all eight schools, using tests of basic skills in reading and mathematics. The borderline-group and contrasting-groups methods produced similar results when approximately equal numbers of students were classified as masters and nonmasters. The contrasting-groups passing score was lower than the borderline-group passing score when masters greatly outnumbered nonmasters and higher when nonmasters outnumbered masters. Results involving the Nedelsky and Angoff methods were not consistent across schools. Passing scores tended to be higher at schools where students were more able.

5.
Is training judges beyond initial orientation required? How can we help judges apply their conceptualization of minimal competence to individual items?

6.
《Educational Assessment》2013,18(3):129-145
Alternate approaches to standard setting cannot be evaluated in terms of their accuracy, because the standard does not exist until we set it. To set a standard is to establish a policy, and policies are evaluated in terms of their appropriateness, reasonableness, and consistency, rather than in terms of accuracy. Of the 2 general approaches to standard setting currently in use, the test-centered methods rely on judgments about test items, whereas the examinee-centered methods rely on judgments about examinees. This article examines criteria for choosing between these 2 approaches to standard setting in terms of empirical criteria and in terms of whether the method is consistent with (a) the model of achievement underlying test design and interpretation and (b) the assessment methods being used.

7.
Is it enough to run the standard-setting panel and to stop? Once a panel has met or a contrasting group experiment has been conducted, do we have a cutoff score? On what basis might a proposed cutoff score be adjusted? Who should make the adjustment decision?

8.
Different standard-setting procedures usually produce different cut points even if each has a rational basis. In 2000, three standard-setting procedures were implemented to set cut scores in each of the 18 grade/content areas comprising Kentucky's state assessment system: the Contrasting Groups, Bookmark, and Jaeger-Mills procedures. Subsequently, participants from each of the three procedures worked together in each grade/content area to synthesize the results. These synthesis participants considered the results of, and examined the materials and information provided by, each of the three separate procedures. In this article the synthesis processes are described and discussed.

9.
《Applied Measurement in Education》2013,26(1):59-72
Test makers are struggling with the issue of whether to provide a standard calculator for all participants or allow students to bring in their own calculators. A within-subjects design was used to examine (a) effects of calculator type (own calculator vs. standard calculator) on student performance and (b) differential impacts of calculator type for children from a variety of backgrounds. Fifty Grade 8 students completed a set of National Assessment of Educational Progress problems and a set of timed computation tests with their own calculators and comparable sets of problems with a standard calculator. Performance on mathematical items (i.e., time, accuracy) and the ways in which students used the calculators (e.g., number of keys pressed, calculator difficulties) were not affected by calculator type. No performance advantages associated with calculator type were related to student background characteristics (e.g., socioeconomic status, ethnicity, sex, math ability). However, calculator preference depended on the complexity of the student's own calculator relative to the standard one.

10.
The Bookmark Standard-Setting Method: A Literature Review   Cited by: 1 (self-citations: 0, citations by others: 1)
The Bookmark method for setting standards on educational tests is currently one of the most popular standard-setting methods. However, research to support the method is scarce. In this report, we review the published and unpublished literature on this method as well as some seminal work in the area of evaluating standard-setting studies. Our review highlights both strengths and limitations of the method. Strengths include its wide acceptance and panelist confidence in the method. Limitations include a potential bias to produce lower-than-intended standards and problems in selecting the most appropriate response probability value for ordering the items presented to panelists. It is clear that more research on this method is needed to support its wide use. Several areas for future research to better understand the validity of the Bookmark method for setting standards on educational tests are presented.

11.
One commonly used compromise standard-setting method is the Beuk (1984) method. A key assumption of the Beuk method is that the emphasis given to the pass rate and the percent correct ratings should be proportional to the extent that the panelists agree on their ratings. However, whether the slope of the Beuk line reflects the emphasis that panelists believe should be assigned to the pass rate and the percentage correct ratings has not been fully tested. In this article, I evaluate this critical assumption of the Beuk method by asking panelists to assign importance weights to their percentage correct and pass rate judgments. I show that in several cases the emphasis suggested by the Beuk slope is noticeably different from what one would expect and is inconsistent with importance weight ratings. I also suggest two ways that the importance weights can be used to calculate alternate cut scores, and I show that one of the ways of calculating cut scores using the importance weights leads to larger potential differences in cut score estimates. I suggest that practitioners should consider collecting importance weights when the Beuk method is used for determining cut scores.

12.
This study investigated the comparability of Angoff-based item ratings on a general education test battery made by judges from within-content specialties and across content domains. Judges were from English, mathematics, science, and social studies specialties in teacher education programs in a midwestern state. Cutscores established from the judges' ratings of out-of-content items differed little from the cutscores set using the ratings made by the content specialists. Further, out-of-content ratings by judges were not more influenced by performance data than were the ratings provided by judges rating items within their content specialty. The degree to which these results generalize to other content specialties needs to be investigated.

13.
A Comparison of Three Variations on a Standard-Setting Method   Cited by: 1 (self-citations: 0, citations by others: 1)
The purpose of this study was to determine whether two variations on the typical Angoff group standard-setting process would produce sufficiently consistent results to recommend their use. Judgments obtained from a group of experts during a meeting were compared with judgments gathered from the same group before and after the meeting. The results indicate that differences between passing scores obtained with the three variations are relatively small, but those gathered before the meeting were less consistent than ratings gathered during and after the meeting. These results imply that judgments gathered after an initial traditional group-process session can provide an efficient alternative mechanism for setting cutting scores using the Angoff method.
This research was supported by The American Board of Internal Medicine, but does not necessarily reflect its opinions or policies.

14.
《Educational Assessment》2013,18(2):129-153
States are increasingly using test scores as part of the requirements for high school graduation or certification. In these circumstances, a battery of tests or, with writing, analytic traits are considered that usually cover different aspects of the state's content standards. Because pass or fail decisions are made affecting students' futures, the validity of standard-setting procedures and strategies is a major concern. Policymakers and legislators must decide which of these 2 standard-setting strategies to use for making pass or fail decisions for students seeking certification or for meeting a high school graduation requirement. The compensatory strategy focuses on total performance, summing scores across all tests in the battery. The conjunctive strategy requires passing performance for each test in the battery. This article reviews and evaluates compensatory and conjunctive standard-setting strategies. The rationales for each type are presented and discussed. Results from a study comparing the compensatory and conjunctive strategies for a state high school certification writing test provide insight into the problem of choosing either strategy. This article concludes with a set of recommendations for those who must decide which type of standard-setting strategy to use.
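The difference between the two strategies described in this abstract can be illustrated with a minimal sketch. The trait names, scores, and cut scores below are hypothetical and chosen only for illustration; they do not come from the study, and the function names are assumptions rather than anything the article defines.

```python
from typing import Mapping


def compensatory_pass(scores: Mapping[str, float], total_cut: float) -> bool:
    """Pass if the score summed across all tests meets the total cut score."""
    return sum(scores.values()) >= total_cut


def conjunctive_pass(scores: Mapping[str, float], cuts: Mapping[str, float]) -> bool:
    """Pass only if every test meets its own cut score."""
    return all(scores[test] >= cut for test, cut in cuts.items())


# Hypothetical analytic-trait scores for one student (illustrative only).
student = {"ideas": 4, "organization": 2, "conventions": 4}
# Hypothetical per-trait cut scores; the total cut is assumed to be their sum.
trait_cuts = {"ideas": 3, "organization": 3, "conventions": 3}
total_cut = sum(trait_cuts.values())

print(compensatory_pass(student, total_cut))  # True: total 10 >= 9
print(conjunctive_pass(student, trait_cuts))  # False: organization 2 < 3
```

In this sketch a strong score on one trait offsets a weak one under the compensatory rule, while the conjunctive rule fails the same student on the single weak trait.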

15.
16.
《Educational Assessment》2013,18(3):197-215
Results of variations on a categorical standard-setting procedure for use with performance assessments with multiple performance standards are reported. Panelists were asked to make independent ordinal categorical assignments of student work. Student work came from the administration of 1 of the 1996 Grade 8 National Assessment of Educational Progress Science Booklets. This study focused on the comparability of cutscores using 2 data analysis strategies, the effects of 2 strategies for making the categorical classifications (sorting vs. direct classification), the efficacy of long and short versions of the classification scale, and the effects of discussion on the basic, proficient, and advanced cutscores. The results indicate that there were minimal differences in cutscores across the 2 data analysis strategies. The direct classification approach was more feasible than the sorting approach, at least with the design as it was implemented in this study. The short version of the classification scale yielded comparable results to the longer version and took less time (about 20%) for the panelists to complete. Discussion produced the expected effect of more agreement among panelists. These results should be interpreted with caution due to the small sample sizes.

17.
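Chinese law does not clearly define the authority of degree-granting institutions to set degree-conferral standards, which has led to frequent degree disputes. Scholars disagree on whether universities may add degree-conferral standards, and courts apply differing standards of review in such cases. Clarifying the legal nature of universities' power to set degree-conferral standards, and delimiting the boundaries of its exercise, is an important issue in revising the Regulations on Academic Degrees (《学位条例》). The "dual nature theory" of the degree-conferral power determines that universities' power to set degree-conferral standards is a right: externally it manifests as "institutional autonomy," and its internal core is "academic freedom." The boundaries of the right to academic freedom delimit the scope of universities' authority to set degree-conferral standards. Degree-conferral standards comprise academic and non-academic standards. Within the scope of academic freedom, universities may add academic standards for degree conferral on their own initiative; courts, maintaining judicial deference to academia, review academic standards with low intensity under the non-contravention principle, giving effect to universities' right to academic freedom. Universities set non-academic standards in accordance with the law and may not add other non-academic standards; courts review non-academic standards with high intensity under the principle of legal reservation, to prevent infringement of students' rights and interests.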

18.
19.
20.