Similar Documents
20 similar documents found.
1.
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group of borderline examinees, or for a typical borderline examinee, may be an extremely difficult task, and one that can lead to questionable cut scores. In this study, data collected in a previous standard-setting study are used to deduce panelists' conceptions of profiles of borderline performance. These profiles are then used to predict cut scores on a test of algebra readiness. The results indicate that these profiles can predict a very wide range of cut scores, both within and between panelists. Modifications to existing training procedures for test-centered methods are proposed to account for the variation in borderline profiles.

2.
Standard-setting studies using procedures such as the Bookmark or Angoff methods are just one component of the complete standard-setting process. Decision makers must ultimately determine what they believe to be the most appropriate standard or cut score, treating the input of the standard-setting panelists as one piece of information among multiple sources. However, guidance on how to weigh these sources is limited. The current article describes considerations about the data used to make standard-setting decisions, as previously outlined by Geisinger (1991). Geisinger's ten points have been expanded to reflect shifts in educational policy and in the practice of educational measurement, and six new components have been added. The new considerations are smoothing across grades, raising standards in progression (over grades or over time), opportunity to learn or instructional validity, input from other groups, equating or linking to previous standards, and organizational vision and goals.

3.
The Bookmark Standard-Setting Method: A Literature Review (Total citations: 1; self-citations: 0; citations by others: 1)
The Bookmark method for setting standards on educational tests is currently one of the most popular standard-setting methods. However, research to support the method is scarce. In this report, we review the published and unpublished literature on this method as well as some seminal work in the area of evaluating standard-setting studies. Our review highlights both strengths and limitations of the method. Strengths include its wide acceptance and panelist confidence in the method. Limitations include a potential bias to produce lower-than-intended standards and problems in selecting the most appropriate response probability value for ordering the items presented to panelists. It is clear that more research on this method is needed to support its wide use. Several areas for future research to better understand the validity of the Bookmark method for setting standards on educational tests are presented.

4.
5.
Educational Assessment, 2013, 18(3): 197-215
Results of variations on a categorical standard-setting procedure for use with performance assessments with multiple performance standards are reported. Panelists were asked to make independent ordinal categorical assignments of student work. Student work came from the administration of 1 of the 1996 Grade 8 National Assessment of Educational Progress Science booklets. This study focused on the comparability of cutscores under 2 data analysis strategies, the effects of 2 strategies for making the categorical classifications (sorting vs. direct classification), the efficacy of long and short versions of the classification scale, and the effects of discussion on the basic, proficient, and advanced cutscores. The results indicate that there were minimal differences in cutscores across the 2 data analysis strategies. The direct classification approach was more feasible than the sorting approach, at least with the design as it was implemented in this study. The short version of the classification scale yielded results comparable to the longer version and took the panelists about 20% less time to complete. Discussion produced the expected effect of greater agreement among panelists. These results should be interpreted with caution because of the small sample sizes.

6.
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, little research has investigated panelists' ability to perform the Bookmark method well, or whether some of the challenges panelists face with the Angoff method are also present in the Bookmark method. This article presents results from three experiments in which panelists were asked to give Bookmark-type ratings to separate items into groups based on item difficulty data. Consistent with results often observed with the Angoff method, panelists typically and paradoxically perceived hard items to be too easy and easy items to be too hard. These perceptions were reflected in panelists often placing their Bookmarks too early for hard items and too late for easy items. The article concludes with a discussion of what these results imply for educators and policymakers using the Bookmark standard-setting method.

7.
An important consideration in standard setting is recruiting a group of panelists with different experiences and backgrounds to serve on the standard-setting panel. This study uses data from 14 different Angoff standard settings from a variety of medical imaging credentialing programs to examine whether people with different professional roles and test development experiences tended to recommend higher or lower cut scores, or were more or less accurate in their standard-setting judgments. Results suggested that there were no statistically significant differences among types of panelists in either the cut scores they recommended or the accuracy of their judgments. The implications of these results for panelist selection and recruitment are discussed.

8.
Standard-setting procedures are a key component within many large-scale educational assessment systems. They are consensual approaches in which committees of experts set cut-scores on continuous proficiency scales, which facilitate communication of proficiency distributions of students to a wide variety of stakeholders. This communicative function makes standard-setting studies a key gateway for validity concerns at the intersection of evidentiary and consequential aspects of score interpretations. This short review paper describes the conceptual and empirical basis of validity arguments for standard-setting procedures in light of recent research on validity theory. It specifically demonstrates how procedural and internal evidence for the validity of standard-setting procedures can be collected to form part of the consequential basis of validity evidence for test use.

9.
Standard-setting methods, such as the Bookmark procedure, are used to help education experts formulate performance standards. Small-group discussion is meant to help these experts set more reliable and valid cutoff scores. This study analyzes 15 small-group discussions during two standard-setting trajectories and their effect on the cutoff scores for four performance levels in comprehensive reading and mathematics. Discussion decreased the variability of the cutoff scores among the expert panelists, but the direction of the adjustments varied among groups. The duration and content of the audio-taped discussions also differed among groups. There was no relationship between the increase in agreement among the panelists and either the duration of their discussions or their use of arguments concerning learning content. It was concluded that increased consensus among panelists does not by itself provide enough information on the reliability and validity of cutoff scores. Additional measures aimed at the content of group discussions appear to be necessary, since the use of content arguments in these discussions is not guaranteed.

10.
Educational Assessment, 2013, 18(2): 129-153
States are increasingly using test scores as part of the requirements for high school graduation or certification. In these circumstances, a battery of tests (or, in writing, a set of analytic traits) is typically used, each usually covering a different aspect of the state's content standards. Because pass or fail decisions affect students' futures, the validity of standard-setting procedures and strategies is a major concern. Policymakers and legislators must decide which of two standard-setting strategies to use when making pass or fail decisions for students seeking certification or trying to meet a high school graduation requirement. The compensatory strategy focuses on total performance, summing scores across all tests in the battery. The conjunctive strategy requires passing performance on each test in the battery. This article reviews and evaluates compensatory and conjunctive standard-setting strategies. The rationales for each type are presented and discussed. Results from a study comparing the compensatory and conjunctive strategies for a state high school certification writing test provide insight into the problem of choosing between them. This article concludes with a set of recommendations for those who must decide which type of standard-setting strategy to use.
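The two decision rules can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the three-test battery, and the cut scores are hypothetical and are not taken from the study. It assumes the compensatory rule compares the summed score with a single total cut score, and the conjunctive rule applies a separate cut score to each test.

```python
from typing import Sequence


def passes_compensatory(scores: Sequence[float], total_cut: float) -> bool:
    """Compensatory rule (hypothetical helper): strength on one test can offset
    weakness on another; only the summed score is compared with one cut."""
    return sum(scores) >= total_cut


def passes_conjunctive(scores: Sequence[float], cuts: Sequence[float]) -> bool:
    """Conjunctive rule (hypothetical helper): every test in the battery must
    meet its own cut score."""
    if len(scores) != len(cuts):
        raise ValueError("need one cut score per test")
    return all(s >= c for s, c in zip(scores, cuts))


# Hypothetical student: strong on the first test, weak on the second.
scores = [5, 2, 4]
print(passes_compensatory(scores, total_cut=9))    # True: 11 >= 9
print(passes_conjunctive(scores, cuts=[3, 3, 3]))  # False: 2 < 3
```

The hypothetical student passes under the compensatory rule but fails under the conjunctive rule because one score falls below its cut; this kind of divergence is what makes the choice between strategies consequential.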

11.
Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the resulting judgments. This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores. The results showed substantially different cut scores for the two conditions; this calls into question whether content experts can produce the type of internally consistent judgments that are required using the bookmark procedure.
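One way to see why the response probability (RP) matters is to compute the ability location that a bookmark implies. The sketch below assumes a Rasch (one-parameter logistic) model and uses hypothetical item difficulties; the abstract does not say which IRT model the study used, so this is an illustration rather than a reconstruction. Under the Rasch model, the ability at which an item of difficulty b is answered correctly with probability RP is b + ln(RP / (1 - RP)).

```python
import math
from typing import Sequence


def rp_location(b: float, rp: float) -> float:
    """Ability (theta, in logits) at which a Rasch item with difficulty b is
    answered correctly with probability rp.
    Solves rp = 1 / (1 + exp(-(theta - b))) for theta."""
    if not 0.0 < rp < 1.0:
        raise ValueError("rp must lie strictly between 0 and 1")
    return b + math.log(rp / (1.0 - rp))


def bookmark_cut(difficulties: Sequence[float], bookmark_index: int, rp: float) -> float:
    """Cut score implied by placing the bookmark on the item at bookmark_index
    (0-based) of an ordered item booklet built with the given rp.
    Hypothetical helper: items are ordered by their rp locations."""
    locations = sorted(rp_location(b, rp) for b in difficulties)
    return locations[bookmark_index]


# Hypothetical Rasch difficulties for a 6-item ordered booklet.
difficulties = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.6]
for rp in (0.67, 0.80):
    print(rp, round(bookmark_cut(difficulties, 3, rp), 3))
```

Because the ln(RP / (1 - RP)) term is the same for every item, the Rasch item order does not change between conditions, but the same bookmark implies a cut about 0.68 logits higher under RP = .80 than under RP = .67. Internally consistent panelists would therefore need to place their bookmark on an earlier, easier item in the .80 condition to arrive at a comparable cut score.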

12.
Setting motor performance standards has long been a process of interest to physical educators. Theoretical advances in the measurement technology appropriate for standard-setting, however, have occurred only in the last decade. The first portion of this paper is devoted to a discussion of issues in setting standards and a brief review of procedures for standard-setting. In the latter section, gender differences in motor performance are examined and the impact of these differences on standard-setting is considered.

13.
During the development of large-scale curricular achievement tests, recruited panels of independent subject-matter experts use systematic judgmental methods, often collectively labeled "alignment" methods, to rate the correspondence between a given test's items and the objective statements in a particular curricular standards document. High disagreement among the expert panelists may indicate problems with training, feedback, or other steps of the alignment procedure. Existing procedural recommendations for alignment reviews have been derived largely from single-panel research studies; support for their use during operational large-scale test development may be limited. Synthesizing data from more than 1,000 alignment reviews of state achievement tests, this study identifies features of test–standards alignment review procedures that affect agreement about test item content. The researchers then use their meta-regression results to propose some practical suggestions for alignment review implementation.

14.
This article reports on the differential effectiveness of a teacher professional development programme for teachers in urban and rural schools in Indonesia. The study employed an embedded mixed methods design that involved the concurrent collection of both quantitative and qualitative data. The quantitative component involved a pre–post design in which two surveys were administered to a sample of 2417 students drawn from 66 classes in 32 lower secondary schools (960 from urban schools and 1457 from rural schools). The qualitative component involved six case study teachers and two students from each of their classes. Qualitative information was gathered using teacher and student interviews, classroom observations and teacher reflective journals. The quantitative results suggested that there were disparities between the usefulness of the knowledge and skills imparted during the programme for teachers in urban and rural schools. The themes that emerged from the data gathered using qualitative methods helped to make sense of the differences in student scores in urban and rural schools before and after the teacher professional development programme.

15.
The Angoff (1971) standard-setting method requires expert panelists to (a) conceptualize candidates who possess the qualifications of interest (e.g., the minimally qualified) and (b) estimate actual item performance for these candidates. Past and current research (Bejar, 1983; Shepard, 1994) suggests that estimating item performance is difficult for panelists. If panelists cannot perform this task, the validity of the standard based on these estimates is in question. This study tested the ability of 26 classroom teachers to estimate item performance for two groups of their students on a locally developed district-wide science test. Teachers were more accurate in estimating the performance of the total group than of the "borderline group," but in neither case was their accuracy high. Implications of this finding for the validity of item performance estimates by panelists using the Angoff standard-setting method are discussed.
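Once the estimates are collected, the Angoff computation is simple arithmetic. The sketch below assumes the common form of the method, in which each panelist's implied raw cut score is the sum of their item estimates and the panel's recommendation is the mean across panelists. It also shows one possible accuracy measure (mean absolute difference between estimated and observed proportions correct); that measure is an assumption, not necessarily the one the study used. All function names, ratings, and observed values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def angoff_cut_score(ratings: Sequence[Sequence[float]]) -> float:
    """Panel cut score from Angoff ratings (hypothetical helper).

    Each inner sequence holds one panelist's estimates of the probability that
    a borderline candidate answers each item correctly. A panelist's implied
    raw cut score is the sum of those estimates; the panel value is their mean.
    """
    return mean(sum(panelist) for panelist in ratings)


def mean_absolute_error(estimates: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute gap between estimated and observed item p-values
    (proportion of the target group answering each item correctly)."""
    if len(estimates) != len(observed):
        raise ValueError("need one observed value per estimate")
    return mean(abs(e - o) for e, o in zip(estimates, observed))


# Hypothetical ratings: 3 panelists x 4 items.
ratings = [
    [0.90, 0.60, 0.40, 0.70],
    [0.80, 0.50, 0.50, 0.60],
    [0.95, 0.70, 0.30, 0.65],
]
print(round(angoff_cut_score(ratings), 2))  # 2.53 raw-score points out of 4

# Hypothetical observed p-values for the borderline group on the same items.
observed = [0.75, 0.55, 0.20, 0.60]
print(round(mean_absolute_error(ratings[0], observed), 3))
```

Comparing a panelist's estimates with observed p-values separately for the total group and for the borderline group is one way to quantify how difficult the estimation task is.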

16.
In standards-based accountability programs, test scores are interpreted with reference to cut scores that establish categories like "proficient" or "below basic." The meaning of these cut scores is set forth in their associated "performance standards." Validity arguments for such interpretations require both a criterion-referenced score scale and a legitimate exercise of authority by those who set the standards. Stakeholder participation in a rational and coherent deliberative process is necessary to assure that these conditions are satisfied. This article sets forth a framework for the required validity argument and suggests possible ways to enable such participation. A new standard-setting method, the "briefing book" method, is suggested for possible study.

17.
This paper describes how a state education system in Australia introduced standards-referenced assessments into its large-scale, high-stakes, curriculum-based examinations in a way that enables comparison of performance across time even though the examinations are different each year. It describes the multi-stage modified Angoff standard-setting procedure used to establish cut-off scores on subject examinations, and how the results from this exercise were then used to develop standards packages. These packages illustrate the performances of students at the borders between the various bands.

The paper also shows how it was originally intended to use a Rasch measurement model to create the statistical feedback used in the standard-setting procedure. It also describes the modifications to the feedback that were necessary to meet the real-time constraints of this large-scale examination programme. It argues that consideration should now be given to using the Rasch model, instead of the current approach, to provide this feedback.


18.
Angoff-based standard setting is widely used, especially for high-stakes licensure assessments. Nonetheless, some critics have claimed that the judgment task is too cognitively complex for panelists, whereas others have explicitly challenged the consistency in (replicability of) standard-setting outcomes. Evidence of consistency in item judgments and passing scores is necessary to justify using the passing scores for consequential decisions. Few studies, however, have directly evaluated consistency across different standard-setting panels. The purpose of this study was to investigate consistency of Angoff-based standard-setting judgments and passing scores across 9 different educator licensure assessments. Two independent, multistate panels of educators were formed to recommend the passing score for each assessment, with each panel engaging in 2 rounds of judgments. Multiple measures of consistency were applied to each round of judgments. The results provide positive evidence of the consistency in judgments and passing scores.

19.
Some writers in the measurement literature have been skeptical of the meaningfulness of achievement standards and have described the standard-setting process as blatantly arbitrary. We argue that standard setting is more appropriately conceived of as a measurement process similar to student assessment. The construct being measured is the panelists' representation of student performance at the threshold of an achievement level. In the first section of this paper, we argue that standard setting is an example of stimulus-centered measurement. In the second section, we elaborate on this idea by comparing some popular standard-setting methods to the stimulus-centered scaling methods known as psychophysical scaling. In the third section, we use the lens of standard setting as a measurement process to take a fresh look at the two criticisms of standard setting: the role of judgment and the variability of results. In the fourth section, we offer a vision of standard-setting research and practice as grounded in the theory and practice of educational measurement.

20.
ABSTRACT

This article draws on three assessment paradigms (psychometrics, outcomes-based assessment and curriculum-based assessment) to discuss paradigmatic changes in senior school assessment and achievement standard-setting in Queensland, Australia, over the last 50 years. These include the radical 1970 reform, which replaced university-controlled examinations with school-based assessments using normative standard-setting, and the subsequent 1978 reform, which introduced competence (curriculum)-based assessment and standards. From 2019, a new reform introduces a combination of school-based and external assessment, with procedures for establishing standards still in progress.

Changes to Queensland assessment and standard-setting are discussed in terms of three preconditions for paradigm change: dissatisfaction, an acceptable alternative paradigm, and majority acceptance of change. The influence of the reformers' paradigmatic origins is also discussed. The amalgam of curriculum-based assessment and psychometric paradigms in the new Queensland system is considered in terms of theoretical compatibility and potential impact on the new standards.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417