Similar literature
20 similar documents found (search time: 15 ms)
1.
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a posteriori (EAP), modal a posteriori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test characteristic curve (i.e., the IRT true‐score (TS) estimator). The five methods are compared using a simulation study and a real data example. Results indicated that the application of different methods can sometimes lead to different estimated cut scores, and that there can be some key differences in impact data when using the IRT TS estimator compared to other methods. It is suggested that one should carefully think about their choice of methods to estimate ability and cut scores because different methods have distinct features and properties. An important consideration in the application of Bayesian methods relates to the choice of the prior and the potential bias that priors may introduce into estimates.
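The IRT true-score (TS) estimator mentioned above can be sketched as follows. This is only an illustration under assumptions that are not in the abstract: a two-parameter logistic (2PL) model, made-up item parameters, and a simple bisection search. The names `p_correct`, `tcc`, and `true_score_cut` are hypothetical. The idea is to sum the Angoff ratings into an expected raw score and find the ability value at which the test characteristic curve equals that sum.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected raw score at ability theta."""
    return sum(p_correct(theta, a, b) for a, b in items)

def true_score_cut(ratings, items, lo=-6.0, hi=6.0, tol=1e-8):
    """Find theta where the TCC equals the sum of the Angoff ratings.

    ratings: one probability per item (e.g., panel-average Angoff ratings).
    items:   list of (a, b) pairs with a > 0, so the TCC is increasing.
    """
    target = sum(ratings)
    if not tcc(lo, items) <= target <= tcc(hi, items):
        raise ValueError("sum of ratings is outside the TCC range searched")
    # Bisection works because the TCC is monotone increasing in theta.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical three-item test with (discrimination, difficulty) pairs.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
theta_cut = true_score_cut([0.5, 0.5, 0.5], items)
```

The other estimators named in the abstract (ML, EAP, MAP, WML) would treat the ratings more like response data than as a single summed score; they are not shown here.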

2.
This article provides an overview of the Hofstee standard‐setting method and illustrates several situations where the Hofstee method will produce undefined cut scores. The situations where the cut scores will be undefined involve cases where the line segment derived from the Hofstee ratings does not intersect the score distribution curve based on actual exam performance data. Data from 15 standard settings performed by a credentialing organization are used to investigate how common undefined cut scores are with the Hofstee method and to compare cut scores derived from the Hofstee method with those from the Beuk method. Results suggest that when Hofstee cut scores exist, the Hofstee and Beuk methods often yield fairly similar results. However, it is shown that undefined Hofstee cut scores did occur in a few situations. When Hofstee cut scores are undefined, it is suggested that one extend the Hofstee line segment so that it intersects the score distribution curve to estimate cut scores. Analyses show that extending the line segment to estimate cut scores often yields similar results to the Beuk method. The article concludes with a discussion of what these results may imply for people who want to employ the Hofstee method.
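A minimal sketch of the intersection and the line-extension idea described above. It rests on assumptions that are not in the abstract: integer raw scores, fail rate defined as the proportion of examinees scoring below the cut, and the cut taken as the smallest score at which the empirical fail-rate curve reaches the Hofstee line. The function names and the example data are hypothetical.

```python
def fail_rate(scores, cut):
    """Proportion of examinees scoring below the candidate cut score."""
    return sum(s < cut for s in scores) / len(scores)

def hofstee_cut(scores, k_min, k_max, f_min, f_max, max_score, extend=False):
    """Intersect the Hofstee line with the empirical fail-rate curve.

    The line runs from (k_min, f_max) to (k_max, f_min), where k_min/k_max
    are the lowest/highest acceptable cut scores and f_min/f_max the
    lowest/highest acceptable fail rates (panel averages, assumed here).
    Returns None when the segment and the curve do not intersect,
    unless extend=True, in which case the line is extended over
    the whole score range 0..max_score.
    """
    slope = (f_min - f_max) / (k_max - k_min)

    def gap(cut):
        # Positive when the observed fail rate lies above the line.
        return fail_rate(scores, cut) - (f_max + slope * (cut - k_min))

    lo, hi = (0, max_score) if extend else (k_min, k_max)
    if gap(lo) > 0 or gap(hi) < 0:
        return None  # undefined: no intersection in the range searched
    # gap() is nondecreasing in cut, so the first nonnegative value
    # marks the crossing point.
    for cut in range(lo, hi + 1):
        if gap(cut) >= 0:
            return cut
    return None

# Hypothetical data: 100 examinees with raw scores 1..100.
scores = list(range(1, 101))
inside = hofstee_cut(scores, 50, 70, 0.1, 0.5, 100)
undefined = hofstee_cut(scores, 60, 70, 0.1, 0.3, 100)
extended = hofstee_cut(scores, 60, 70, 0.1, 0.3, 100, extend=True)
```

In the second call the fail rate at the lowest acceptable cut already exceeds the highest acceptable fail rate, so the segment never meets the curve. Note that an extended line can imply fail rates outside the 0 to 1 range at the extremes; only the crossing point is used.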

3.
This module describes some common standard-setting procedures used to derive performance levels for achievement tests in education, licensure, and certification. Upon completing the module, readers will be able to: describe what standard setting is; understand why standard setting is necessary; recognize some of the purposes of standard setting; calculate cut scores using various methods; and identify elements to be considered when evaluating standard-setting procedures. A self-test and annotated bibliography are provided at the end of the module. Teaching aids to accompany the module are available through NCME.

4.
Many educational testing programs report examinee performance at more than two levels of proficiency. Whether these assessments have the capacity to support these multiple inferences, though, is a topic that has not been widely discussed. This study proposes a method for evaluating the minimum number of measurement opportunities for reporting students' performance at multiple achievement levels and describes an application of the method for reading and mathematics assessments that are used by some school districts in Nebraska. Analyses were based on judgments collected from 110 teachers about characteristics of items and tasks from multiple assessments in reading and mathematics at grades 4 and 8, and in high school. Results suggested that there were generally enough items on the mathematics assessments to classify students into two or three performance levels, but rarely enough to make the four classifications that the state reported. Items on the reading assessments were generally distributed across the proficiency levels and tended to allow reporting for all four classification levels. These findings have implications for both practitioners and policymakers in how scores are interpreted.

5.
The Bookmark Standard-Setting Method: A Literature Review (total citations: 1; self-citations: 0; citations by others: 1)
The Bookmark method for setting standards on educational tests is currently one of the most popular standard-setting methods. However, research to support the method is scarce. In this report, we review the published and unpublished literature on this method as well as some seminal work in the area of evaluating standard-setting studies. Our review highlights both strengths and limitations of the method. Strengths include its wide acceptance and panelist confidence in the method. Limitations include a potential bias to produce lower-than-intended standards and problems in selecting the most appropriate response probability value for ordering the items presented to panelists. It is clear that more research on this method is needed to support its wide use. Several areas for future research to better understand the validity of the Bookmark method for setting standards on educational tests are presented.

6.
States participating in the Growth Model Pilot Program reference individual student growth against “proficiency” cut scores that conform with the original No Child Left Behind Act (NCLB). Although achievement results from conventional NCLB models are also cut‐score dependent, the functional relationships between cut‐score location and growth results are more complex and are not currently well described. We apply cut‐score scenarios to longitudinal data to demonstrate the dependence of state‐ and school‐level growth results on cut‐score choice. This dependence is examined along three dimensions: 1) rigor, as states set cut scores largely at their discretion, 2) across‐grade articulation, as the rigor of proficiency standards may vary across grades, and 3) the time horizon chosen for growth to proficiency. Results show that the selection of plausible alternative cut scores within a growth model can change the percentage of students “on track to proficiency” by more than 20 percentage points and reverse accountability decisions for more than 40% of schools. We contribute a framework for predicting these dependencies, and we argue that the cut‐score dependence of large‐scale growth statistics must be made transparent, particularly for comparisons of growth results across states.

7.
This research evaluated the impact of a common modification to Angoff standard‐setting exercises: the provision of examinee performance data. Data from 18 independent standard‐setting panels across three different medical licensing examinations were examined to investigate whether and how the provision of performance information impacted judgments and the resulting cut scores. Results varied by panel but in general indicated that both the variability among the panelists and the resulting cut scores were affected by the data. After the review of performance data, panelist variability generally decreased. In addition, for all panels and examinations pre‐ and post‐data cut scores were significantly different. Investigation of the practical significance of the findings indicated that nontrivial fail rate changes were associated with the cut score changes for a majority of standard‐setting exercises. This study is the first to provide a large‐scale, systematic evaluation of the impact of a common standard setting practice, and the results can provide practitioners with insight into how the practice influences panelist variability and resulting cut scores.

8.
Historically, Angoff‐based methods were used to establish cut scores on the National Assessment of Educational Progress (NAEP). In 2005, the National Assessment Governing Board oversaw multiple studies aimed at evaluating the reliability and validity of Bookmark‐based methods via a comparison to Angoff‐based methods. As the Board considered adoption of Bookmark‐based methods, it considered several criteria, including reliability of the cut scores, validity of the cut scores as evidenced by comparability of results to those from Angoff, and procedural validity as evidenced by panelist understanding of the method tasks and instructions and confidence in the results. As a result of their review, a Bookmark‐based method was adopted for NAEP, and has been used since that time. This article goes beyond the Governing Board's initial evaluations to conduct a systematic review of 27 studies in NAEP research conducted over 15 years. This research is used to evaluate Bookmark‐based methods on key criteria originally considered by the Governing Board. Findings suggest that Bookmark‐based methods have comparable reliability, resulting cut scores, and panelist evaluations to Angoff. Given that Bookmark‐based methods are shorter in duration and less costly, Bookmark‐based methods may be preferable to Angoff for NAEP standard setting.

9.
Substantial growth in the numbers of English language learners (ELLs) in the United States and Canada in recent years has significantly affected the educational systems of both countries. This article focuses on critical issues and concerns related to the assessment of ELLs in U.S. and Canadian schools and emphasizes assessment approaches for test developers and decision makers that will facilitate increased equity, meaningfulness, and accuracy in assessment and accountability efforts. It begins by examining the crucial issue of defining ELLs as a group. Next, it examines the impact of testing originating from the No Child Left Behind Act of 2001 (NCLB) in the U.S. and government‐mandated standards‐driven testing in Canada by briefly describing each country's respective legislated testing requirements and outlining their consequences at several levels. Finally, the authors identify key points that test developers and decision makers in both contexts should consider in testing this ever‐increasing group of students.

10.
One commonly used compromise standard-setting method is the Beuk (1984) method. A key assumption of the Beuk method is that the emphasis given to the pass rate and the percent correct ratings should be proportional to the extent that the panelists agree on their ratings. However, whether the slope of the Beuk line reflects the emphasis that panelists believe should be assigned to the pass rate and the percentage correct ratings has not been fully tested. In this article, I evaluate this critical assumption of the Beuk method by asking panelists to assign importance weights to their percentage correct and pass rate judgments. I show that in several cases the emphasis suggested by the Beuk slope is noticeably different from what one would expect and is inconsistent with importance weight ratings. I also suggest two ways that the importance weights can be used to calculate alternate cut scores, and I show that one of the ways of calculating cut scores using the importance weights leads to larger potential differences in cut score estimates. I suggest that practitioners should consider collecting importance weights when the Beuk method is used for determining cut scores.

11.
Different standard-setting procedures usually produce different cut points even if each has a rational basis. In 2000, three standard-setting procedures were implemented to set cut scores in each of the 18 grade/content areas comprising Kentucky's state assessment system: the Contrasting Groups, Bookmark, and Jaeger-Mills procedures. Subsequently, participants from each of the three procedures worked together in each grade/content area to synthesize the results. These synthesis participants considered the results of, and examined the materials and information provided by, each of the three separate procedures. In this article the synthesis processes are described and discussed.

12.
Cut‐scores were set by expert judges on assessments of reading and listening comprehension of English as a foreign language (EFL), using the bookmark standard‐setting method to differentiate proficiency levels defined by the Common European Framework of Reference (CEFR). Assessments contained stratified item samples drawn from extensive item pools, calibrated using Rasch models on the basis of examinee responses of a German nationwide assessment of secondary school language performance. The results suggest significant effects of item sampling strategies for the bookmark method on cut‐score recommendations, as well as significant cut‐score judgment revision over cut‐score placement rounds. Results are discussed within a framework of establishing validity evidence supporting cut‐score recommendations using the widely employed bookmark method.

13.
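A Validation Study of Selected HSK Levels (total citations: 1; self-citations: 0; citations by others: 1)
One purpose of the Chinese Proficiency Test (HSK) is to define the Chinese language ability that international students must have to be admitted to degree programs at Chinese universities. Under the relevant regulations, HSK Level 3 is the minimum standard for admission to programs in science, engineering, and Western medicine, and HSK Level 6 is the minimum standard for admission to programs in the liberal arts, history, and traditional Chinese medicine. This study uses three methods (the Angoff, borderline-group, and contrasting-groups methods) to validate these standards.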

14.
The purpose of this study was to illustrate the use of propensity scores for creating comparison groups, partially controlling for pretreatment course selection bias, and estimating the treatment effects of selected courses on the development of moral reasoning in undergraduate students. Specifically, we used a sample of convenience for comparing differences in moral reasoning development scores among students enrolled in intergroup dialogue, service learning, psychology and philosophy courses with those of an introductory sociology course. Adopting a propensity score approach included reviewing the empirical literature for its guidance in substantiating the reasons for including pretreatment variables (i.e., pretreatment course-taking behaviors, race, sex, political identification, need for cognition, major, age, pretreatment moral reasoning scores) in our analysis, measuring these variables, and reducing them into a single composite propensity score for each student in our analytic sample. This score then served as the basis for creating a new comparison group and for allowing us to estimate unbiased (or less biased) course-related treatment effects on moral reasoning development. Implications for higher education researchers are discussed.
Matthew J. Mayhew (corresponding author)

15.
This article presents research about school counselors' attitudes toward breaching confidentiality that the authors conducted immediately before and after the tragic shootings at Columbine High School in April 1999. Two groups of school counselors were demographically similar but differed significantly in their predictions as to whether they would breach confidentiality and in their attitudes toward certain aspects of school counselor practice such as informed consent. School counselors at all levels of employment reported that they were less likely to breach confidentiality after the highly publicized high school shootings and that they were more responsible to their minor clients than to the parents of those clients. Implications for policy makers are discussed.

16.
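Expenditure on Supplementary Tutoring for High School Students: Determinants and Policy Implications (total citations: 13; self-citations: 0; citations by others: 13)
Using survey data on high school students, this study examines Chinese high school students' expenditure on supplementary tutoring and the factors that influence it. The results show that, in the Chinese context, supplementary tutoring is mainly remedial: it mainly helps lower-achieving students improve their grades. Regional background, urban or rural background, and family socioeconomic status have significant effects on tutoring expenditure. Policies on supplementary tutoring should consider the legitimacy of tutoring and seek to reduce the adverse effects of tutoring activities on socially disadvantaged groups.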

17.
A new probability-based standard setting technique, the Objective Borderline Method (OBM), was introduced recently. This was based on a mathematical model of how test scores relate to student ability. The present study refined the model and tested it using 2500 simulated data-sets. The OBM was feasible to use. On average, the OBM performed well with specificity .88, sensitivity .51, false positive rate 3.4% and false negative rate 26%. These indices were insensitive to the borderline score range. This probability-based standard setting may be a useful addition to the range of standard setting methods available.

18.
Motivation is a key factor in promoting academic success, and intrinsic motivation is especially important for developing autonomous learners. Reluctant learners, in particular, benefit from intrinsic motivation that makes learning relevant to their lives. In this article, the author describes commonalities of reluctant learners and presents definitions and frameworks for understanding motivation. The author also suggests a variety of strategies and activities for turning reluctant learners into inspired learners.

19.
A look at real data shows that Reckase's psychometric theory for standard setting is not applicable to bookmark and that his simulations cannot explain actual differences between methods. It is suggested that exclusively test-centered, criterion-referenced approaches are too idealized and that a psychophysics paradigm and a theory of group behavior could be more useful in thinking about the standard setting process. In this view, item mapping methods such as bookmark are reasonable adaptations to fundamental limitations in human judgments of item difficulty. They make item ratings unnecessary and have unique potential for integrating external validity data and student performance data more fully into the standard setting process.

20.
In this digital ITEMS module, Dr. Michael Bunch provides an in-depth, step-by-step look at how standard setting is done. It does not focus on any specific procedure or methodology (e.g., modified Angoff, bookmark, and body of work) but on the practical tasks that must be completed for any standard setting activity. Dr. Bunch carries the participant through every stage of the standard setting process, from developing a plan, through preparations for standard setting, conducting standard setting, and all the follow-up activities that must occur after standard setting in order to obtain the approval of cut scores and translate those cut scores into score reports. The digital module includes a 120-page manual, various ancillary files (e.g., PowerPoint slides, Excel workbooks, sample documents, and forms), links to datasets from the book Standard Setting (Cizek & Bunch, 2007), links to final reports from four recent large-scale standard setting events, quiz questions with formative feedback, and a glossary.
