Similar literature
Found 20 similar articles; search took 15 ms
1.
Decisions about progress through an academic programme are made by Boards of Examiners, on the basis of students’ course assessments. For most students such pass/fail grading decisions are straightforward. However, for those students whose results are borderline (either at a pass/fail boundary or boundaries between grades) the exercise of some discretion by university staff is required. In the interests of the transparency of the exercise of this discretion and to increase the chances that the ‘right’ decision is made, we tested the validity of the second version of the Objective Borderline Method (OBM2) decision-making tool in a medical programme. Our results suggest that application of OBM2 provides valid data to help university staff make robust decisions about a student’s progression through a programme, and with which to defend these decisions if that should be required.

2.
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group study of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can lead to questionable results in setting cut scores. In this study, data collected from a previous standard-setting study are used to deduce panelists’ conceptions of profiles of borderline performance. These profiles are then used to predict cut scores on a test of algebra readiness. The results indicate that these profiles can predict a very wide range of cut scores both within and between panelists. Modifications are proposed to existing training procedures for test-centered methods that can account for the variation in borderline profiles.

3.
Institutions of higher education commonly employ a conjunctive standard setting strategy, which requires students to resit failed examinations until they pass all tests. An alternative strategy allows students to compensate for a failing grade with other test results. This paper uses regression discontinuity design to compare the effect of first-year resits and compensations on second-year study results. We select students with a similar level of knowledge in a first-year introductory course and estimate the treatment effect of a resit on the result for a second-year intermediate course in the same subject. We find that the treatment effect is positive, but not significantly different from zero. Additional results show that students’ overall second-year performance is not significantly related to the number of compensated failing grades in their first year. The number of attempts that students need to complete their first year does not have a significant positive effect on second-year performance. We conclude that the evidence for a positive effect of resits on learning is weak at best.

4.
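Research on evaluation standards for air negative ions in forest tourism areas   Total citations: 1, self-citations: 0, citations by others: 1
With the rise of forest ecotourism and growing public health awareness, air negative ions, as an important forest tourism resource, are receiving increasing attention. This paper analyzes a large body of data measured in forest environments and, using a standard log-normal transformation, establishes a graded evaluation standard for air negative ions in forest tourism areas. On that basis, the air negative ion conditions in the main scenic areas of Beijing Xiaolongmen Forest Park and Guangzhou Liuxihe National Forest Park are evaluated.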

5.
Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the resulting judgments. This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores. The results showed substantially different cut scores for the two conditions; this calls into question whether content experts can produce the type of internally consistent judgments that are required using the bookmark procedure.
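
To make the role of the response probability (RP) concrete, here is a minimal Python sketch. It is not taken from the article: it assumes a Rasch model, hypothetical item difficulties, and the simplifying convention that the cut score is the RP location of the bookmarked item. The function names `rp_location` and `bookmark_cut` are illustrative only.

```python
import math

def rp_location(b: float, rp: float) -> float:
    """Ability (theta) at which a Rasch item of difficulty b is answered
    correctly with probability rp: solves rp = 1 / (1 + exp(-(theta - b)))."""
    if not 0.0 < rp < 1.0:
        raise ValueError("rp must be strictly between 0 and 1")
    return b + math.log(rp / (1.0 - rp))

def bookmark_cut(difficulties, bookmark_index, rp):
    """Cut score on the theta scale: RP location of the bookmarked item in the
    ordered item booklet (an assumed, simplified convention)."""
    ordered = sorted(rp_location(b, rp) for b in difficulties)
    return ordered[bookmark_index]

# Hypothetical Rasch difficulties for a five-item booklet.
difficulties = [-1.0, -0.3, 0.2, 0.9, 1.5]

# The same bookmark placement under the two RP conditions.
cut_67 = bookmark_cut(difficulties, 2, 0.67)
cut_80 = bookmark_cut(difficulties, 2, 0.80)
print(round(cut_67, 3), round(cut_80, 3))
```

Under these assumptions, an unchanged bookmark placement shifts the cut score by a constant ln(0.80/0.20) - ln(0.67/0.33), roughly 0.68 logits. Panelists in the .80 condition would therefore have to place their bookmarks earlier in the booklet for the two conditions to yield similar cut scores.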

6.
Some writers in the measurement literature have been skeptical of the meaningfulness of achievement standards and described the standard-setting process as blatantly arbitrary. We argue that standard setting is more appropriately conceived of as a measurement process similar to student assessment. The construct being measured is the panelists' representation of student performance at the threshold of an achievement level. In the first section of this paper, we argue that standard setting is an example of stimulus-centered measurement. In the second section, we elaborate on this idea by comparing some popular standard-setting methods to the stimulus-centered scaling methods known as psychophysical scaling. In the third section, we use the lens of standard setting as a measurement process to take a fresh look at the two criticisms of standard setting: the role of judgment and the variability of results. In the fourth section, we offer a vision of standard-setting research and practice as grounded in the theory and practice of educational measurement.

7.
8.
    
This research evaluated the impact of a common modification to Angoff standard‐setting exercises: the provision of examinee performance data. Data from 18 independent standard‐setting panels across three different medical licensing examinations were examined to investigate whether and how the provision of performance information impacted judgments and the resulting cut scores. Results varied by panel but in general indicated that both the variability among the panelists and the resulting cut scores were affected by the data. After the review of performance data, panelist variability generally decreased. In addition, for all panels and examinations pre‐ and post‐data cut scores were significantly different. Investigation of the practical significance of the findings indicated that nontrivial fail rate changes were associated with the cut score changes for a majority of standard‐setting exercises. This study is the first to provide a large‐scale, systematic evaluation of the impact of a common standard setting practice, and the results can provide practitioners with insight into how the practice influences panelist variability and resulting cut scores.

9.
10.
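向冠春 (Xiang Guanchun), 《成人教育》 (Adult Education), 2013, 33(1): 14-20
Standard setting is a very important topic in educational measurement: it is broad in scope, highly contested, and difficult to resolve. To address this problem, a large body of theoretical research and practical applications on standard setting has emerged abroad. Our own research on standard setting remains relatively limited. This article summarizes the steps by which various methods set standards, introduces some classic standard-setting methods, and describes how Cambridge Assessment applies them in grading, in the hope of informing our standard-setting practice and improving the reliability of examinations.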

11.
Many educational testing programs report examinee performance at more than two levels of proficiency. Whether these assessments have the capacity to support these multiple inferences, though, is a topic that has not been widely discussed. This study proposes a method for evaluating the minimum number of measurement opportunities for reporting students' performance at multiple achievement levels and describes an application of the method for reading and mathematics assessments that are used by some school districts in Nebraska. Analyses were based on judgments collected from 110 teachers about characteristics of items and tasks from multiple assessments in reading and mathematics at grades 4 and 8, and in high school. Results suggested that there were generally enough items on the mathematics assessments to classify students into two or three performance levels, but rarely enough to make the four classifications that the state reported. Items on the reading assessments were generally distributed across the proficiency levels and tended to allow reporting for all four classification levels. These findings have implications for both practitioners and policymakers in how scores are interpreted.

12.
    
Standard setting is arguably one of the most subjective techniques in test development and psychometrics. The decisions when scores are compared to standards, however, are arguably the most consequential outcomes of testing. Providing licensure to practice in a profession has high-stakes consequences for the public. Denying graduation or forcing remediation has high-impact consequences for students. Unfortunately, tests that classify individuals are subject to false positive and false negative misclassifications. When determining a standard, standard setting panelists implicitly consider the negative consequences of the decisions made from test use. We propose the conscious weight method and subconscious weight method to bring more objectivity to the standard setting process. To do this, these methods quantify the relative harm of the negative consequences of false positive and false negative misclassification.
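
The abstract does not describe how the conscious and subconscious weight methods work, so the following Python sketch does not implement them. It is only a generic illustration, under assumed data, of how relative harm weights for false positives and false negatives could influence the choice of a cut score. The data, the weights, and the names `weighted_loss` and `best_cut` are hypothetical.

```python
def weighted_loss(scores, competent, cut, w_fp, w_fn):
    """Weighted count of misclassifications when examinees scoring >= cut pass."""
    fp = sum(1 for s, c in zip(scores, competent) if s >= cut and not c)
    fn = sum(1 for s, c in zip(scores, competent) if s < cut and c)
    return w_fp * fp + w_fn * fn

def best_cut(scores, competent, w_fp=1.0, w_fn=1.0):
    """Candidate cut with the smallest weighted loss; ties go to the lowest cut."""
    # Candidates: every observed score, plus one value that fails everyone.
    candidates = sorted(set(scores)) + [max(scores) + 1]
    return min(
        candidates,
        key=lambda cut: (weighted_loss(scores, competent, cut, w_fp, w_fn), cut),
    )

# Hypothetical test scores and (assumed known) true competence status.
scores = [40, 45, 50, 55, 60, 65, 70]
competent = [False, False, True, False, True, True, True]

# Passing an incompetent examinee is judged three times as harmful as
# failing a competent one, and then the reverse.
print(best_cut(scores, competent, w_fp=3.0, w_fn=1.0))
print(best_cut(scores, competent, w_fp=1.0, w_fn=3.0))
```

With these made-up data, weighting false positives more heavily moves the preferred cut upward, and weighting false negatives more heavily moves it downward.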

13.
In this article we address the issue of consistency in standard setting in the context of an augmented state testing program. Information gained from the external NRT scores is used to help make an informed decision on the determination of cut scores on the state test. The consistency of cut scores on the CRT across grades is maintained by forcing a consistency model based on the NRT scores and translating that information back to the CRT scores. The inconsistency of standards and the application of this model are illustrated using data from the Maryland MSA large state testing program involving cut points for basic, proficient and advanced in mathematics and reading across years and across grades. The model is discussed in some detail and shown to be a promising approach, although not without assumptions that must be made and issues that might be raised.

14.
15.
States participating in the Growth Model Pilot Program reference individual student growth against “proficiency” cut scores that conform with the original No Child Left Behind Act (NCLB). Although achievement results from conventional NCLB models are also cut‐score dependent, the functional relationships between cut‐score location and growth results are more complex and are not currently well described. We apply cut‐score scenarios to longitudinal data to demonstrate the dependence of state‐ and school‐level growth results on cut‐score choice. This dependence is examined along three dimensions: 1) rigor, as states set cut scores largely at their discretion, 2) across‐grade articulation, as the rigor of proficiency standards may vary across grades, and 3) the time horizon chosen for growth to proficiency. Results show that the selection of plausible alternative cut scores within a growth model can change the percentage of students “on track to proficiency” by more than 20 percentage points and reverse accountability decisions for more than 40% of schools. We contribute a framework for predicting these dependencies, and we argue that the cut‐score dependence of large‐scale growth statistics must be made transparent, particularly for comparisons of growth results across states.

16.
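This study used a questionnaire to conduct a sample survey of learning strategies among second-year senior high school students at one middle school and, drawing on the students' midterm examination scores, applied quantitative methods to carry out a correlation analysis of the survey results. On this basis, the mean levels of students' awareness and use of each learning strategy were calculated. The results show that correct use of learning strategies is strongly positively correlated with achievement, but that students still have many shortcomings in their awareness and use of learning strategies. This deserves attention in English teaching, and appropriate measures should be taken to help students develop sound learning strategies.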

17.
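Combining in-class and extracurricular activities to guide lower-year students in self-directed research-based learning   Total citations: 1, self-citations: 0, citations by others: 1
Guiding students in research-based learning plays an important role in cultivating their problem awareness, research ability, and innovative spirit. The Physics Experiment Center of Southeast University uses a variety of approaches to motivate and guide lower-year students, at different levels, to extend research-based learning from the classroom to extracurricular settings. Practice has shown that, with existing teaching resources, guiding self-directed, innovative research-based learning on the basis of lower-year university foundation courses is both feasible and effective.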

18.
    
This module describes some common standard-setting procedures used to derive performance levels for achievement tests in education, licensure, and certification. Upon completing the module, readers will be able to: describe what standard setting is; understand why standard setting is necessary; recognize some of the purposes of standard setting; calculate cut scores using various methods; and identify elements to be considered when evaluating standard-setting procedures. A self-test and annotated bibliography are provided at the end of the module. Teaching aids to accompany the module are available through NCME.
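
The abstract does not list the methods the module covers. As one illustration of calculating a cut score, the Python sketch below uses the widely used Angoff procedure, which entries 8 and 20 also refer to. Each panelist estimates, for every item, the probability that a minimally competent examinee answers correctly. A panelist's cut score is the sum of those estimates, and the panel's cut score is the mean across panelists. The ratings and the function name `angoff_cut` are hypothetical and are not taken from the module.

```python
from statistics import mean, stdev

def angoff_cut(ratings):
    """ratings[p][i] is panelist p's estimated probability that a minimally
    competent examinee answers item i correctly.

    Returns (panel cut score, standard deviation of panelist cut scores),
    both on the raw-score scale.
    """
    panelist_cuts = [sum(item_probs) for item_probs in ratings]
    return mean(panelist_cuts), stdev(panelist_cuts)

# Hypothetical ratings: three panelists, four items.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.6, 0.6, 0.7],
    [0.7, 0.8, 0.6, 0.9],
]

cut, spread = angoff_cut(ratings)
print(round(cut, 2), round(spread, 2))
```

The standard deviation across panelists gives a simple index of panelist variability, the quantity that entry 8 reports as generally decreasing after panelists review performance data.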

19.
Standard Setters: Stand Up and Take a Stand!   Total citations: 1, self-citations: 0, citations by others: 1
This 2006 NCME Career Award Address presents observations made over a career in the field of standard setting, summarizes the status of current research, and makes recommendations for future research.

20.
    
Evidence to support the credibility of standard setting procedures is a critical part of the validity argument for decisions made based on tests that are used for classification. One area in which there has been limited empirical study is the impact of standard setting judge selection on the resulting cut score. One important issue related to judge selection is whether the extent of judges’ content knowledge impacts their perceptions of the probability that a minimally proficient examinee will answer the item correctly. The present article reports on two studies conducted in the context of Angoff‐style standard setting for medical licensing examinations. In the first study, content experts answered and subsequently provided Angoff judgments for a set of test items. After accounting for perceived item difficulty and judge stringency, answering the item correctly accounted for a significant (and potentially important) impact on expert judgment. The second study examined whether providing the correct answer to the judges would result in a similar effect to that associated with knowing the correct answer. The results suggested that providing the correct answer did not impact judgments. These results have important implications for the validity of standard setting outcomes in general and on judge recruitment specifically.
