Similar literature
 20 similar articles found (search took 15 ms)
1.
What factors influence judges when they set standards? How do judges, test questions, and the standard-setting process interact? How can we improve intrajudge consistency?

2.
Who should make judgments about test standards? Who is an expert? How many judges should be used in a standard-setting study? What is the relationship between the number of judges and the standard error of the test?

3.
Is training judges beyond initial orientation required? How can we help judges apply their conceptualization of minimal competence to individual items?

4.
Standard-Setting Guidelines   Total citations: 1, self-citations: 0, citations by others: 1
Do we need guidelines for standard-setting studies? What should the guidelines address? What additional research on standard-setting is needed?

5.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(1): 67-78
Sixty-one judges provided recommendations on minimal standards for the Essay portion of the National Teacher Examinations Communication Skills Test, which is used to screen applicants to North Carolina teacher education programs. The standard-setting procedures were described; provision of student performance information to judges and group discussion significantly increased the average recommended standards. Initial differences between the average recommended standards when two sets of essays were used by judges were diminished by those treatments. The recommendations of public school judges were significantly more variable than were those of college and university judges following discussion.

6.
What is judgmental policy capturing? How can it be applied to standard setting with performance assessments? Do judges value some exercises more in setting standards?

7.
Cut scores, estimated using the Angoff procedure, are routinely used to make high-stakes classification decisions based on examinee scores. Precision is necessary in estimation of cut scores because of the importance of these decisions. Although much has been written about how these procedures should be implemented, there is relatively little literature providing empirical support for specific approaches to providing training and feedback to standard-setting judges. This article presents a multivariate generalizability analysis designed to examine the impact of training and feedback on various sources of error in estimation of cut scores for a standard-setting procedure in which multiple independent groups completed the judgments. The results indicate that after training, there was little improvement in the ability of judges to rank order items by difficulty but there was a substantial improvement in inter-judge consistency in centering ratings. The results also show a substantial group effect. Consistent with this result, the direction of change for the estimated cut score was shown to be group dependent.

8.
Is it enough to run the standard-setting panel and to stop? Once a panel has met or a contrasting group experiment has been conducted, do we have a cutoff score? On what basis might a proposed cutoff score be adjusted? Who should make the adjustment decision?

9.
There are few empirical investigations of the consequences of using widely recommended data collection procedures in conjunction with a specific standard-setting method such as the Angoff (1971) procedure. Such recommendations include the use of several types of judges, the provision of normative information on examinees' test performance, and the opportunity to discuss and reconsider initial recommendations in an iterative standard-setting procedure. This study of 236 expert judges investigated the effects of using these recommended procedures on (a) average recommended test standards, (b) the variability of recommended test standards, and (c) the reliability of recommended standards for seven subtests of the National Teacher Examinations Communication Skills and General Knowledge Tests. Small, but sometimes statistically significant, changes in mean recommended test standards were observed when judges were allowed to reconsider their initial recommendations following review of normative information and discussion. Means for public school judges changed more than did those for college or university judges. In addition, there was a significant reduction in the within-group variability of standards recommended for several subtests. Methods for estimating the reliability of recommended test standards proposed by Kane and Wilson (1984) were applied, and their hypothesis of positive covariation between empirical item difficulties and mean recommended standards was confirmed. The data collection procedures examined in this study resulted in substantial increases in the reliability of recommended test standards.

10.
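The quality of judges in China has drawn wide attention from all sectors of society. How to make judges the elite of the legal profession is a very important question. A necessary, regular system for evaluating judges should be gradually established and improved; such a system encourages judges to make full use of their abilities in trial work and gives individual judges, or judges as a group, important information about what needs to be improved or changed. A sound judicial evaluation mechanism will help advance the professionalization of judges.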

11.
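How should the act of organizing or procuring same-sex prostitution be regarded? The author holds that the application of law in individual criminal cases embodies rich ideas of criminal jurisprudence. Characterizing the act of organizing or procuring same-sex prostitution involves applying theory on two questions: the goal of criminal law interpretation and the interpretive mechanism for applying criminal law. After setting out these two theoretical questions, the article analyzes the characterization of the act in light of them and concludes that prostitution should include acts between persons of the same sex, and that judges should enrich the content of the law and develop it incrementally.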

12.
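Brain Science Research and the Development of Students' Qualities   Total citations: 1, self-citations: 0, citations by others: 1
How can the findings of brain science research be applied to educational reform to improve students' overall quality? Examples include: how findings on the critical period and plasticity of the brain's language centers can inform early language education for children; how brain science findings can be used in education to improve students' memory; how findings on "sensitization" can be used to cultivate students' creative "intuitive" thinking; and how the brain's most important neural function, the integration of multiple memories, can be used to foster students' all-round moral, intellectual, physical, and aesthetic development.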

13.
Judgmental standard-setting methods, such as the Angoff (1971) method, use item performance estimates as the basis for determining the minimum passing score (MPS). Therefore, the accuracy of these item performance estimates is crucial to the validity of the resulting MPS. Recent researchers (Shepard, 1995; Impara & Plake, 1998; National Research Council, 1999) have called into question the ability of judges to make accurate item performance estimates for target subgroups of candidates, such as minimally competent candidates. The purpose of this study was to examine the intra- and inter-rater consistency of item performance estimates from an Angoff standard setting. Results provide evidence that item performance estimates were consistent within and across panels within and across years. Factors that might have influenced this high degree of reliability in the item performance estimates in a standard-setting study are discussed.

14.
The purpose of this study was to determine if a linear procedure, typically applied to an entire examination when equating scores and rescaling judges' standards, could be used with individual item data gathered through Angoff's standard-setting method (1971). Specifically, experts' estimates of borderline group performance on one form of a test were transformed to be on the same scale as experts' estimates of borderline group performance on another form of the test. The transformations were based on examinees' responses to the items and on judges' estimates of borderline group performance. The transformed values were compared to the actual estimates provided by a group of judges. The equated and rescaled values were reasonably close to those actually assigned by the experts. Bias in the estimates was also relatively small. In general, the rescaling procedure was more accurate than the equating procedure, especially when the examinee sample size for equating was small.

15.
Some writers in the measurement literature have been skeptical of the meaningfulness of achievement standards and described the standard-setting process as blatantly arbitrary. We argue that standard setting is more appropriately conceived of as a measurement process similar to student assessment. The construct being measured is the panelists' representation of student performance at the threshold of an achievement level. In the first section of this paper, we argue that standard setting is an example of stimulus-centered measurement. In the second section, we elaborate on this idea by comparing some popular standard-setting methods to the stimulus-centered scaling methods known as psychophysical scaling. In the third section, we use the lens of standard setting as a measurement process to take a fresh look at the two criticisms of standard setting: the role of judgment and the variability of results. In the fourth section, we offer a vision of standard-setting research and practice as grounded in the theory and practice of educational measurement.

16.
Portfolios, Accountability, and an Interpretive Approach to Validity   Total citations: 1, self-citations: 0, citations by others: 1
How can the results of classroom-based portfolio assessment be communicated outside the classroom? How might a portfolio-based assessment system be designed and implemented? How can we evaluate the merits of portfolio-based assessments?

17.
How can the enterprise of looking at the consequences of testing in America be moved forward? What are the responsibilities of the key actors? How can we accumulate the evidence that we need?

18.
How can the enterprise of looking at the consequences of testing in America be moved forward? What are the responsibilities of the key actors? How can we accumulate the evidence that we need?

19.
This paper addresses four questions: What are the effects of reducing class size? How important are these effects? How can we explain these effects? and How can we improve the outcomes when class sizes are reduced? A major aim is to provide directions for resolving the paradox as to “Why reducing class size has not led to major improvements in student learning?” and the conclusion is that class size reductions can lead to worthwhile increases provided certain conditions are met.

20.
How should we think about the concept of the testlet? How can testlets be better incorporated into test score analysis? Can there be a one‐item testlet?
