Similar literature
20 similar documents found; search time: 31 ms
1.
One commonly used compromise standard-setting method is the Beuk (1984) method. A key assumption of the Beuk method is that the emphasis given to the pass rate and the percent correct ratings should be proportional to the extent that the panelists agree on their ratings. However, whether the slope of the Beuk line reflects the emphasis that panelists believe should be assigned to the pass rate and the percentage correct ratings has not been fully tested. In this article, I evaluate this critical assumption of the Beuk method by asking panelists to assign importance weights to their percentage correct and pass rate judgments. I show that in several cases the emphasis suggested by the Beuk slope is noticeably different from what one would expect and is inconsistent with importance weight ratings. I also suggest two ways that the importance weights can be used to calculate alternate cut scores, and I show that one of these ways leads to larger potential differences in cut score estimates. I suggest that practitioners should consider collecting importance weights when the Beuk method is used for determining cut scores.

2.
Applied Measurement in Education, 2013, 26(2): 121-141
The borderline-group method and the contrasting-groups method were compared with Nedelsky's method at four schools, with Angoff's method at another four schools, and with each other at all eight schools, using tests of basic skills in reading and mathematics. The borderline-group and contrasting-groups methods produced similar results when approximately equal numbers of students were classified as masters and nonmasters. The contrasting-groups passing score was lower than the borderline-group passing score when masters greatly outnumbered nonmasters and higher when nonmasters outnumbered masters. Results involving the Nedelsky and Angoff methods were not consistent across schools. Passing scores tended to be higher at schools where students were more able.

3.
Cutoff scores based on absolute standards can be unacceptable in terms of the number of failures they produce. Cutoff scores based on relative standards, that is, cutoff scores set to achieve a fixed percentage of failures, can be unacceptable because an acceptable performance level for passed examinees cannot be guaranteed. In some situations one can improve upon an absolute standard using a compromise model, which draws on the information in the observed score distribution for a test to adjust the standard. Three compromise models are described and compared in this article.

4.
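Cheating on end-of-course examinations is currently widespread in colleges and universities, and this article analyzes the problem from a new perspective. It first divides examinations into selection examinations and pass/fail examinations, and divides students into high-achieving and low-achieving students. Based on an analysis of the cheating psychology of the two types of students under the different examination contexts, it proposes four countermeasures against cheating: focusing invigilation on low-achieving students, adopting open-book examinations and improving the quality of examination papers, having invigilators proctor conscientiously, and actually enforcing penalties for cheating.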

5.
This study investigated the effectiveness of equating with very small samples using the random groups design. Of particular interest was equating accuracy at specific scores where performance standards might be set. Two sets of simulations were carried out, one in which the two forms were identical and one in which they differed by a tenth of a standard deviation in overall difficulty. These forms were equated using mean equating, linear equating, unsmoothed equipercentile equating, and equipercentile equating using two through six moments of log-linear presmoothing with samples of 25, 50, 75, 100, 150, and 200. The results indicated that identity equating was preferable to any equating method when samples were as small as 25. For samples of 50 and above, the choice of an equating method over identity equating depended on the location of the passing score relative to examinee performance. If passing scores were located below the mean, where data were sparser, mean equating produced the smallest percentage of misclassified examinees. For passing scores near the mean, all methods produced similar results with linear equating being the most accurate. For passing scores above the mean, equipercentile equating with 2- and 3-moment presmoothing were the best equating methods. Higher levels of presmoothing did not improve the results.
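
As a rough illustration of the three simplest methods named in this abstract, the sketch below implements identity, mean, and linear equating using their standard textbook definitions. The function names and sample data are hypothetical, and population standard deviations are assumed; none of this is taken from the study itself.

```python
from statistics import mean, pstdev


def identity_equate(x):
    """Identity equating: a Form X score is used unchanged on Form Y."""
    return x


def mean_equate(x, form_x_scores, form_y_scores):
    """Mean equating: shift the score by the difference in form means."""
    return x - mean(form_x_scores) + mean(form_y_scores)


def linear_equate(x, form_x_scores, form_y_scores):
    """Linear equating: match both the mean and the standard deviation.

    Assumes Form X scores are not all identical (nonzero spread).
    """
    mx, my = mean(form_x_scores), mean(form_y_scores)
    sx, sy = pstdev(form_x_scores), pstdev(form_y_scores)
    return my + (sy / sx) * (x - mx)


# Hypothetical score samples from two randomly equivalent groups.
form_x = [10, 12, 14, 16, 18]
form_y = [12, 15, 18, 21, 24]
cut_on_x = 15
print(identity_equate(cut_on_x))
print(mean_equate(cut_on_x, form_x, form_y))
print(linear_equate(cut_on_x, form_x, form_y))
```

With very small samples, the means and standard deviations above are themselves noisy estimates, which is consistent with the abstract's finding that identity equating can be preferable when samples are as small as 25.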

6.
In judgmental standard setting procedures (e.g., the Angoff procedure), expert raters establish minimum pass levels (MPLs) for test items, and these MPLs are then combined to generate a passing score for the test. As suggested by Van der Linden (1982), item response theory (IRT) models may be useful in analyzing the results of judgmental standard setting studies. This paper examines three issues relevant to the use of IRT models in analyzing the results of such studies. First, a statistic for examining the fit of MPLs, based on judges' ratings, to an IRT model is suggested. Second, three methods for setting the passing score on a test based on item MPLs are analyzed; these analyses, based on theoretical models rather than empirical comparisons among the three methods, suggest that the traditional approach (i.e., setting the passing score on the test equal to the sum of the item MPLs) does not provide the best results. Third, a simple procedure, based on generalizability theory, for examining the sources of error in estimates of the passing score is discussed.

7.
In this note, we demonstrate an interesting use of the posterior distributions (and corresponding posterior samples of proficiency) that are yielded by fitting a fully Bayesian test scoring model to a complex assessment. Specifically, we examine the efficacy of the test in combination with the specific passing score that was chosen through expert judgment, or, in general, any external a priori criterion. In addition, we study the robustness of the test's efficacy with respect to choice of the passing score.

8.
The authors investigated the extent to which taking specific types of Advanced Placement (AP) courses and the number of courses taken predict the likelihood of passing subject benchmarks and earning a composite score of 19 on the ACT test, and examined the role gender plays in the prediction. They found evidence that taking an AP mathematics course and taking more AP courses confer a benefit. Results suggest young men are more likely than young women to pass the ACT mathematics and ACT science tests, but no gender difference was found on ACT Reading and ACT social studies.

9.
The continuous testing framework, where both successful and unsuccessful examinees have to demonstrate continued proficiency at frequent prespecified intervals, is a framework that is used in noncognitive assessment and is gaining in popularity in cognitive assessment. Despite the rigorous advantages of this framework, this paper demonstrates that there is significant inflation in false negatives as both passers and failers continually take a test, especially for examinees closer to the passing score. Several passing policies are investigated to control the inflation of false negatives while maintaining low false‐positive rates for fixed‐length tests. Lastly, recommendations are made for testing professionals who wish to utilize the rigorous nature of the continuous testing framework while also avoiding the inflation of qualified examinees failing.
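
The inflation of false negatives described here can be illustrated with a deliberately simple calculation. The sketch assumes that administrations are independent and that a truly qualified examinee has the same pass probability each time; both are simplifying assumptions made for illustration, not the model used in the paper, and the function name is hypothetical.

```python
def prob_any_failure(pass_prob, n_administrations):
    """Chance that a qualified examinee fails at least once.

    Assumes independent administrations with a constant pass
    probability (an illustrative simplification).
    """
    return 1.0 - pass_prob ** n_administrations


# A qualified examinee near the passing score (pass probability 0.90)
# versus one well above it (pass probability 0.99).
for pass_prob in (0.90, 0.99):
    for n in (1, 5, 10):
        risk = prob_any_failure(pass_prob, n)
        print(f"p={pass_prob:.2f}, n={n:2d}: P(any failure)={risk:.3f}")
```

Under these assumptions the risk grows with every additional administration and grows fastest for examinees whose pass probability is lowest, that is, those closest to the passing score.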

10.
The purpose of this study was to compare several methods for determining a passing score on an examination from the individual raters' estimates of minimal pass levels for the items. The methods investigated differ in the weighting that the estimates for each item receive in the aggregation process. An IRT-based simulation method was used to model a variety of error components of minimum pass levels. The results indicate little difference in estimated passing scores across the three methods. Less error was present when the ability level of the minimally competent candidates matched the expected difficulty level of the test. No meaningful improvement in passing score estimation was achieved for a 50-item test as opposed to a 25-item test; however, the RMSE values for estimates with 10 raters were smaller than those for 5 raters. The results suggest that the simplest method for aggregating minimum pass levels across the items in a test (adding them up) is the preferred method.
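
The "adding them up" aggregation can be sketched as follows. The sketch assumes each rating is a rater's estimate, between 0 and 1, of the probability that a minimally competent candidate answers the item correctly, and that item ratings are averaged across raters before summing. The function name and data are hypothetical, and the weighted alternatives compared in the study are not shown.

```python
def passing_score(ratings):
    """Sum of item minimum pass levels (MPLs).

    `ratings` is a list with one row per rater; each row holds that
    rater's MPL estimate for every item, in the same item order.
    """
    n_raters = len(ratings)
    n_items = len(ratings[0])
    # Average each item's MPL across raters, then add the item MPLs.
    item_mpls = [
        sum(rater[i] for rater in ratings) / n_raters
        for i in range(n_items)
    ]
    return sum(item_mpls)


# Two hypothetical raters judging a three-item test.
ratings = [
    [0.50, 0.75, 1.00],
    [0.50, 0.25, 0.50],
]
print(passing_score(ratings))  # expected number-correct score at the cut
```

The result is on the number-correct scale, so dividing by the number of items gives the passing score as a proportion correct.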

11.
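This study randomly sampled 21,387 examinees who took the clinical practitioner category of the comprehensive medical written examination of the physician qualification examination in a given year, performed a cluster analysis of their surgery scores, and compared the cut scores calculated with the borderline-group method and the contrasting-groups method against the passing score set by Angoff expert judgment. The results show that the Kappa coefficient for the consistency of the two in classifying examinees was as high as 0.934, which fully demonstrates the validity of the Angoff method for setting passing scores.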
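
For readers unfamiliar with the Kappa coefficient reported in this abstract, the sketch below computes Cohen's kappa for two pass/fail classifications of the same examinees, using the standard definition (observed agreement corrected for chance agreement). The function name and the toy data are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two classifications of the same examinees.

    Undefined (division by zero) when chance agreement equals 1,
    for example when both classifications give everyone one label.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement from the marginal proportions of each category.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy example: decisions from two different cut scores.
by_angoff = ["pass", "pass", "fail", "fail"]
by_cluster = ["pass", "pass", "fail", "pass"]
print(cohens_kappa(by_angoff, by_cluster))
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so a value such as 0.934 reflects very close agreement between the two classifications.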

12.
This study utilized multiple regression analysis to develop equations to predict a committee's decision on whether to readmit flunked-out college students and to develop a second equation to predict grade point average the first quarter after readmitting such students. Data for similar students for a second academic year provided a hold-out group to cross-validate the regression equations. The committee's decision to readmit students could be predicted fairly well (cross-validity = .61) from the variables of setting realistic goals, math test score, number of quality points short of a passing average, and a self-analysis of failure. The attempt to predict the grade point average the first quarter after readmission was much less successful (cross-validity = .32). A different set of student factors seemed to be influential in accounting for a committee's decision to readmit a student and for the student's subsequent grade performance if admitted.

13.
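Based on an analysis of the teaching-method models in Chinese physical education teaching theory, and taking into account the regularities and characteristics of basketball, a case-based teaching method was designed and tested in a teaching experiment covering basketball passing, dribbling, shooting, and ball-handling breakthrough techniques. The results show that the experimental group outperformed the control group, with a significant difference. The method helps students understand and consolidate sound, standard movements more quickly and more deeply; stimulates students' initiative and enthusiasm for learning; effectively cultivates a striving team spirit and self-confidence; and is conducive to forming a new view of teaching.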

14.
We developed a criterion-referenced student rating of instruction (SRI) to facilitate formative assessment of teaching. It involves four dimensions of teaching quality that are grounded in current instructional design principles: Organization and structure, Assessment and feedback, Personal interactions, and Academic rigor. Using item response theory and Wright mapping methods, we describe teaching characteristics at various points along the latent continuum for each scale. These maps enable criterion-referenced score interpretation by making an explicit connection between test performance and the theoretical framework. We explain the way our Wright maps can be used to enhance an instructor’s ability to interpret scores and identify ways to refine teaching. Although our work is aimed at improving score interpretation, a criterion-referenced test is not immune to factors that may bias test scores. The literature on SRIs is filled with research on factors unrelated to teaching that may bias scores. Therefore, we also used multilevel models to evaluate the extent to which student and course characteristics may affect scores and compromise score interpretation. Results indicated that student anger and the interaction between student gender and instructor gender are significant effects that account for a small amount of variance in SRI scores. All things considered, our criterion-referenced approach to SRIs is a viable way to describe teaching quality and help instructors refine pedagogy and facilitate course development.

15.
Angoff-based standard setting is widely used, especially for high-stakes licensure assessments. Nonetheless, some critics have claimed that the judgment task is too cognitively complex for panelists, whereas others have explicitly challenged the consistency in (replicability of) standard-setting outcomes. Evidence of consistency in item judgments and passing scores is necessary to justify using the passing scores for consequential decisions. Few studies, however, have directly evaluated consistency across different standard-setting panels. The purpose of this study was to investigate consistency of Angoff-based standard-setting judgments and passing scores across 9 different educator licensure assessments. Two independent, multistate panels of educators were formed to recommend the passing score for each assessment, with each panel engaging in 2 rounds of judgments. Multiple measures of consistency were applied to each round of judgments. The results provide positive evidence of the consistency in judgments and passing scores.

16.
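The Tianjin Elementary Information Technology Examination is an assessment system, open to the public, that tests examinees' computer application skills. As a criterion-referenced examination, it has used 60 points as the passing standard since it was introduced in 2004, but practice has shown that 60 points cannot serve as a permanent standard for judging whether an examinee passes. The examination is computer-based, and members of the public register voluntarily. Examinees vary widely in age, covering primary school grades 2-6, and students of different ages take each level. The 60-point cut score ignores the fact that the average ability of examinees differs from one administration to the next, as well as the fact that different examinees in the same administration do not draw exactly the same items. This may create a problem: we can learn only the relative ability and relative standing of examinees. If examinees cannot be correctly assigned to the appropriate level categories, the value of such a graded examination is greatly reduced. This article therefore studies how to set the "passing" standard score for this examination system, using the Angoff method to set a cut score and applying it objectively to the examinee population, and makes a useful exploration in research on and application of improving test reliability and validity.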

17.
Applied Measurement in Education, 2013, 26(3): 231-244
For any testing program intended for licensure, certification, competency, or proficiency, the estimation of content relevant test scores for pass/fail decision making is necessary. This study compares number-correct scoring to empirical option weighting in the context of such tests. The study was conducted under two test design conditions, three test length conditions, and four passing score levels. Two criteria were used to evaluate the effectiveness of empirical option weighting versus number-correct scoring. Empirical option weighting typically produced slightly more reliable domain score estimates and more consistent pass/fail decisions than number-correct scoring, particularly in the lower half of the test score distribution. For many types of testing programs where the passing scores are established in the lower half of the test score distribution, the empirical option weighting method used in this study seems both appropriate and effective in improving the dependability of test scores and the consistency of pass/fail decisions. Test users, however, must weigh the effort required to use option weighting against the small gains obtained with this method. Other problems are discussed that may limit the usefulness of option weighting.

18.
Applied Measurement in Education, 2013, 26(3): 203-205
Many credentialing agencies today are either administering their examinations by computer or are likely to be doing so in the coming years. Unfortunately, although several promising computer-based test designs are available, little is known about how well they function in examination settings. The goal of this study was to compare fixed-length examinations (both operational forms and newly constructed forms) with several variations of multistage test designs for making pass-fail decisions. Results were produced for 3 passing scores. Four operational 60-item examinations were compared to (a) 3 new 60-item forms, (b) 60-item 3-stage tests, and (c) 40-item 2-stage tests; all were constructed using automated test assembly software. The study was carried out using computer simulation techniques that were set to mimic common examination practices. All 60-item tests, regardless of design or passing score, produced accurate ability estimates and acceptable and similar levels of decision consistency and decision accuracy. One interesting finding was that the 40-item test results were poorer than the 60-item test results, as expected, but were in the range of acceptability. This raises the practical policy question of whether content-valid 40-item tests with lower item exposure levels and/or savings in item development costs are an acceptable trade-off for a small loss in decision accuracy and consistency.

19.
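Using the learning motivation scale developed by Wen Qiufang and scores from two administrations of the College English Test Band 4 (CET-4) as measurement instruments, this study used questionnaires and interviews to investigate the effect of learning motivation on the attrition of English proficiency among non-English-major students. Paired-samples t-tests showed that English proficiency improved significantly for all participants and for the high-scoring group, whereas the low-scoring group showed significant attrition. The high- and low-scoring groups were similar in the strength of most surface motivations, but effort and metacognitive strategies affected whether their English proficiency underwent attrition. Because students in the low-scoring group held a pessimistic attitude toward learning English and saw no value in it, they did not continue studying after passing CET-4; in addition, they did not use English as a communication tool or for academic research, which resulted in weak deep motivation and thus attrition of their English proficiency.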

20.
Brennan (2012) noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman (2008) suggested a method based on classical test theory (CTT) to determine if subscores have added value over the total score. According to this method, a subscore has added value if the corresponding true subscore is predicted better by the subscore than by the total score. In this note, parallel‐forms scores are considered. It is proved that another way to interpret the method of Haberman is that a subscore has added value if it is in better agreement than the total score with the corresponding subscore on a parallel form. The suggested interpretation promises to make the method of Haberman more accessible because several practitioners find the concept of parallel forms more acceptable or easier to understand than that of a true score. Results are shown for data from two operational tests.
