20 similar articles found; search time: 15 ms
Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the resulting judgments. This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores. The results showed substantially different cut scores for the two conditions; this calls into question whether content experts can produce the type of internally consistent judgments that are required using the bookmark procedure.
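
As a rough illustration of why the two response probability (RP) conditions should agree, the sketch below assumes a Rasch model; the abstract does not say which IRT model the study used, and the symbols $\theta$ (examinee ability), $b$ (item difficulty), and $\theta_{RP}$ (the ability at which the item is answered correctly with probability RP) are introduced here for illustration only.

```latex
% Assumed model (not stated in the abstract): Rasch item response function
P(\text{correct} \mid \theta, b) = \frac{1}{1 + e^{-(\theta - b)}}

% Ability at which an item of difficulty b is answered correctly with probability RP
\theta_{RP} = b + \ln\frac{RP}{1 - RP}

% The two conditions in the experiment
\theta_{.67} = b + \ln\frac{.67}{.33} \approx b + 0.71
\qquad
\theta_{.80} = b + \ln\frac{.80}{.20} \approx b + 1.39
```

Under this assumption, every item sits about 0.68 logits higher on the scale in the .80 condition than in the .67 condition. Panelists whose judgments are internally consistent would therefore need to place their bookmarks on correspondingly easier items in the .80 condition for the two conditions to arrive at the same cut score.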

This article uses data from a large‐scale assessment program to illustrate the potential issue of range restriction with the Bookmark method in the context of trying to set cut scores to closely align with a set of college and career readiness benchmarks. Analyses indicated that range restriction issues existed across different response probability (RP) values and item response theory (IRT) models if one were to apply the Bookmark procedure using intact test forms. Results also suggested that range restriction may still be present if one had access to additional data from an item bank. This demonstration critically highlights challenges that may exist in some practical applications of the Bookmark method due to items not being designed to cover the full range of examinee abilities.

This article presents a comparison of simplified variations on two prevalent methods, Angoff and Bookmark, for setting cut scores on educational assessments. The comparison is presented through an application with a Grade 7 Mathematics Assessment in a midwestern school district. Training and operational methods and procedures for each method are described in detail along with comparative results for the application. An alternative item ordering strategy for the Bookmark method that may increase its usability is also introduced. Although the Angoff method is more widely used, the Bookmark method has some promising features, specifically in educational settings. Teachers are able to focus on the expected performance of the "barely proficient" student without the additional challenge of estimating absolute item difficulty.

Essential for the validity of the judgments in a standard-setting study is that they follow the implicit task assumptions. In the Angoff method, judgments are assumed to be inversely related to the difficulty of the items; contrasting-groups judgments are assumed to be positively related to the ability of the students. In the present study, judgments from both procedures were modeled with a random-effects probit regression model. The Angoff judgments showed a weaker link with the position of the items on the latent scale than the contrasting-groups judgments with the position of the students. Hence, in the specific context of the study, the contrasting-groups judgments were more aligned with the underlying assumptions of the method than the Angoff judgments.

In this article we address the issue of consistency in standard setting in the context of an augmented state testing program. Information gained from the external NRT scores is used to help make an informed decision on the determination of cut scores on the state test. The consistency of cut scores on the CRT across grades is maintained by forcing a consistency model based on the NRT scores and translating that information back to the CRT scores. The inconsistency of standards and the application of this model are illustrated using data from the Maryland MSA large state testing program involving cut points for basic, proficient and advanced in mathematics and reading across years and across grades. The model is discussed in some detail and shown to be a promising approach, although not without assumptions that must be made and issues that might be raised.

Evidence to support the credibility of standard setting procedures is a critical part of the validity argument for decisions made based on tests that are used for classification. One area in which there has been limited empirical study is the impact of standard setting judge selection on the resulting cut score. One important issue related to judge selection is whether the extent of judges’ content knowledge impacts their perceptions of the probability that a minimally proficient examinee will answer the item correctly. The present article reports on two studies conducted in the context of Angoff‐style standard setting for medical licensing examinations. In the first study, content experts answered and subsequently provided Angoff judgments for a set of test items. After accounting for perceived item difficulty and judge stringency, answering the item correctly accounted for a significant (and potentially important) impact on expert judgment. The second study examined whether providing the correct answer to the judges would result in a similar effect to that associated with knowing the correct answer. The results suggested that providing the correct answer did not impact judgments. These results have important implications for the validity of standard setting outcomes in general and on judge recruitment specifically.

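The classification standards used in China's standards-based educational examinations are rather inconsistent and remain the subject of considerable debate. The Bookmark method, first systematically described by Mitzel et al. in 2001, is an item response theory-based method for setting the cut scores that separate performance levels, and it has seen increasingly wide use internationally in recent years. This article first introduces the basic principles of Bookmark standard setting and the basic procedure for carrying it out. Then, taking the unified higher-education examination course Advanced Mathematics (《高等数学》) as an example, it applies the Bookmark method after the examination to set standards, determining the cut scores for four levels: excellent, good, pass, and fail.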

Since 1971 there have been a number of studies in which a cut score has been set using a method proposed by Angoff (1971). In this method, each member of a panel of judges estimates for each test question the proportion correct for a specific target group of examinees. Prior and contemporary research suggests that this is a difficult task for judges. Angoff also proposed that judges simply indicate whether or not an examinee from the target group will be able to answer each question correctly (the yes/no method). We report on the results of two studies that compare a yes/no estimation with a proportion correct estimation. The two studies demonstrate that both methods produce essentially equal cut scores and that judges find the yes/no method more comfortable to use than the estimated proportion correct method.
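
The two rating formats described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the studies: the abstract does not say how ratings are aggregated, so the sketch assumes one common convention (sum each judge's ratings over the items, then average across judges). The function name `angoff_cut_score` and all rating values are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return a panel cut score on the raw-score scale.

    ratings[j][i] is judge j's rating of item i: an estimated proportion
    correct (0.0 to 1.0) in the proportion-correct format, or 0/1 in the
    yes/no format. Aggregation rule (an assumption): sum each judge's
    ratings over items, then average those sums across judges.
    """
    if not ratings:
        raise ValueError("at least one judge is required")
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate the same number of items")
    per_judge_cuts = [sum(judge) for judge in ratings]
    return mean(per_judge_cuts)


# Hypothetical ratings from two judges on a four-item test.
proportion_ratings = [
    [0.50, 0.75, 0.25, 1.00],
    [0.75, 0.50, 0.50, 0.75],
]
yes_no_ratings = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]

print(angoff_cut_score(proportion_ratings))
print(angoff_cut_score(yes_no_ratings))
```

Because a yes/no rating is just a proportion restricted to 0 or 1, the same function serves both formats, which makes it straightforward to compare the resulting cut scores side by side.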

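Building on an analysis of the structure of translation competence and an exploration of strategies for developing it, this study uses two empirical methods, translation tests and a questionnaire survey, to examine the effect of a "learner-centered" "process approach" to teaching on students' translation competence in a multimedia network environment. After one semester of training for an experimental group and a control group, data analysis showed that this teaching approach was markedly effective in improving students' translation competence.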

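Through a case study, this article uses the methods of experimental phonetics to analyze errors in duration, pitch range, contour, and pitch value in the monosyllabic Chinese tones produced by American students. It summarizes the characteristics of these errors: the erroneous tones are relatively short in duration and narrow in pitch range, and the rising tone (yangping) and the falling-rising tone (shangsheng) are especially prone to error. It analyzes and explains the causes of the tone errors, including interference from native-language intonation habits, constraints of the articulatory mechanism, the influence of textbooks and teaching strategies, and the influence of learning strategies, and it proposes corresponding teaching principles and strategies.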

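Through a case study using the analytical tools of experimental phonetics, this article presents an acoustic analysis of tone duration, pitch range, contour, and pitch value in the single-character tones produced by Thai students at different levels of Chinese proficiency. It concludes that the tone errors Thai students make in acquiring Chinese are mainly errors of pitch range. The most serious errors occur in the first and fourth tones: specifically, the first tone is not high enough, and the fall of the fourth tone is too long.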

Use of the Rasch IRT Model in Standard Setting: An Item-Mapping Method (total citations: 1; self-citations: 0; citations by others: 1)
This article provides both logical and empirical evidence to justify the use of an item-mapping method for establishing passing scores for multiple-choice licensure and certification examinations. After describing the item-mapping standard-setting process, the rationale and theoretical basis for this method are discussed, and the similarities and differences between the item-mapping and the Bookmark methods are also provided. Empirical evidence supporting use of the item-mapping method is provided by comparing results from four standard-setting studies for diverse licensure and certification examinations. The four cut score studies were conducted using both the item-mapping and the Angoff methods. Rating data from the four standard-setting studies, using each of the two methods, were analyzed using item-by-rater random effects generalizability and dependability studies to examine which method yielded higher inter-judge consistency. Results indicated that the item-mapping method produced higher inter-judge consistency and achieved greater rater agreement than the Angoff method.

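This article reports an experimental analysis of how Uyghur students from southern Xinjiang perceive the vowels of Mandarin Chinese. Seven Chinese vowels, /i, y, u, ε, δ, o, α/, were selected as the experimental materials. The results show that the Uyghur students not only responded more slowly than the Han Chinese students in the control group but also frequently confused the three vowels /i, ε, δ/. The findings indicate that the vowel phonemes of Uyghur and Chinese are similar but differ in their phonetic features. The experiment provides a scientific basis for research on contrastive linguistic typology and second language acquisition.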

Judgmental standard-setting methods, such as the Angoff (1971) method, use item performance estimates as the basis for determining the minimum passing score (MPS). Therefore, the accuracy of these item performance estimates is crucial to the validity of the resulting MPS. Recent researchers (Shepard, 1995; Impara & Plake, 1998; National Research Council, 1999) have called into question the ability of judges to make accurate item performance estimates for target subgroups of candidates, such as minimally competent candidates. The purpose of this study was to examine the intra- and inter-rater consistency of item performance estimates from an Angoff standard setting. Results provide evidence that item performance estimates were consistent within and across panels within and across years. Factors that might have influenced this high degree of reliability in the item performance estimates in a standard setting study are discussed.

In this study, a variation of the bookmark standard setting procedure for passage-based tests is proposed in which separate ordered item booklets are created for the items associated with each passage. This variation is compared to the traditional bookmark procedure for a fifth-grade reading test. The results showed that the single-passage bookmark method produced greater consistency among the participants' cutscores, and most participants' bookmark placements did not change after the first round. In addition, participants reported greater understanding of the bookmarking task and greater confidence in their recommended cutscores. Both procedures required approximately the same amount of time to complete, but it is likely that the single-passage bookmark method could be carried out in two, or possibly even one, round of bookmarking rather than the three rounds used in traditional bookmarking. On the other hand, there are several concerns about the single-passage bookmark method that warrant further research. These include floor and ceiling effects, training issues, optimal booklet length, and multiple standards.

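High-performance liquid chromatography (HPLC) is one of the important methods for the qualitative and quantitative analysis of natural compounds, and the HPLC instrument is an expensive piece of equipment. This article introduces the separation principle and characteristics of HPLC, its methods of qualitative and quantitative analysis, and sample pretreatment methods; in addition, an HPLC internal standard method for determining the caffeine content of tea was established for use in laboratory teaching. Through this experiment, students not only master sample pretreatment methods but also gain a better understanding and command of the separation principle, operation, and qualitative and quantitative methods of HPLC, and of its importance in the analysis of natural compounds. Giving each student an individual sample in the laboratory course increased students' interest in learning and strengthened their hands-on skills. The experiment also specifically cultivated students' awareness of the maintenance and care of expensive instruments.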

Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard‐setting panels should have the proper qualifications to make the judgments asked of them; however, even qualified judges vary in expertise and in some cases, such as highly specialized areas or when members of the public are involved, it may be difficult to ensure that each member of a standard‐setting panel has the requisite expertise to make qualified judgments. Given the subjective nature of these types of judgments, and that a large part of the validity argument for an exam lies in the robustness of its passing standard, an examination of the influence of judge proficiency on the judgments is warranted. This study explores the use of the many‐facet Rasch model as a method for adjusting modified Angoff standard‐setting ratings based on judges’ proficiency levels. The results suggest differences in the severity and quality of standard‐setting judgments across levels of judge proficiency, such that judges who answered easy items incorrectly tended to perceive them as easier, but those who answered correctly tended to provide ratings within normal stochastic limits.

Standard setting is defined as the identification of certain points on a mark scale with particular performance standards, with the intention of enhancing the inferences that are warranted from the test scores. It is argued that the selection of both the points on the mark‐scales and the performance standards with which they are equated are arbitrary and are driven by a set of values (which are often implicit). In ‘high‐stakes’ settings, it is shown how the values implicit in the standard can come to dominate the values inherent in the domain they represent. The validation of standards must therefore include consideration of their consequences as well as their meanings. It is then argued that standards, where they exist, cannot be accounted for purely in terms of norm‐referenced or criterion‐referenced interpretations, but exist rather by virtue of a shared construct in a community of practice. These theoretical positions are then developed to classify standard‐setting methods along two dimensions, the first relating to the role of performance data in the setting of standards and the second relating to the extent to which the meanings or the consequences of the assessment are emphasised in the process.

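House Made of Dawn, the representative work of the Native American writer Momaday, builds on traditional Native American culture to develop a philosophical reflection on nature and on the relationship between humans and nature. By analyzing the novel's descriptions of the natural environment and the experiences of its protagonist, this article explores the ecological thought that Momaday expresses in the novel.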
