This research evaluated the impact of a common modification to Angoff standard‐setting exercises: the provision of examinee performance data. Data from 18 independent standard‐setting panels across three different medical licensing examinations were examined to investigate whether and how the provision of performance information affected judgments and the resulting cut scores. Results varied by panel but in general indicated that both the variability among the panelists and the resulting cut scores were affected by the data. After the review of performance data, panelist variability generally decreased. In addition, for all panels and examinations, pre‐ and post‐data cut scores were significantly different. Investigation of the practical significance of the findings indicated that nontrivial changes in fail rates were associated with the cut score changes for a majority of the standard‐setting exercises. This study is the first to provide a large‐scale, systematic evaluation of the impact of a common standard‐setting practice, and the results can give practitioners insight into how the practice influences panelist variability and resulting cut scores.
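
The quantities this study compares can be sketched under the common Angoff setup. In that setup each panelist estimates, for every item, the probability that a minimally competent examinee answers it correctly. A panelist's cut score is the sum of those estimates, and the panel cut score is the mean across panelists. Panelist variability is summarized here as the standard deviation of panelist-level cut scores. The function names, the pre/post data layout, and the rule that an examinee fails when scoring below the unrounded cut are assumptions for illustration, not the study's procedure.

```python
from statistics import mean, stdev


def angoff_cut(ratings):
    """ratings[j][i]: panelist j's probability estimate for item i (hypothetical layout).

    Returns the panel cut score on the raw-score scale and the panelist-level cuts.
    """
    panelist_cuts = [sum(row) for row in ratings]
    return mean(panelist_cuts), panelist_cuts


def fail_rate(scores, cut):
    """Share of examinees whose raw score falls below the (unrounded) cut."""
    return sum(s < cut for s in scores) / len(scores)


def compare_pre_post(pre, post, scores):
    """Summarize how reviewing performance data shifted a panel's judgments."""
    pre_cut, pre_j = angoff_cut(pre)
    post_cut, post_j = angoff_cut(post)
    return {
        "pre_cut": pre_cut,
        "post_cut": post_cut,
        "pre_sd": stdev(pre_j),    # panelist variability before data
        "post_sd": stdev(post_j),  # panelist variability after data
        "fail_rate_change": fail_rate(scores, post_cut) - fail_rate(scores, pre_cut),
    }


# Hypothetical ratings: 3 panelists x 3 items, before and after seeing performance data.
pre = [[0.6, 0.7, 0.5], [0.8, 0.9, 0.7], [0.4, 0.5, 0.3]]
post = [[0.7, 0.8, 0.6], [0.8, 0.8, 0.7], [0.7, 0.7, 0.6]]
scores = [0, 1, 1, 2, 2, 2, 3, 3]
summary = compare_pre_post(pre, post, scores)
```

In this toy example the panelists converge after seeing the data, so the post-data standard deviation is smaller. The cut also moves past a raw-score point, which changes the fail rate even though the cut itself moves only slightly.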

Despite being widely used and frequently studied, the Angoff standard setting procedure has received little attention with respect to an integral part of the process: how judges incorporate examinee performance data into their decisions. Without performance data, subject matter experts have considerable difficulty making the required judgments accurately. Providing data, however, introduces the very real possibility that judges will turn their content‐based judgments into norm‐referenced judgments. This article reports on three Angoff standard setting panels for which some items were randomly assigned to have incorrect performance data. Judges were informed that some of the items were accompanied by inaccurate data but were not told which ones. The purpose of the manipulation was to assess how changing the instructions given to the judges would affect the extent to which they relied on the performance data. The modified instructions resulted in the judges making less use of the performance data than judges in recent parallel studies. The relative size of the changes judges made did not appear to be substantially influenced by the accuracy of the data.

The purpose of this article was to model United States Medical Licensing Examination (USMLE) Step 2 passing rates using the Cox proportional hazards model, best known for its application in analyzing clinical trial data. The number of months it took to pass the computer-based Step 2 examination was treated as the dependent variable. Covariates in the model were (a) medical school location (U.S./Canadian or other), (b) primary language (English or other), and (c) gender. Preliminary findings indicate that examinees trained in the U.S. or Canada were nearly 2.7 times more likely to experience the event (passing Step 2). Examinees whose primary language was English were 2.1 times more likely to pass Step 2, but gender had little impact. These findings are discussed more fully in light of past research and of broader potential applications of survival analysis in educational measurement.
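
The "2.7 times" and "2.1 times" figures read most naturally as hazard ratios. Assuming that interpretation, and assuming binary covariates coded 1 for U.S./Canadian training and for English as the primary language, the Cox model and the coefficients those ratios would imply are shown below. The abstract does not report the coefficients themselves, so these are back-calculated values.

```latex
\begin{aligned}
h(t \mid \mathbf{x}) &= h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3),\\
\mathrm{HR}_k &= \frac{h(t \mid x_k = 1,\ \text{other covariates fixed})}{h(t \mid x_k = 0,\ \text{other covariates fixed})} = e^{\beta_k},\\
\beta_{\text{location}} &\approx \ln 2.7 \approx 0.99, \qquad \beta_{\text{language}} \approx \ln 2.1 \approx 0.74.
\end{aligned}
```

The baseline hazard h_0(t) cancels in the ratio. That is why, under the proportional hazards assumption, each covariate's effect is a single multiplicative factor that holds at every month of follow-up.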

Evidence of stable standard setting results across panels or occasions is an important part of the validity argument for an established cut score. Unfortunately, because convening multiple panels of content experts is costly, standards are often based on the recommendation of a single panel of judges. This approach implicitly assumes that the variability across panels will be modest, but little evidence is available to support this assumption. This article examines the stability of Angoff standard setting results across panels. Data were collected for six independent standard setting exercises, with three panels participating in each exercise. The results show that although the panel effect is negligible in some cases, for four of the six data sets the panel facet accounted for a large portion of the overall error variance. Ignoring the often hidden panel/occasion facet can produce artificially optimistic estimates of cut score stability. Results based on a single panel should not be viewed as a reasonable estimate of the results that would be found across multiple panels. Instead, the variability observed within a single panel is best viewed as a lower bound on the variability expected when the exercise is replicated.
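
This is a minimal sketch of why a single panel understates cut score variability. It assumes a balanced design in which judges are nested within panels and each judge contributes one cut score. A one-way random-effects ANOVA separates panel variance from judge-within-panel variance. The error variance of one panel's mean cut score is then computed with and without the panel component. The function names and the numbers are hypothetical, and the design is simpler than the generalizability analyses used in the article.

```python
from statistics import mean


def nested_variance_components(panels):
    """panels[p][j]: cut score recommended by judge j of panel p (equal panel sizes).

    Returns (panel variance, judge-within-panel variance) from a one-way
    random-effects ANOVA; a negative panel estimate is truncated to zero.
    """
    n_p, n_j = len(panels), len(panels[0])
    grand = mean(c for panel in panels for c in panel)
    panel_means = [mean(panel) for panel in panels]
    ms_between = n_j * sum((m - grand) ** 2 for m in panel_means) / (n_p - 1)
    ms_within = sum(
        (c - m) ** 2 for panel, m in zip(panels, panel_means) for c in panel
    ) / (n_p * (n_j - 1))
    var_panel = max(0.0, (ms_between - ms_within) / n_j)
    return var_panel, ms_within


def single_panel_error_variance(var_panel, var_judge, n_j, include_panel=True):
    """Error variance of one panel's mean cut score based on n_j judges."""
    within_only = var_judge / n_j  # all that a single panel can estimate on its own
    return within_only + var_panel if include_panel else within_only


# Hypothetical data: two panels of three judges each.
panels = [[70, 72, 74], [80, 82, 84]]
var_p, var_j = nested_variance_components(panels)
print(single_panel_error_variance(var_p, var_j, 3, include_panel=False))  # ~1.33
print(single_panel_error_variance(var_p, var_j, 3))                       # ~50.0
```

The within-panel figure is the lower bound described in the abstract. Adding the panel component shows how much larger the replication error can be when panels disagree.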

This article provides an overview of the Hofstee standard‐setting method and illustrates several situations in which the Hofstee method produces undefined cut scores. Cut scores are undefined when the line segment derived from the Hofstee ratings does not intersect the score distribution curve based on actual exam performance data. Data from 15 standard settings performed by a credentialing organization are used to investigate how common undefined cut scores are with the Hofstee method and to compare cut scores derived from the Hofstee method with those from the Beuk method. Results suggest that when Hofstee cut scores exist, the Hofstee and Beuk methods often yield fairly similar results. However, undefined Hofstee cut scores did occur in a few situations. When Hofstee cut scores are undefined, the authors suggest extending the Hofstee line segment until it intersects the score distribution curve to estimate a cut score. Analyses show that extending the line segment in this way often yields results similar to those of the Beuk method. The article concludes with a discussion of what these results may imply for practitioners who want to use the Hofstee method.
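
The Hofstee geometry can be sketched as follows. The sketch assumes the usual Hofstee inputs: panel-mean minimum and maximum acceptable cut scores (k_min, k_max) and minimum and maximum acceptable fail rates (f_min, f_max). It also assumes non-negative integer raw scores. The judged line runs from (k_min, f_max) to (k_max, f_min). The observed curve gives the share of examinees scoring below each candidate cut. The observed curve never decreases and the judged line always decreases, so their difference changes sign at most once. The first integer cut at which the observed rate reaches the line therefore approximates the intersection. The integer search and the function names are illustrative choices, not the article's procedure. `extend=True` is a sketch of the suggested fallback of extending the line across the whole score range.

```python
from bisect import bisect_left


def observed_fail_rate(sorted_scores, cut):
    """Share of examinees scoring strictly below the cut."""
    return bisect_left(sorted_scores, cut) / len(sorted_scores)


def hofstee_cut(scores, k_min, k_max, f_min, f_max, extend=False):
    """Smallest integer cut where the observed fail rate reaches the Hofstee line.

    Returns None when the line segment (or, with extend=True, the extended
    line) does not meet the observed fail-rate curve.
    """
    s = sorted(scores)
    slope = (f_min - f_max) / (k_max - k_min)  # negative: higher cut, lower acceptable fail rate

    def line(c):
        return f_max + slope * (c - k_min)

    # Search the judged segment only, or the full raw-score range when extending.
    lo, hi = (0, int(s[-1]) + 1) if extend else (k_min, k_max)
    if observed_fail_rate(s, lo) > line(lo):
        return None  # observed curve already lies above the line at the left end
    for c in range(lo, hi + 1):
        if observed_fail_rate(s, c) >= line(c):
            return c
    return None  # observed curve stays below the line over the whole range


# Hypothetical examples: one examinee per score from 0 to 99, then a much harder exam.
easy = list(range(100))
hard = list(range(30))
print(hofstee_cut(easy, 40, 60, 0.10, 0.50))               # 44
print(hofstee_cut(hard, 40, 60, 0.10, 0.50))               # None (undefined)
print(hofstee_cut(hard, 40, 60, 0.10, 0.50, extend=True))  # 25
```

In the "hard" case everyone scores below k_min. The observed fail rate at k_min (100 percent) therefore already exceeds f_max, and the segment never meets the curve. Extending the line recovers a cut further down the scale.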

Test administrators are appropriately concerned about the potential for time constraints to affect the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits affect performance. Much of the existing research relies on these observational metrics; relatively few studies use randomized experiments to evaluate the impact of time limits on scores. Of the studies that do report randomized experiments, none directly compare the experimental results with evidence from observational metrics to evaluate how sensitively these metrics identify conditions in which time constraints actually affect scores. The present study provides such evidence based on data from a medical licensing examination. The results indicate that these observational metrics are useful but provide only an imprecise evaluation of the impact of time constraints on test performance.

To explore gender differences on Advanced Placement examinations, random samples of free-response test booklets were drawn from the 1986 U.S. History examination. These examinations were chosen because they consistently show significant gender differences in objective scores but no gender differences in free-response scores. The free responses were rescored with a focus on their historical content. The rescoring was done by readers other than those who conducted the original scoring and involved tallies of specific historical points made, supporting evidence given, and factual errors. Ratings were also made of the handwriting quality, neatness, and English composition quality of the free responses. The analyses indicate that free-response tasks of the type examined may have inherent characteristics that reward English composition ability, and that some females may compensate for inferior historical knowledge with superior English composition ability.

Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but relatively little research examines the internal consistency of the resulting judgments. This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability (RP) conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores. The results showed substantially different cut scores for the two conditions, which calls into question whether content experts can produce the internally consistent judgments that the bookmark procedure requires.
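
The RP manipulation can be made concrete under a Rasch model. This is an assumption, since the abstract does not name the scaling model. Under Rasch, an item of difficulty b maps to the ability at which the probability of a correct response equals the chosen RP: θ = b + ln(RP / (1 − RP)). The same bookmark therefore implies a higher ability cut under RP .80 than under RP .67. Internally consistent panelists should thus place their bookmark on an easier (earlier) item under RP .80 to arrive at the same cut. The helper names and item difficulties are hypothetical. Placing the cut at the bookmarked item's RP location is one of several conventions in use.

```python
import math


def rp_location(b, rp):
    """Ability at which a Rasch item of difficulty b is answered correctly with probability rp."""
    return b + math.log(rp / (1 - rp))


def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1 / (1 + math.exp(-(theta - b)))


def bookmark_cut(difficulties, bookmark, rp):
    """Ability cut implied by a bookmark on item index `bookmark` (0-based) of the
    ordered item booklet, using the bookmarked item's RP location."""
    ordered = sorted(difficulties)  # under Rasch, ordering by b equals ordering by RP location
    return rp_location(ordered[bookmark], rp)


items = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]  # hypothetical difficulties in logits
for rp in (0.67, 0.80):
    print(rp, round(bookmark_cut(items, 3, rp), 3))
```

The gap between the two RP conditions for a fixed bookmark is ln 4 − ln(.67/.33), roughly 0.68 logits. That is the shift consistent panelists would have to offset by moving their bookmarks.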

In architectural design education, the main objective is to help students, especially first‐year students, improve their design ideas, creativity, three-dimensional perception, and ways of expressing them. Art, as a concept embedded in architecture, is therefore emphasized here as a design method; specifically, ceramic art was used to help students think more freely. This article presents an interdisciplinary approach to space design as an experimental method in design education. Just as fine art students draw inspiration from the principles of architecture, clay, a basic material for fine art students, becomes a creative material and design tool for architecture students. The first author, the course instructor, organised a workshop in the design courses for first‐ and third‐year architecture students. The second author, a ceramic artist and lecturer, participated in the workshop as a visiting instructor and contributed her own studies on space, house, building, and materials.

This study investigated how teachers used an electronic performance support system (EPSS) and whether using it affected their work performance and attitudes toward computer technology. The findings suggest a framework for implementing an EPSS in an educational setting, specifically a middle school. Data were collected through observations, questionnaires, anecdotal logs, database records, and interviews. Four middle school teachers used the EPSS primarily to complete student progress reports, and the results indicated that the EPSS reduced the time needed for this task. Computer usage, performance, and attitudes were affected by work responsibilities, access to computers, the change agent, the technology support personnel, and the specific characteristics of the EPSS. The teachers' attitudes toward the EPSS and toward technology in general were affected by their performance when using the system, by their interactions with the person responsible for technology support, and by the ability to customize the program to fit their needs.

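Based on measured data from the 2018 National College Entrance Examination (Gaokao) mathematics paper for liberal-arts students (Tianjin), this study analyzes the distinctive features of the test items by examining how the "one core, four layers, four wings" Gaokao evaluation system is reflected in the paper. With reference to the Examinee Performance Level Standards (Mathematics), it evaluates the performance levels of 2018 Tianjin liberal-arts examinees on mathematical core competencies and the teaching problems these reveal, and it offers targeted teaching recommendations.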

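To study college students' physical exercise behavior and understand how that behavior actually forms, this study applied behavior intervention theory. It combined literature review, experimentation, mathematical statistics, and interviews to intervene in the physical exercise behavior of 401 second-year students at different stages. On this basis it offers corresponding recommendations, with the aim of helping universities explore methods of health behavior education.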

The accelerating diffusion of broadband Internet access provides many opportunities for developing pedagogically robust Web‐based instruction (WBI). As broadband infrastructure spreads, academic researchers are focusing on issues such as the drivers of student use of WBI. Specifically, the research presented here examined the impact of WBI on a student's overall course performance. We hypothesized that learning independence (LI) is a determining factor in a student's use of WBI. We used structural equation modeling to examine the data and assess the direct and indirect effects of LI on WBI usage. The subjects, students in an introductory Computer Information Systems applications course, used a Web‐based tutorial program for skills instruction. The findings suggest that WBI usage has a significant impact on a student's course performance. Despite its plausibility, the effect of LI on WBI usage was not significant. However, we did find that two of the second-order factors of the LI construct have a direct effect on a student's performance in the course.

In this study, we compared methods to improve the decoding and reading fluency of struggling readers. Second‐grade poor readers were randomly assigned to one of two practice conditions within a repeated reading intervention. Both interventions were delivered in small groups in sessions of 20–28 min, 2–4 days per week, and consisted of phonemic awareness training, letter-sound practice, and practice with word families. Students in the accuracy condition (n = 27) practiced each page until they reached 98 percent accuracy, whereas students in the accuracy + automaticity condition (n = 29) practiced until they met both a rate criterion (30–90 correct words per minute, cwpm) and the accuracy criterion. Hierarchical linear modeling revealed no differences between the practice conditions in decoding accuracy, reading comprehension, or grade‐level text reading fluency. Significant differences favoring the accuracy + automaticity group were found on measures of decoding automaticity.
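
The two stopping rules can be sketched as follows. The 98 percent accuracy criterion comes from the study. Treating the rate criterion as a per-student target passed in as a parameter (somewhere in the 30–90 cwpm range) is an assumption. The function name and condition labels are hypothetical.

```python
def met_practice_criterion(words_read, errors, seconds, condition, rate_target=None):
    """Return True when one reading of a page meets the condition's mastery rule.

    condition: "accuracy" or "accuracy+automaticity" (hypothetical labels).
    rate_target: correct words per minute required in the accuracy+automaticity condition.
    """
    words_correct = words_read - errors
    accuracy = words_correct / words_read
    if accuracy < 0.98:
        return False  # both conditions require 98 percent accuracy
    if condition == "accuracy":
        return True
    if rate_target is None:
        raise ValueError("accuracy+automaticity condition needs a rate_target")
    cwpm = words_correct / (seconds / 60)  # correct words per minute
    return cwpm >= rate_target


# A 100-word page read with 1 error in 2 minutes: accurate, but only 49.5 cwpm.
print(met_practice_criterion(100, 1, 120, "accuracy"))                               # True
print(met_practice_criterion(100, 1, 120, "accuracy+automaticity", rate_target=60))  # False
```

The same reading ends practice on that page in the accuracy condition but not in the accuracy + automaticity condition. That difference in stopping rules is the contrast the study manipulated.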

An Experimental Study of Attribution Training for High School Girls Regarding Mathematics Examinations
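Combining a questionnaire with an experiment, this study conducted attribution training to address high school girls' inappropriate attribution styles regarding mathematics examinations. The study found the following. (1) In causal attribution, compared with the control group, the experimental group was more inclined to attribute success to internal causes such as ability and sustained effort, and to attribute failure to mood, sustained effort, and luck; moreover, attributions to temporary effort declined significantly in success situations, and attributions to ability and teaching quality declined significantly in failure situations. (2) In expectancy, the experimental group believed that failure outcomes could be changed and was more willing to exert effort. (3) In affective reactions, the experimental group felt stronger pride and relief after success and experienced more guilt after failure, whereas the control group was more prone to negative feelings of inferiority after failure.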

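Building on an analysis of the structure of translation competence and an exploration of strategies for developing it, this study used two empirical methods, a translation test and a questionnaire survey, to examine how a learner-centered "process approach" in a multimedia network environment affects students' translation competence. After one semester of training for an experimental group and a control group, data analysis showed that this approach was clearly effective in improving students' translation competence.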

Validating performance standards is challenging and complex. Because of the difficulties associated with collecting evidence related to external criteria, validity arguments rely heavily on evidence related to internal criteria—especially evidence that expert judgments are internally consistent. Given its importance, it is somewhat surprising that evidence of this kind has rarely been published in the context of the widely used bookmark standard‐setting procedure. In this article we examined the effect of ordered item booklet difficulty on content experts’ bookmark judgments. If panelists make internally consistent judgments, their resultant cut scores should be unaffected by the difficulty of their respective booklets. This internal consistency was not observed: the results suggest that substantial systematic differences in the resultant cut scores can arise when the difficulty of the ordered item booklets varies. These findings raise questions about the ability of content experts to make the judgments required by the bookmark procedure.

Use of the Rasch IRT Model in Standard Setting: An Item-Mapping Method
This article provides both logical and empirical evidence to justify the use of an item-mapping method for establishing passing scores for multiple-choice licensure and certification examinations. After describing the item-mapping standard-setting process, the article discusses the rationale and theoretical basis for this method and describes the similarities and differences between the item-mapping and Bookmark methods. Empirical evidence supporting the item-mapping method is provided by comparing results from four standard-setting studies for diverse licensure and certification examinations. The four cut score studies were conducted using both the item-mapping and the Angoff methods. Rating data from the four standard-setting studies, using each of the two methods, were analyzed with item-by-rater random effects generalizability and dependability studies to examine which method yielded higher inter-judge consistency. Results indicated that the item-mapping method produced higher inter-judge consistency and achieved greater rater agreement than the Angoff method.
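
The item-by-rater analysis can be sketched for a fully crossed design in which every rater gives one rating per item. The sketch estimates three variance components from a two-way ANOVA without replication: rater, item, and residual (rater-by-item interaction confounded with error). Negative estimates are truncated at zero. Smaller rater and residual components indicate more consistent judges. The error-variance formula treats both raters and items as random facets. That is one common choice, not necessarily the one used in the article. The function names and data are hypothetical.

```python
from statistics import mean


def g_study_rater_by_item(x):
    """x[r][i]: rating of item i by rater r (fully crossed, one rating per cell)."""
    n_r, n_i = len(x), len(x[0])
    grand = mean(v for row in x for v in row)
    r_means = [mean(row) for row in x]
    i_means = [mean(x[r][i] for r in range(n_r)) for i in range(n_i)]
    ss_r = n_i * sum((m - grand) ** 2 for m in r_means)
    ss_i = n_r * sum((m - grand) ** 2 for m in i_means)
    # Residual sum of squares: rater-by-item interaction plus error.
    ss_ri = sum(
        (x[r][i] - r_means[r] - i_means[i] + grand) ** 2
        for r in range(n_r)
        for i in range(n_i)
    )
    ms_r = ss_r / (n_r - 1)
    ms_i = ss_i / (n_i - 1)
    ms_ri = ss_ri / ((n_r - 1) * (n_i - 1))
    return {
        "r": max(0.0, (ms_r - ms_ri) / n_i),
        "i": max(0.0, (ms_i - ms_ri) / n_r),
        "ri,e": ms_ri,
    }


def absolute_error_variance(vc, n_r, n_i):
    """Error variance of the mean rating with raters and items both random."""
    return vc["r"] / n_r + vc["i"] / n_i + vc["ri,e"] / (n_r * n_i)


# Hypothetical Angoff-style ratings: raters differ only by a constant offset.
ratings = [[a + c for c in (0.3, 0.5, 0.7, 0.9)] for a in (0.0, 0.1, 0.2)]
components = g_study_rater_by_item(ratings)
```

In this toy data the raters agree perfectly on the relative difficulty of the items, so the residual component is essentially zero. All of the disagreement shows up in the rater (leniency) component.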

An Experimental Study of Far-Infrared Ceramic Bead Hot Compresses for Relieving Fatigue After a Single Bout of Exhaustive Exercise
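By observing changes in blood lactate after a single bout of exhaustive exercise with far-infrared ceramic bead hot compresses, this study explored their effect on relieving exercise-induced fatigue and the feasibility of using them to promote physical recovery. Blood lactate measurements in 20 athletes after a single bout of exhaustive exercise showed that far-infrared ceramic bead hot compresses significantly reduced post-exercise blood lactate. The authors conclude that this helps relieve fatigue, improve athletes' functional state, and enhance athletic performance.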

Cadaveric dissection offers an important opportunity for students to develop their ideas about death and dying. However, it remains largely unknown how this experience impacts medical students' fear of death. The current study aimed to address this gap by describing how fear of death changed during a medical gross anatomy dissection course and how fear of death was associated with examination performance. Fear of death was surveyed at the beginning of the course and at each of the four block examinations using three of the eight subscales from the Multidimensional Fear of Death Scale: Fear of the Dead, Fear of Being Destroyed, and Fear for the Body After Death. One hundred forty-three of 165 medical students (86.7%) completed the initial survey. Repeated measures ANOVA showed no significant changes in Fear of the Dead (F (4, 108) = 1.45, P = 0.222) or Fear for the Body After Death (F (4, 108) = 1.83, P = 0.129). There was a significant increase in students' Fear of Being Destroyed (F (4, 108) = 6.86, P < 0.0005) after beginning dissection. This increase was primarily related to students' decreased willingness to donate their body. Concerning performance, there was one significant correlation between Fear for the Body After Death and the laboratory examination score at examination 1. Students with higher fears may be able to structure their experience in a way that does not negatively impact their performance, but educators should still seek ways to support these students and encourage body donation.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417