When good model-data fit is observed, the Many-Facet Rasch (MFR) model acts as a linking and equating model that can be used to estimate student achievement, item difficulties, and rater severity on the same linear continuum. Given sufficient connectivity among the facets, the MFR model provides estimates of student achievement that are equated to control for differences in rater severity. Although several different linking designs are used in practice to establish connectivity, the implications of design differences have not been fully explored. Research is also limited on the impact of model-data fit on the quality of MFR model-based adjustments for rater severity. This study explores the effects of linking designs and model-data fit for raters on the interpretation of student achievement estimates within the context of performance assessments in music. Results indicate that performances cannot be effectively adjusted for rater effects when inadequate linking or model-data fit is present.
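
For orientation, one common rating-scale formulation of the many-facet Rasch model is sketched below. The abstract does not say which parameterization the study used, so the form and all symbol names are assumptions: \theta_n is the achievement of student n, \delta_i the difficulty of item i, \lambda_j the severity of rater j, and \tau_k the threshold between rating categories k-1 and k.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \lambda_j - \tau_k
```

Here P_{nijk} is the probability that rater j assigns student n a rating in category k on item i. Because all facets enter additively on the same logit scale, a student's estimate can be compared across raters only when the rating design links the raters, which is the connectivity requirement the abstract refers to.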

Many large‐scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically, simple structured tests, such as these rely on multiple multiple‐choice and/or constructed‐response sections of items to generate multiple scores. In the current article, we propose an extension of the hierarchical rater model (HRM) to be applied with simple structured tests with constructed response items. In addition to modeling the appropriate trait structure, the multidimensional HRM (M‐HRM) presented here also accounts for rater severity bias and rater variability or inconsistency. We introduce the model formulation, test parameter recovery with a focus on latent traits, and compare the M‐HRM to other scoring approaches (unidimensional HRMs and a traditional multidimensional item response theory model) using simulated and empirical data. Results show more precise scores under the M‐HRM, with a major improvement in scores when incorporating rater effects versus ignoring them in the traditional multidimensional item response theory model.

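Rater drift refers to changes in raters' behavior across time, occasions, or tasks, that is, fluctuation in rater effects. The introduction of this construct reflects a shift in researchers' interest in rater effects from static to dynamic. In the context of high-stakes educational testing, detecting rater drift is necessary to safeguard the reliability and validity of the resulting scores and the fairness of the test. At present, rater drift is detected mainly with traditional methods based on the many-facet Rasch model and difference tests. Model extensions for rater drift, the integration of cognition and measurement, and improved rating designs merit further research.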

This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings are in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used.

This study examined rater effects on essay scoring in an operational monitoring system from England's 2008 national curriculum English writing test for 14‐year‐olds. We fitted two multilevel models and analyzed: (1) drift in rater severity effects over time; (2) rater central tendency effects; and (3) differences in rater severity and central tendency effects by raters’ previous rating experience. We found no significant evidence of rater drift and, while raters with less experience appeared more severe than raters with more experience, this result also was not significant. However, we did find that there was a central tendency to raters’ scoring. We also found that rater severity was significantly unstable over time. We discuss the theoretical and practical questions that our findings raise.

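Computer-assisted instruction is increasingly widely used, but teachers hold misconceptions about integrating information technology with the curriculum. Addressing this calls for changing educational concepts, strengthening training in computer technology, and strengthening substantive reform for teachers and students.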

This study describes three least squares models to control for rater effects in performance evaluation: ordinary least squares (OLS); weighted least squares (WLS); and ordinary least squares, subsequent to applying a logistic transformation to observed ratings (LOG-OLS). The models were applied to ratings obtained from four administrations of an oral examination required for certification in a medical specialty. For any single administration, there were 40 raters and approximately 115 candidates, and each candidate was rated by four raters. The results indicated that raters exhibited significant amounts of leniency error and that application of the least squares models would change the pass-fail status of approximately 7% to 9% of the candidates. Ratings adjusted by the models demonstrated higher reliability and correlated slightly higher than observed ratings with the scores on a written examination.
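
The OLS idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a simple additive model in which each observed rating equals a candidate effect plus a rater leniency effect, and it fixes the scale by constraining leniency to average zero. The function name `adjust_for_rater_effects` and the toy data are hypothetical.

```python
import numpy as np

def adjust_for_rater_effects(candidates, raters, ratings):
    """Fit rating = candidate effect + rater leniency by ordinary least squares.

    candidates, raters: integer indices (0-based) for each observed rating.
    Returns (candidate_effects, rater_leniency), with leniency centered at zero.
    This additive model is an illustrative assumption, not the study's exact model.
    """
    candidates = np.asarray(candidates)
    raters = np.asarray(raters)
    ratings = np.asarray(ratings, dtype=float)
    n_cand = candidates.max() + 1
    n_rater = raters.max() + 1

    # Dummy-coded design matrix: one column per candidate, one per rater.
    X = np.zeros((len(ratings), n_cand + n_rater))
    rows = np.arange(len(ratings))
    X[rows, candidates] = 1.0
    X[rows, n_cand + raters] = 1.0

    # The design is rank deficient (a constant can move between the two
    # blocks), so lstsq returns one of many equivalent solutions.
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    ability, leniency = coef[:n_cand], coef[n_cand:]

    # Identify the solution by forcing mean rater leniency to zero.
    shift = leniency.mean()
    return ability + shift, leniency - shift

# Toy linked design: 3 candidates, 3 raters, each candidate rated twice.
ability, leniency = adjust_for_rater_effects(
    candidates=[0, 0, 1, 1, 2, 2],
    raters=[0, 1, 1, 2, 2, 0],
    ratings=[6, 5, 6, 5, 6, 8],
)
print(ability)   # adjusted candidate scores
print(leniency)  # positive values indicate a lenient rater
```

In the toy data the first two candidates have the same raw mean (5.5), but their adjusted scores differ once the raters' leniency is removed. The adjustment is only identifiable when the rating design connects all raters through shared candidates, as it does here.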

The purpose of this study was to investigate the stability of rater severity over an extended rating period. Multifaceted Rasch analysis was applied to ratings of 16 raters on writing performances of 8,285 elementary school students. Each performance was rated by two trained raters over a period of seven rating days. Performances rated on the first day were re-rated at the end of the rating period. Statistically significant differences between raters were found within each day and in all days combined. Daily estimates of the relative severity of individual raters were found to differ significantly from single, on-average estimates for the whole rating period. For 10 raters, severity estimates on the last day were significantly different from estimates on the first day. These findings cast doubt on the practice of using a single calibration of rater severity as the basis for adjustment of person measures.

Researchers have documented the impact of rater effects, or raters’ tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers’ achievement estimates given their response patterns, has not been investigated. In rater-mediated assessments, person fit reflects the reasonableness of rater judgments of individual test-takers’ achievement over components of the assessment. This study illustrates an approach to visualizing and evaluating person fit in assessments that involve rater judgment using rater-mediated person response functions (rm-PRFs). The rm-PRF approach allows analysts to consider the impact of rater effects on person fit in order to identify individual test-takers for whom the assessment results may not have a straightforward interpretation. A simulation study is used to evaluate the impact of rater effects on person fit. Results indicate that rater effects can compromise the interpretation and use of performance assessment results for individual test-takers. Recommendations are presented that call researchers and practitioners to supplement routine psychometric analyses for performance assessments (e.g., rater reliability checks) with rm-PRFs to identify students whose ratings may have compromised interpretations as a result of rater effects, person misfit, or both.


Experiments that involve nested structures may assign treatment conditions either to subgroups (such as classrooms) or individuals within subgroups (such as students). The design of such experiments requires knowledge of the intraclass correlation structure to compute the sample sizes necessary to achieve adequate power to detect the treatment effect. This study provides methods for computing power in three-level block randomized balanced designs (with two levels of nesting) where, for example, students are nested within classrooms and classrooms are nested within schools. The power computations take into account nesting effects at the second (classroom) and at the third (school) level, sample size effects (e.g., number of level-1, level-2, and level-3 units), and covariate effects (e.g., pretreatment measures). The methods are generalizable to quasi-experimental studies that examine group differences on an outcome.


Experiments that involve nested structures often assign entire groups (such as schools) to treatment conditions. Key aspects of the design of such experiments include knowledge of the intraclass correlation structure and the sample sizes necessary to achieve adequate power to detect the treatment effect. This study provides methods for computing power in three-level cluster randomized balanced designs (with two levels of nesting), where, for example, students are nested within classrooms and classrooms are nested within schools and schools are assigned to treatments. The power computations take into account nesting effects at the second (classroom) and at the third (school) level, sample size effects (e.g., number of schools, classrooms, and individuals), and covariate effects (e.g., pretreatment measures). The methods are applicable to quasi-experimental studies that examine group differences in an outcome.

In the article "Examining Rater Errors in the Assessment of Written Composition With a Many-Faceted Rasch Model" (JEM, Volume 31, Number 2, Summer 1994), the data presented in Figure 3 may be misleading. The "four clear spikes" (p. 106) that appear in Figure 3 were highlighted by the automatic scaling procedure used by the computer program that generated this histogram; as is well known, the use of different scaling units would yield histograms with different shapes (Moore & McCabe, 1993). For example, when the same data are presented as a bar chart (see Figure 1 below) rather than as a histogram, the four spikes are not evident. As graphical procedures become more readily available to measurement researchers, additional research and discussion are needed regarding standards for evaluating data displays that do not simply reproduce the actual data values.

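Color is the most intuitive and most imaginative element of periodical design. Through its own functions and inherent laws, it plays an increasingly important role in modern periodical binding and layout design, especially in terms of market effect, where the significance of color cannot be replaced by other forms. Taking the binding design of several magazines as examples, this article describes in detail the market effect of color in periodical design.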

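The reform of the work-study integrated talent training model places new demands on building specialty teaching teams in higher vocational colleges. In its work-study integrated teaching reform, the specialty teaching team at Huainan Union University (淮南联合大学) first set an overall construction goal, then built the team using school-enterprise cooperation as the platform and work-study integration projects as the vehicle. As a result, the team's overall strength improved markedly.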

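Computers have become an essential tool for electronic engineers. This article describes how to introduce computer circuit simulation and computer-aided circuit design into electronics courses, so as to strengthen the integration of electronics courses with computing for physics majors at teachers colleges.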

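Physics is an experimental science, and demonstration experiments play an important role in physics teaching. Demonstration experiments should be designed according to the principles of being scientific, intuitive, and concise. The authors found shortcomings in the design of two experiments in the textbook we use [1], "Exploring pressure inside a liquid" and "Exploring the equilibrium condition of a lever", and offer the following comments and suggestions for improvement: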

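There is a certain gap between the foundational courses of a major and students' learning interests and abilities. Arousing and sustaining an appropriate level of student interest requires teachers to carefully design every aspect of their teaching.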

In most U.S. schools, teachers are evaluated using observation of teaching practice (OTP). This study investigates rater effects on OTP ratings among 421 principals in an authentic teacher evaluation system. Many-facet Rasch analysis (MFR) using a block of shared ratings revealed that principals generally (a) differentiated between more and less effective teachers, (b) rated their teachers with leniency (i.e., overused higher rating categories), and (c) differentiated between teaching practices (e.g., Cognitive Engagement vs. Classroom Management) with minimal halo effect. Individual principals varied significantly in degree of leniency, and approximately 12% of principals exhibited severe rater bias. Implications for use of OTP ratings for evaluating teachers’ effectiveness are discussed. Strengths and limitations of MFR to analyze rater effects in OTP are also discussed.

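Among the world's many arts, the art of the incomplete has a style of its own and stands out. Precisely because it is incomplete, it makes many viewers linger and gives rise to endless imagination. As time passes, people keep giving it new meaning. Appreciating the art of the incomplete is therefore, in effect, finding it damaged yet not lacking.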

The classic approach for partitioning and assessing reliability and validity has been through the use of the multitrait-multimethod (MTMM) model. The MTMM approach generally involves 3 different groups (method) evaluating 3 traits. This approach can be reconceptualized for questionnaire evaluation, so that the method becomes 3 different scaling types, which are administered to the same respondents on different occasions to avoid carryover effects. A serious limitation of this MTMM model is that data are required from respondents on at least 3 different occasions, thus placing a heavy burden on the researcher and respondents. Planned incomplete data designs for the purpose of substantially reducing the amount of data required for MTMM models were investigated: 1st, a design that reduces the amount of data collected at the 3rd administration by 22%; and 2nd, a design in which data need only be collected at 2 occasions. The performance of Listwise Deletion, Pairwise Deletion, and the expectation maximization (EM) algorithm at dealing with planned incomplete data are examined through a series of simulations. Results indicate that EM was generally precise and efficient.
