Many large‐scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests such as these, or more specifically simple structured tests, rely on multiple multiple‐choice and/or constructed‐response sections of items to generate multiple scores. In the current article, we propose an extension of the hierarchical rater model (HRM) for simple structured tests with constructed‐response items. In addition to modeling the appropriate trait structure, the multidimensional HRM (M‐HRM) presented here also accounts for rater severity bias and rater variability or inconsistency. We introduce the model formulation, test parameter recovery with a focus on latent traits, and compare the M‐HRM to other scoring approaches (unidimensional HRMs and a traditional multidimensional item response theory model) using simulated and empirical data. Results show more precise scores under the M‐HRM, with a major improvement in scores when incorporating rater effects versus ignoring them in the traditional multidimensional item response theory model.

In signal detection rater models for constructed response (CR) scoring, it is assumed that raters discriminate equally well between different latent classes defined by the scoring rubric. An extended model that relaxes this assumption is introduced; the model recognizes that a rater may not discriminate equally well between some of the scoring classes. The extension recognizes a different type of rater effect and is shown to offer useful tests and diagnostic plots of the equal discrimination assumption, along with ways to assess rater accuracy and various rater effects. The approach is illustrated with an application to a large‐scale language test.

An approach to essay grading based on signal detection theory (SDT) is presented. SDT offers a basis for understanding rater behavior with respect to the scoring of constructed responses, in that it provides a theory of psychological processes underlying the raters' behavior. The approach also provides measures of the precision of the raters and the accuracy of classifications. An application of latent class SDT to essay grading is detailed, and similarities to and differences from item response theory (IRT) are noted. The validity and utility of classifications obtained from the SDT model and scores obtained from IRT models are compared. Validity coefficients were found to be about equal in magnitude across SDT and IRT models. Results from a simulation study of a 5-class SDT model with eight raters are also presented.

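Application of the Many-Facet Rasch Model in Rater Training for Subjective Items   Total citations: 7, self-citations: 2, citations by others: 7
The scoring of subjective items is influenced by many factors, such as the rater's level of knowledge, general ability, and personal preferences. These rater biases not only produce subjective differences between raters but also make the same rater inconsistent at different times, which ultimately lowers the reliability of subjective-item scoring. This study applied the many-facet Rasch model to rater training for the essay questions of a national-level examination. By analyzing trial-scoring data from 6 experienced raters on 58 scripts, four types of rater bias were identified; individual feedback was then given to each rater on that basis, thereby improving the objectivity and precision of the scoring.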
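
For orientation, the many-facet Rasch model is commonly written in the rating-scale form below. This is the standard textbook formulation, offered as a sketch; the abstract does not state which specification the study used, and the symbol names are chosen here for illustration.

```latex
% P_{nijk}: probability that examinee n receives category k on item i from rater j
% \theta_n: examinee ability      \delta_i: item difficulty
% \lambda_j: rater severity       \tau_k: threshold between categories k-1 and k
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
    = \theta_n - \delta_i - \lambda_j - \tau_k
\]
```

Under this form a more severe rater (larger \lambda_j) lowers the log-odds of the higher category for every examinee, which is what makes rater severity separable from examinee ability.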

This study examined rater effects on essay scoring in an operational monitoring system from England's 2008 national curriculum English writing test for 14‐year‐olds. We fitted two multilevel models and analyzed: (1) drift in rater severity effects over time; (2) rater central tendency effects; and (3) differences in rater severity and central tendency effects by raters’ previous rating experience. We found no significant evidence of rater drift and, while raters with less experience appeared more severe than raters with more experience, this result was also not significant. However, we did find a central tendency in raters’ scoring. We also found that rater severity was significantly unstable over time. We discuss the theoretical and practical questions that our findings raise.

The presence of nuisance dimensionality is a potential threat to the accuracy of results for tests calibrated using a measurement model such as a factor analytic model or an item response theory model. This article describes a mixture group bifactor model to account for the nuisance dimensionality due to a testlet structure as well as the dimensionality due to differences in patterns of responses. The model can be used for testing whether or not an item functions differently across latent groups in addition to investigating the differential effect of local dependency among items within a testlet. An example is presented comparing test speededness results from a conventional factor mixture model, which ignores the testlet structure, with results from the mixture group bifactor model. Results suggested the 2 models treated the data somewhat differently. Analysis of the item response patterns indicated that the 2-class mixture bifactor model tended to categorize omissions as indicating speededness. With the mixture group bifactor model, more local dependency was present in the speeded than in the nonspeeded class. Evidence from a simulation study indicated the Bayesian estimation method used in this study for the mixture group bifactor model can successfully recover generated model parameters for 1- to 3-group models for tests containing testlets.

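Taking as an example the case in which multiple raters simultaneously score translation and essay tasks in an online marking environment, this study builds structural equation models for multiple raters completing multiple tasks, fits them to the data, and thereby quantifies rater reliability. Five structural equation models are compared, and the better-fitting correlated-tasks, correlated-traits model is selected to compute rater reliability for multiple raters and multiple tasks. Rater reliability is then compared for the same rater across different tasks and for different raters on the same task, which allows scoring quality to be evaluated and provides scientific support for rater selection and targeted training.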

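An Instructional Interaction Model and an Instructional Interaction Hierarchy Tower for Distance Learning   Total citations: 54, self-citations: 6, citations by others: 54
Taking the conversational model of the learning process proposed by Laurillard in 2001 as its prototype, this study builds an instructional interaction model for distance learning. The model consists of three levels: operational interaction between the student and the media, information interaction between the student and the instructional elements, and conceptual interaction between the student's existing concepts and new concepts. These three levels of interaction may occur simultaneously during learning, and the learner's learning is accomplished through their joint action. Information interaction takes three forms: interaction between students and learning resources, between students and teachers, and among students. These three forms complement one another. Based on the model, the author arranges the three levels of interaction by their degree of abstraction, from top to bottom, which yields the hierarchy tower of instructional interaction. This study is the first to decompose the distance learning process into three kinds of instructional interaction, and in doing so it reveals the interactional nature of distance learning. The hierarchy tower also summarizes, in visual form, the different significance of the three levels of interaction for learning and their interdependence.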

This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings are in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used.
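
The central-tendency check described above (the share of ratings falling in the two middle categories) can be sketched in a few lines. This is an illustrative sketch only: the function name, the 4-point scale, and the ratings are hypothetical and are not data or code from the study.

```python
from collections import Counter


def middle_category_share(ratings, categories):
    """Return the share of ratings in the two middle categories.

    Assumes an even number of ordered categories, so that "the two
    middle categories" is well defined.
    """
    ordered = sorted(categories)
    if len(ordered) % 2 != 0:
        raise ValueError("expected an even number of rating categories")
    mid = len(ordered) // 2
    middle = set(ordered[mid - 1:mid + 1])
    counts = Counter(ratings)
    total = sum(counts.values())
    return sum(counts[c] for c in middle) / total


# Hypothetical ratings on a hypothetical 4-point scale.
example_ratings = [2, 3, 2, 2, 3, 1, 4, 3, 2, 3]
share = middle_category_share(example_ratings, categories=[1, 2, 3, 4])
print(f"{share:.0%} of ratings fall in the two middle categories")
```

A high share on its own only hints at central tendency; the abstract's criteria are based on the fitted Rasch model, which this raw tally does not replace.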

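Guided by Bachman's CLA model, a relatively macro-level new teaching model is constructed from two aspects: language ability and communication strategies. The model mainly involves three components: leveling standards and methods, tiered teaching strategies, and tiered assessment. The teaching-strategy component covers three aspects: tiered organization of teaching, tiered allocation of teaching resources, and diversification of teacher roles. The model provides an operable theoretical framework for college English speaking instruction under the current level-based teaching system.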

Using a sample of schools testing annually in grades 9–11 with a vertically linked series of assessments, a latent growth curve model is used to model test scores with student intercepts and slopes nested within school. Missed assessments can occur because of student mobility, student dropout, absenteeism, and other reasons. Missing data indicators are modeled using logistic regression, with grade 9 and potentially unobserved growth scores used as covariates. Under a hierarchical selection model, estimates of school effects on academic growth and missingness are obtained. The results from the selection model are compared to a model that ignores the missing data process.

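Using the spectral theory of linear completely continuous fields, center manifold reduction, and the attractor bifurcation and transition theory for nonlinear dissipative systems, this paper studies the dynamic bifurcation of a virus model with a diffusion term. The bifurcation of the model depends on the choice of the domain Ω. When ∫_Ω ψ_1^3 dx ≠ 0 and the control parameter λ exceeds the critical point, the equation bifurcates from the equilibrium state: the original equilibrium loses stability and a stable singular-point attractor bifurcates, while a unique saddle point bifurcates on the side where λ is below the critical point. When ∫_Ω ψ_1^3 dx = 0, the paper gives the conditions and the critical point for bifurcation of the model: when λ exceeds the critical point, the original equilibrium loses stability and the equation bifurcates from it into two stable singular points, and when λ is below the critical point, the equation bifurcates from the equilibrium into two saddle points. Expressions are given, under Dirichlet boundary conditions, for the bifurcated stable singular-point attractor and for the two saddle points.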

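The use of removable storage media is a very important factor affecting the spread of network viruses. This paper studies a delayed SEIS network virus propagation model that accounts for removable storage media and has graded infection rates. The model assumes that some of the devices connected to the network are already infected. Taking the latent-period delay of the network virus as the bifurcation parameter, the distribution of the roots of the model's characteristic equation is analyzed with the eigenvalue method, and sufficient conditions for local asymptotic stability and for the occurrence of a Hopf bifurcation are derived from that analysis. Finally, simulation examples are used to verify the correctness of the results.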

This paper presents a general longitudinal model for estimating school effects and their stability. Previous research on the stability of school performance over successive years has produced inconsistent findings. We argue that the findings have been inconsistent for at least two reasons: researchers have estimated different types of school effects, and they have not distinguished between instability due to true changes in school performance and instability due to measurement and sampling error. We describe two different types of school effects, each relevant to a different policy audience, and we present a longitudinal model that is capable of separating true changes in school effects from sampling and measurement error. The model also provides a means for estimating the effects of school policies and practices while controlling statistically for the effects of factors exogenous to the schooling system. This paper provides an example of the approach based on data describing two cohorts of students from one Education Authority in Scotland. It concludes with a discussion of the limitations of the model and implications for those collecting indicators of school performance for planning and evaluation purposes.

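Building on the DMIS model, and in view of the current lack of tiered cultivation in the teaching of intercultural communicative competence to university students in China, this paper identifies six levels of intercultural communicative competence development together with the corresponding training objectives, teaching strategies, and assessment methods.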

Rater‐mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater‐mediated assessments using three distinct models. The first model is the observation model, which includes ecological/environmental considerations for the evaluation system. The second model is the measurement model, which includes the transformation of observed, rater response data to linear measures using a measurement model with specific requirements of rater‐invariant measurement in order to examine raters’ construct‐relevant variability stemming from the evaluative system. The third model is the interaction model, which includes an interaction parameter to allow for the investigation into raters’ systematic, construct‐irrelevant variability stemming from the evaluative system. Implications for measurement outcomes and validity are discussed.

For operational purposes, physical development is characterized using a multidimensional hierarchical model. Optimal physical development is described as a combination of good physical fitness and a high level of skill development. Physical fitness has multiple subdimensions of its own. Developing each subdimension requires regular physical activity, which is not likely to occur without the collaboration of many, including the individual, family, friends, schools, community, and private agencies.

This report presents a systematic model for training student counselors to use advanced influencing skills such as confrontation. The authors have developed a videotape training package that integrates cognitive structures and counselor performance through rehearsal and immediate feedback so that counselors-in-training can move comfortably into a more active helping relationship with clients.

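A predator-prey system with a Holling type III functional response is considered, in which the predator has no density dependence and the prey has nonlinear density dependence. A complete qualitative analysis of the system is given, and it is proved that the system has at most one limit cycle; a necessary and sufficient condition for a limit cycle to exist is that the positive equilibrium is unstable.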
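
As a reminder of what a Holling type III functional response looks like, here is a minimal sketch. The parameterization a·x²/(b² + x²) is one common textbook form and is an assumption; the abstract does not give the system's equations, and the function name, parameter names, and values are hypothetical.

```python
def holling_type_iii(x, a=1.0, b=1.0):
    """Holling type III functional response, a * x**2 / (b**2 + x**2).

    x: prey density (non-negative)
    a: maximum consumption rate (assumed parameter name)
    b: prey density at which the rate reaches a / 2 (assumed parameter name)
    """
    if x < 0:
        raise ValueError("prey density must be non-negative")
    return a * x * x / (b * b + x * x)


# The response is sigmoidal: slow at low prey density, saturating toward a.
densities = [0.0, 0.5, 1.0, 2.0, 10.0]
responses = [holling_type_iii(x) for x in densities]
for x, r in zip(densities, responses):
    print(f"prey density {x:5.1f} -> consumption rate {r:.3f}")
```

The slow rise at low prey density is what distinguishes type III from the type II response, which rises steeply from zero.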

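Based on a traffic-flow analysis of a single urban intersection, a four-phase traffic signal control model is proposed that uses average vehicle delay as its performance index, and an improved genetic algorithm is applied to the model. Light, moderate, and heavy traffic-flow patterns under the relatively complex four-phase control scheme of a real intersection are simulated in Matlab 7.0. The results show that the proposed scheme outperforms both the ordinary genetic algorithm and traditional fixed-time control.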

