This paper presents a framework that provides a structured approach to developing score reports for cognitive diagnostic assessments (CDAs). Guidelines for reporting and presenting diagnostic scores are based on a review of current educational test score reporting practices and of literature from the field of information design. A sample diagnostic report illustrates how the reporting framework applies to one CDA procedure, the Attribute Hierarchy Method (AHM). Effective score reporting requires integrating and applying interdisciplinary techniques from education, information design, and technology. Although the AHM is used in this paper, the framework applies to any attribute-based diagnostic testing method.

Test scores matter these days. Test-takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles through which most people get the bulk of this information. Historically, score reports have not always met examinees' information or usability needs, but this is clearly changing for the better, thanks to recent, much-needed additions to the psychometric literature and to improved reporting practices. This paper provides an overview of score reports from a development perspective, focusing on current practices and emerging efforts concerning both the content of reports and the process by which reports are designed, evaluated, and ultimately used to communicate with the public.

With the growing number of online tests, interruptions during testing caused by unexpected technical issues appear to be on the rise; for example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on examinees' scores. Researchers such as Hill and Sinharay et al. have examined the impact of interruptions at an aggregate level, but research on assessing their impact at the individual level is lacking. We attempt to fill that void. We suggest four methodological approaches, based primarily on statistical hypothesis testing, linear regression, and item response theory, that can provide evidence on the individual-level impact of interruptions. We perform a realistic simulation study to compare the Type I error rate and power of the suggested approaches. We then apply the approaches to data from the 2013 Indiana Statewide Testing for Educational Progress-Plus (ISTEP+), which experienced interruptions.
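One way to probe individual-level impact with item response theory can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily one of the four approaches in the paper: it assumes a Rasch model with known item difficulties, estimates ability from the items answered before the interruption, and compares the observed number-correct on post-interruption items with its model-implied expectation. The function names are hypothetical, and the z statistic ignores uncertainty in the ability estimate.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, difficulties, max_iter=50, bound=4.0):
    """Maximum-likelihood ability estimate via Newton-Raphson.

    The estimate is clamped to [-bound, bound] because the MLE does not
    exist for all-correct or all-incorrect response patterns.
    """
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < 1e-8:
            break
    return theta


def post_interruption_z(pre_resp, pre_diff, post_resp, post_diff):
    """Standardized gap between observed and expected post-interruption scores.

    A large negative value suggests the examinee did worse after the
    interruption than their pre-interruption performance predicts.
    """
    theta = estimate_theta(pre_resp, pre_diff)
    probs = [rasch_prob(theta, b) for b in post_diff]
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    return (sum(post_resp) - expected) / math.sqrt(variance)
```

For example, `post_interruption_z([1, 0], [-1.0, 1.0], [0, 0, 0, 0], [0.0] * 4)` estimates an ability of 0 from the two pre-interruption items and returns a z of about −2, because two of the four post-interruption items were expected to be answered correctly but none were.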

The goal of this study was to investigate the usefulness of person-fit analysis in validating inferences about student scores on a cognitive diagnostic assessment. A two-stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, a person-fit statistic, the hierarchy consistency index (HCI; Cui, 2007; Cui & Leighton, 2009), was used to identify misfitting student item-score vectors. In the second stage, students' verbal reports were collected to provide additional information about their response processes and so reveal the actual causes of misfit. This two-stage procedure helped identify item-score vectors that misfit the cognitive model used in the design and analysis of the diagnostic test and uncover the reasons for misfit, so that students' problem-solving strategies were better understood and their performances could be interpreted more meaningfully.
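The HCI compares each correctly answered item with the items that require a subset of its attributes: a student who solves an item should also solve those less demanding items. The sketch below is a minimal version assuming a binary Q-matrix and this subset rule, which is our reading of Cui and Leighton (2009) rather than a verified reimplementation; the function name is hypothetical. The index ranges from −1 (every comparison misfits) to 1 (no comparison misfits).

```python
from typing import Optional, Sequence


def hierarchy_consistency_index(
    responses: Sequence[int], q_matrix: Sequence[Sequence[int]]
) -> Optional[float]:
    """HCI for one examinee (sketch).

    responses: 0/1 item scores.
    q_matrix:  q_matrix[j][k] == 1 if item j requires attribute k.
    Returns None when no comparisons are possible.
    """
    attrs = [frozenset(k for k, q in enumerate(row) if q) for row in q_matrix]
    n_items = len(attrs)
    misfits = 0
    comparisons = 0
    for j in range(n_items):
        if not responses[j]:
            continue
        # Items whose required attributes are a subset of item j's attributes:
        # an examinee who solves item j is expected to solve these as well.
        for g in range(n_items):
            if g != j and attrs[g] <= attrs[j]:
                comparisons += 1
                if not responses[g]:
                    misfits += 1
    if comparisons == 0:
        return None
    return 1.0 - 2.0 * misfits / comparisons
```

With a linear hierarchy of three attributes and items requiring {A1}, {A1, A2}, and {A1, A2, A3}, the vector `[1, 1, 0]` is perfectly consistent (HCI = 1), whereas `[0, 0, 1]`, solving only the most demanding item, gives HCI = −1.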

Large-scale assessment results for schools, school boards/districts, and entire provinces or states are commonly reported as the percentage of students achieving a standard, that is, the percentage of students scoring above the cut score that defines the standard on the assessment scale. Recent research has shown that this method of reporting is sensitive to small changes in the cut score, especially when results are compared across years or between groups. This study builds on that work by investigating the effects of reporting group size on the stability of results. In Part 1, Grade 6 students' results on Ontario's 2008 and 2009 Junior Assessments of Reading, Writing and Mathematics were compared, by school, for schools of different sizes. In Part 2, samples of students' results on the 2009 assessment were randomly drawn and compared for 10 group sizes to estimate the variability in results due to sampling error. The results showed that the percentage of students above a cut score (PAC) was unstable for small schools and for small randomly drawn groups.
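The logic of Part 2 can be sketched as a small simulation: draw random groups of a given size from a population of scores and measure how much the PAC varies. This is a minimal sketch with a hypothetical, normally distributed population rather than the Ontario data, and it assumes "above a cut score" means at or above it; the function names are illustrative. Under these assumptions the spread of the PAC shrinks roughly in proportion to one over the square root of the group size, which is why small groups show unstable values.

```python
import random
import statistics
from typing import Sequence


def percent_above_cut(scores: Sequence[float], cut: float) -> float:
    """Percentage of scores at or above the cut score (PAC)."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


def pac_sampling_sd(population, cut, group_size, n_reps=500, seed=0):
    """Standard deviation of PAC across random groups drawn without replacement."""
    rng = random.Random(seed)
    pacs = [
        percent_above_cut(rng.sample(population, group_size), cut)
        for _ in range(n_reps)
    ]
    return statistics.stdev(pacs)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population of scale scores (not real assessment data).
    population = [rng.gauss(250.0, 30.0) for _ in range(20000)]
    for n in (10, 25, 50, 100, 200):
        print(n, round(pac_sampling_sd(population, cut=250.0, group_size=n), 1))
```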

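By identifying the problems with using raw scores to grade student performance, this paper proposes using standard scores to evaluate student achievement objectively and fairly, thereby promoting the shift from exam-oriented education to quality-oriented education.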
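The conversion the paper advocates can be sketched as follows. A standard score expresses a raw score as its distance from the group mean in standard-deviation units, z = (x − mean)/SD, so scores from tests of different difficulty become comparable; the T score, 50 + 10z, is one common rescaling. This is a minimal sketch assuming the population SD and the T-score convention; the paper may use a different variant, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def z_scores(raw: Sequence[float]) -> List[float]:
    """Standard (z) scores: distance from the group mean in SD units."""
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)  # population SD; some schemes use the sample SD
    return [(x - mean) / sd for x in raw]


def t_scores(raw: Sequence[float]) -> List[float]:
    """T scores rescale z to mean 50 and SD 10 so that typical values are positive."""
    return [50.0 + 10.0 * z for z in z_scores(raw)]
```

For raw scores 60, 70, and 80, the middle score maps to z = 0 (T = 50) and the top score to z ≈ 1.22 (T ≈ 62.2).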

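Research in legal anthropology has been prolific in recent years. This paper reviews the views put forward on its concept, connotations, methods, and goals, its understanding of law and informal institutions, and the nature and effects of its research.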

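Existing score analysis and management systems include a function for checking whether scores are normally distributed, which is essential for analyzing and evaluating teaching. Using VBA, this paper presents an implementation of a normality check based on kurtosis and skewness. Before the check, the score data are preprocessed with empirically based methods such as the mean, so that records that do not meet the requirements are removed beforehand; this reduces the amount of computation, improves efficiency, and yields good analytical results.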
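The kurtosis-and-skewness check can be sketched outside VBA as follows. It computes moment-based sample skewness and excess kurtosis and divides each by its large-sample standard error. This is a minimal sketch: the paper's exact formulas, preprocessing rules, and thresholds are not given, so the simple moment estimators, the standard errors √(6/n) and √(24/n), and the 1.96 cutoff are assumptions.

```python
import math
from typing import Sequence, Tuple


def skew_kurtosis_test(scores: Sequence[float]) -> Tuple[float, float, float, float]:
    """Moment-based skewness and excess kurtosis with approximate z tests.

    Returns (skewness, excess kurtosis, z_skew, z_kurt). The standard errors
    sqrt(6/n) and sqrt(24/n) are large-sample approximations; |z| > 1.96
    suggests departure from normality at roughly the 5% level.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    z_skew = skewness / math.sqrt(6.0 / n)
    z_kurt = excess_kurtosis / math.sqrt(24.0 / n)
    return skewness, excess_kurtosis, z_skew, z_kurt
```

For the evenly spaced scores 1 to 5, skewness is 0 and excess kurtosis is −1.3, reflecting a flat, symmetric distribution.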

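Teachers face a heavy workload in compiling grade statistics, yet the results still fail to evaluate students' learning fairly. To address this, the paper proposes adopting a sound scoring system, namely the standard score system, in grade statistics. Analysis and comparison of worked examples show that scoring with standard scores improves teachers' work efficiency and evaluates students' learning objectively and effectively.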
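Why summing raw scores can be unfair is easy to see with a toy example: a subject with a wide score spread dominates the raw total. The sketch below (hypothetical data and function names; population SD assumed) totals each student's scores both ways. Student A's 20-point lead in mathematics, where scores are widely spread, is exactly offset in standard-score terms by a 5-point deficit in Chinese, where scores are tightly clustered, so the raw totals rank A, C, B while the standard-score totals are all zero.

```python
import statistics
from typing import Dict, List, Sequence


def standardize(raw: Sequence[float]) -> List[float]:
    """z scores using the population SD."""
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    return [(x - mean) / sd for x in raw]


def totals(scores_by_subject: Dict[str, Sequence[float]]):
    """Return (raw totals, standard-score totals) per student.

    scores_by_subject maps a subject name to scores listed in the same
    student order for every subject.
    """
    raw_columns = list(scores_by_subject.values())
    z_columns = [standardize(col) for col in raw_columns]
    raw_totals = [sum(row) for row in zip(*raw_columns)]
    z_totals = [sum(row) for row in zip(*z_columns)]
    return raw_totals, z_totals


if __name__ == "__main__":
    # Hypothetical scores for students A, B, C.
    scores = {"math": [90, 50, 70], "chinese": [70, 80, 75]}
    raw, std = totals(scores)
    print(raw)  # [160, 130, 145]
    print([round(s, 6) for s in std])
```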

Although assessments of mathematics, reading, and writing are assumed to measure distinct academic skills, such distinctions may be difficult to achieve owing to the pervasive influence of general ability on performance. Factor analyses of school-level data from 14 large-scale assessment programs revealed that 80% of the variance in mathematics, reading, and writing scores was due to a common underlying factor. Multiple regression analyses confirmed that scores contribute little information unique to a particular subject (6% or less). Although different assessments may create the illusion of providing unique information, they may be tapping into generic cognitive abilities that cut across content areas. These results raise doubts about the value and validity of interpretations based on school-level subject-area scores.
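A quick way to see how much variance a single dimension can absorb is to examine the largest eigenvalue of the correlation matrix among subject scores. This is principal components analysis, a simpler stand-in for the factor analyses in the study, and the correlation used below is hypothetical. With three scores that all correlate 0.7, the first component accounts for (1 + 2 × 0.7)/3 = 80% of the variance, which shows that moderately high intercorrelations are enough to produce a common share of this size.

```python
import numpy as np


def first_component_share(corr: np.ndarray) -> float:
    """Share of total standardized variance captured by the first principal component."""
    eigenvalues = np.linalg.eigvalsh(corr)  # ascending order for symmetric matrices
    return float(eigenvalues[-1] / eigenvalues.sum())


if __name__ == "__main__":
    r = 0.7  # hypothetical common correlation among math, reading, writing
    corr = np.full((3, 3), r)
    np.fill_diagonal(corr, 1.0)
    print(round(first_component_share(corr), 3))  # 0.8
```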

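With trampoline now part of the Olympic program, it is expected to become a new source of gold medals for China at the 2008 Games. Because trampoline has a short history of development in China, systematic theoretical research on the sport within Chinese sports science is still very limited, and few studies have examined the main competitive-ability factors that affect trampoline performance. This paper synthesizes the indicators of competitive-ability factors commonly used in China in recent years and offers suggestions for future Chinese research in this area.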

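Whether a language test's score system is well designed determines whether its scores can accurately and concisely reflect the rich information contained in the test results. Designing such a score system requires careful consideration of several important factors: the purpose of the test, the level of detail in score reporting, the relationship between scores from different test administrations, the techniques and plans for test equating, and the number of score levels.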
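As an illustration of the simplest equating technique among the factors listed, linear equating under an assumed random-groups design maps a score on Form X to the Form Y score with the same standardized position: l_Y(x) = μ_Y + (σ_Y/σ_X)(x − μ_X). This is a minimal sketch with a hypothetical function name; operational programs often use equipercentile or IRT-based equating instead.

```python
import statistics
from typing import Sequence


def linear_equating(x_scores: Sequence[float], y_scores: Sequence[float]):
    """Return a function mapping Form X scores onto the Form Y scale.

    Mean-sigma linear equating: a score on X is given the Y score that
    lies the same number of standard deviations from the Y mean.
    """
    mu_x, sd_x = statistics.fmean(x_scores), statistics.pstdev(x_scores)
    mu_y, sd_y = statistics.fmean(y_scores), statistics.pstdev(y_scores)
    slope = sd_y / sd_x

    def to_y(x: float) -> float:
        return mu_y + slope * (x - mu_x)

    return to_y
```

If Form Y scores are spread twice as widely as Form X scores and sit twice as high, as in `linear_equating([1, 2, 3], [2, 4, 6])`, an X score of 3 maps to a Y score of 6.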

Factor score regression has recently received growing interest as an alternative to structural equation modeling. However, because the literature has focused on normally distributed outcomes, many applications are left without guidance. We perform a simulation study to examine how a selection of factor scoring methods compare when estimating regression coefficients in generalized linear factor score regression. The study evaluates the regression method, the correlation-preserving method, and two sum score methods in ordinary, logistic, and Poisson factor score regression. Our results show that the performance of scoring methods can differ notably across the regression models considered. In addition, the results indicate that the choice of scoring method can substantially influence research conclusions. The regression method generally performs best in terms of coefficient and standard error bias, accuracy, and empirical Type I error rates. Moreover, the regression method and the correlation-preserving method mostly outperform the sum score methods.
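For readers unfamiliar with the regression method, in a one-factor model it weights items by w = φΣ⁻¹λ, so items with larger loadings relative to their unique variances count more, and the resulting scores are shrunk toward the mean. The sketch below assumes known loadings and unique variances (in practice these are estimated first) and does not implement the correlation-preserving method; the function names are hypothetical. The resulting scores would then serve as the predictor in an ordinary, logistic, or Poisson regression.

```python
import numpy as np


def regression_factor_weights(loadings, unique_variances, factor_variance=1.0):
    """Weights for the regression (Thurstone) factor score in a one-factor model.

    Model-implied covariance: Sigma = phi * lambda lambda' + diag(theta).
    Weights: w = phi * Sigma^{-1} lambda, so the score is w'(x - mu).
    """
    lam = np.asarray(loadings, dtype=float)
    sigma = factor_variance * np.outer(lam, lam) + np.diag(unique_variances)
    return factor_variance * np.linalg.solve(sigma, lam)


def regression_factor_scores(x, mu, weights):
    """Factor scores for the rows of x (n_persons x n_items)."""
    return (np.asarray(x, dtype=float) - mu) @ weights


def sum_scores(x):
    """Unweighted sum score, the simplest of the alternatives compared."""
    return np.asarray(x, dtype=float).sum(axis=1)
```

With three items that all have λ = 1 and θ = 1, each weight is 0.25, so a response vector of all ones (with zero means) yields a factor score of 0.75 versus a sum score of 3; the two differ only in scale here but diverge once loadings differ.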

A Review of the Development of Vocational Competence Assessment
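With the vigorous growth of the psychological testing movement in the 1920s, vocational competence assessment gradually became an important part of human resource development and management and has played an important role in selecting and evaluating personnel of all kinds. As a core component of the field, the development of assessment instruments has received close attention from researchers; these instruments fall essentially into two types, comprehensive and single-trait. Future work on vocational competence assessment should pay particular attention to two instrument-related issues: reforming the traditional approach to test construction and solving key technical problems.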

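To address the unsatisfactory effectiveness of current keyword search over relational databases, this paper proposes a new scoring function obtained by optimizing the key factors of scoring functions used in traditional information retrieval. Analysis of real data verifies that the new scoring function is reasonable and effective.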
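The kind of scoring function being optimized can be illustrated with the classic pivoted-normalization TF-IDF formula from information retrieval, treating a tuple (or a joined tree of tuples) as the "document". This is a baseline sketch, not the new function proposed in the paper; the slope s = 0.2 and the function name are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Mapping, Sequence


def pivoted_tfidf_score(
    query: Iterable[str],
    doc_tokens: Sequence[str],
    doc_freq: Mapping[str, int],
    n_docs: int,
    avg_doc_len: float,
    s: float = 0.2,
) -> float:
    """Classic pivoted-normalization TF-IDF score of a document for a query.

    tf part:  1 + ln(1 + ln(tf))     dampens repeated terms
    length:   (1 - s) + s * dl/avdl  pivoted length normalization
    idf part: ln((N + 1) / df)
    """
    counts = Counter(doc_tokens)
    norm = (1.0 - s) + s * len(doc_tokens) / avg_doc_len
    score = 0.0
    for term in set(query):
        tf = counts.get(term, 0)
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue
        score += (1.0 + math.log(1.0 + math.log(tf))) / norm * math.log((n_docs + 1) / df)
    return score
```

The three factors (term frequency, length normalization, and inverse document frequency) are the usual candidates for adjustment when such a function is adapted to relational data, where "documents" are short and vary structurally.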

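Universities have introduced the balanced scorecard method into the performance evaluation of student counselors, building an evaluation system along four dimensions: cost, students, internal university management, and learning and growth. The paper proposes the following implementation strategies: handle the selection and quantification of indicators for balanced-scorecard-based counselor evaluation; strengthen communication and collaboration among counselors in different schools and departments, mobilizing the participation of all relevant parties as fully as possible; build a supporting human resource information system for balanced-scorecard-based counselor evaluation; and emphasize publicity and evaluator training when implementing the evaluation.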

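In educational evaluations in which multiple judges assign scores, the consistency of the judges' rank orderings affects the credibility of the evaluation. Kendall's coefficient of concordance can be used to test the degree of agreement among the ratings, and thus to judge the credibility of the evaluation data and the validity of the evaluation activity.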
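For m judges ranking n objects, Kendall's W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of each object's rank sum from the mean rank sum m(n + 1)/2. W runs from 0 (no agreement) to 1 (identical rankings), and for larger n the statistic m(n − 1)W is approximately chi-square with n − 1 degrees of freedom. The sketch below is minimal: it omits the correction for tied ranks, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def kendalls_w(ranks: Sequence[Sequence[float]]) -> Tuple[float, float, int]:
    """Kendall's coefficient of concordance (no tie correction).

    ranks[i][j] is the rank judge i gives to object j (1..n).
    Returns (W, chi-square statistic, degrees of freedom).
    """
    m = len(ranks)     # number of judges
    n = len(ranks[0])  # number of objects rated
    rank_sums: List[float] = [sum(judge[j] for judge in ranks) for j in range(n)]
    mean_sum = m * (n + 1) / 2.0
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))
    chi_square = m * (n - 1) * w  # approximately chi-square with n - 1 df for larger n
    return w, chi_square, n - 1
```

Three judges who all rank four entries as 1, 2, 3, 4 give W = 1 (chi-square 9 on 3 df); two judges with opposite rankings of two entries give W = 0.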

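Using CNKI, the literature published by Chinese scholars over the past 20 years on the psychology of religion, folk customs, and superstition was retrieved, and related books were consulted by hand. The paper summarizes domestic research findings on religious, folk, and superstitious psychology in the hope of informing further research.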

The latent change score framework allows for estimating a variety of univariate trajectory models, such as no change, linear change, and exponential forms of change, as well as multivariate trajectory models that allow for coupling between two or more constructs. A particularly attractive feature of these models is that aspects of change are easy to decompose and interpret. One particularly flexible model, the dual change score model, has two components of change: a proportional change component that depends on scores at the previous time point, and an additive constant change component. We demonstrate through simulation and an empirical example that, in a correctly specified model, the correlation between the proportional change parameter and the mean of the constant change component can approach either −1 or 1, complicating interpretation. We provide recommendations and code to help researchers diagnose this issue in their own data.
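The two change components can be made concrete with the model-implied expected trajectory: each change is a constant part plus a proportional part, Δy_t = slope + β·y_{t−1}, with y_t = y_{t−1} + Δy_t. This deterministic sketch assumes the constant component's loading is fixed at 1 (so `slope` stands for its mean) and ignores measurement error and individual differences; the function name is hypothetical.

```python
from typing import List


def dual_change_trajectory(y0: float, slope: float, beta: float, n_steps: int) -> List[float]:
    """Expected scores under a dual change score model.

    Each latent change is a constant (additive) part plus a proportional
    part that depends on the previous score:
        delta_t = slope + beta * y_{t-1},   y_t = y_{t-1} + delta_t
    """
    ys = [y0]
    for _ in range(n_steps):
        previous = ys[-1]
        ys.append(previous + slope + beta * previous)
    return ys
```

Setting `beta = 0` gives linear change (10, 12, 14, 16 for `y0 = 10`, `slope = 2`), and setting `slope = 0` gives purely proportional change (8, 4, 2, 1 for `y0 = 8`, `beta = -0.5`). One intuition for the estimation problem described above is that a larger constant component can be partly offset by a more negative proportional component, producing similar trajectories.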

Latent difference score models (e.g., McArdle & Hamagami, 2001) are extended to include effects of prior changes on subsequent changes. This extension allows for testing hypotheses in which recent changes, as opposed to recent levels, are a primary predictor of subsequent changes. The models are applied to bivariate longitudinal data from the Baltimore Longitudinal Study of Aging on memory performance, measured by the California Verbal Learning Test, and lateral ventricle size, measured by structural MRI. Results indicate that recent increases in lateral ventricle size were a leading indicator of subsequent declines in memory performance from age 60 to 90.
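The extension can be sketched through expected trajectories: the change in one variable at occasion t depends not only on levels at t − 1 but also on changes at t − 1. Here x might stand for ventricle size and y for memory, and a negative `gamma_yx` means that a recent increase in x is followed by a decline in y. This is a minimal deterministic sketch with hypothetical parameter names; the fitted models are latent variable models with measurement error, which this ignores, and changes before the first occasion are assumed to be zero.

```python
from typing import List, Tuple


def change_to_change_trajectories(
    x0: float, y0: float, n_steps: int,
    x_slope: float, x_beta: float,
    y_slope: float, y_beta: float,
    phi_y: float = 0.0, gamma_yx: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Expected bivariate trajectories in which prior changes drive later changes.

    dx_t = x_slope + x_beta * x_{t-1}
    dy_t = y_slope + y_beta * y_{t-1} + phi_y * dy_{t-1} + gamma_yx * dx_{t-1}
    Prior changes before the first occasion are taken to be zero.
    """
    xs, ys = [x0], [y0]
    prev_dx = prev_dy = 0.0
    for _ in range(n_steps):
        dx = x_slope + x_beta * xs[-1]
        dy = y_slope + y_beta * ys[-1] + phi_y * prev_dy + gamma_yx * prev_dx
        xs.append(xs[-1] + dx)
        ys.append(ys[-1] + dy)
        prev_dx, prev_dy = dx, dy
    return xs, ys
```

With x rising by 1 per occasion and `gamma_yx = -2`, y stays at 10 on the first step and then falls by 2 per step (10, 10, 8, 6): the decline in y lags the increase in x by one occasion, which is the "leading indicator" pattern described above.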

