1.
Stefanie A. Wind 《Journal of Educational Measurement》2019,56(3):478-504
Numerous researchers have proposed methods for evaluating the quality of rater‐mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many‐facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On the other hand, popular parametric methods for evaluating rating quality are often based on measurement theories such as invariant measurement. However, these methods are based on assumptions and transformations that may not be appropriate for ordinal ratings. In this study, I show how researchers can use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, to evaluate rating quality within the framework of invariant measurement without the use of potentially inappropriate parametric techniques. I use an illustrative analysis of data from a rater‐mediated writing assessment to demonstrate how one can use numeric and graphical indicators from MSA to gather evidence of validity, reliability, and fairness. The results from the analyses suggest that MSA provides a useful framework within which to evaluate rater‐mediated assessments for evidence of validity, reliability, and fairness that can supplement existing popular methods for evaluating ratings.
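For readers unfamiliar with the kappa coefficients mentioned above, the following is a minimal sketch of unweighted Cohen's kappa for two raters. It is not the Mokken scale analysis proposed in the article; the function name `cohen_kappa` and the ratings are hypothetical and serve only as an illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same performances."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    n = len(rater_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal category counts.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings on a 1-4 rubric (illustrative data only).
ratings_a = [1, 2, 3, 3, 2, 1, 4, 4]
ratings_b = [1, 2, 3, 2, 2, 1, 4, 3]
kappa = cohen_kappa(ratings_a, ratings_b)
print(round(kappa, 3))
```

Kappa treats the categories as nominal, which is one reason the abstract describes such indices as not tied to a particular measurement theory.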
2.
A research program is suggested that integrates Admissions procedures and methods of statistical analysis to study the first stage of the Admissions selection process: the rating of applicants. To form the base for evaluation research, a systematic procedure is described that provides an index of applicant quality in the light of institutional goals. Then the rating process itself is explored using a Path Model to measure the contributions of background and achieved characteristics of applicants to their rating. How questions of bias may be raised and pursued is discussed. Applicants are profiled in segments to show how the effects of policy adjustments may be monitored. For doing marketing research, quality-by-enrollment status segments are defined. Using factor analysis models, an analysis of image variance is applied. Next, a discriminant analysis is used to isolate those institutional attributes that most influence higher quality applicants to enroll. Some specifics of a differentiated policy are given in examples. Implications of this integrated approach are discussed. Presented at the Twentieth Annual Forum of the Association for Institutional Research, Atlanta, May 1980.
3.
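A Review of Methods for Estimating the Scoring Quality of Subjective Items  Total citations: 2, self-citations: 0, citations by others: 2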
GUAN Dandan 《中国考试》2008,(10)
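In psychometric theory, the scoring quality of subjective items is a topic worth studying. This paper introduces how each of the three major measurement theories (classical test theory, generalizability theory, and item response theory) estimates the scoring quality of subjective items, and compares their strengths and weaknesses. Generalizability theory and item response theory have relatively clear advantages in evaluating the scoring quality of subjective items. How to combine the three theories to obtain more valuable information about scoring quality is a question that deserves further exploration.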
4.
The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start‐up, plodding, boredom, or fatigue. An understanding of the different types of measurement disturbances can lead to a more complete understanding of persons or items in terms of the construct being measured. Although measurement disturbances have been explored in several contexts, they have not been explicitly considered in the context of performance assessments. The purpose of this study is to illustrate the use of graphical methods to explore measurement disturbances related to raters within the context of a writing assessment. Graphical displays that illustrate the alignment between expected and empirical rater response functions are considered as they relate to indicators of rating quality based on the Rasch model. Results suggest that graphical displays can be used to identify measurement disturbances for raters related to specific ranges of student achievement that suggest potential rater bias. Further, results highlight the added diagnostic value of graphical displays for detecting measurement disturbances that are not captured using Rasch model–data fit statistics.
5.
6.
Benjamin D. Wright 《Structural equation modeling》2013,20(1):3-24
This article illustrates how Rasch measurement is preferable to factor analysis for reducing complex data matrices to unidimensional variables. The two methods: (a) address the same kind of data, but with different interpretations of numerical status; (b) use the same estimation methods, but with different measurement models; and (c) solve the same problems, but with substantially different utility. Factor analysis is faulted for mistaking ordinally labeled stochastic observations for linear measures and for failing to construct linear measurement. The motivation and mathematical basis for Rasch measurement are introduced. How to use Rasch measurement to replace factor analysis is developed for a dichotomy and demonstrated for a rating scale.
7.
Tom Bramley 《Educational research; a review for teachers and all concerned with progress in education》2013,55(2):251-261
In setting the cut-scores on National Curriculum tests it is important to maintain standards. In the process of test development, both within and across years, changes are made to the style of the questions in order to increase their ‘accessibility’. This raises the question of whether a more accessible test should have higher cut-scores. Purely statistical definitions of equating are blind to differences between ‘accessibility’ and ‘easiness’ and cut-scores derived from statistical equating methods will be higher for a more accessible test. Arguments about the increased validity of the more accessible test are sometimes used to justify not raising the cut-scores as much as would be indicated by statistical methods. These arguments are shown to be equivalent to postulating that changing the accessibility is changing the construct measured by the test. Using a statistical measurement model can provide a rational basis for understanding accessibility and identifying types of question where accessibility issues are causing a measurement problem.
8.
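Multilayer ceramic capacitors (MLCCs) are widely used in electronic information products. They withstand high voltage and high temperature, can be miniaturized, and are produced in large volumes. Many steps and factors in the production, packaging, and use of MLCCs affect their quality, and different companies have their own methods for controlling MLCC quality. This paper mainly introduces quality control methods and measures that can be adopted during MLCC packaging, such as choosing appropriate MLCC selection instruments and packaging materials and monitoring the packaging process with a manufacturing execution system. These methods can substantially reduce the quality problems that arise during packaging.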
9.
张玉华 《大学.研究与评价》2007,(10)
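Quality control of regular undergraduate teaching should take full account of the uncertainty involved in controlling teaching quality. To this end, fuzzy information entropy is introduced into the undergraduate teaching quality process, and a closed-loop process control system model for undergraduate teaching quality is constructed. This can yield a relatively accurate method for measuring undergraduate teaching quality, together with a supporting system.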
10.
The use of evidence to guide policy and practice in education (Cooper, Levin, & Campbell, 2009) has included an increased emphasis on constructed-response items, such as essays and portfolios. Because assessments that go beyond selected-response items and incorporate constructed-response items are rater-mediated (Engelhard, 2002, 2013), it is necessary to develop evidence-based indices of quality for the rating processes used to evaluate student performances. This study proposes a set of criteria for evaluating the quality of ratings based on the concepts of measurement invariance and accuracy within the context of a large-scale writing assessment. Two measurement models are used to explore indices of quality for raters and ratings: the first model provides evidence for the invariance of ratings, and the second model provides evidence for rater accuracy. Rating quality is examined within four writing domains from an analytic rubric. Further, this study explores the alignment between indices of rating quality based on these invariance and accuracy models within each of the four domains of writing. Major findings suggest that rating quality varies across analytic rubric domains, and that there is some correspondence between indices of rating quality based on the invariance and accuracy models. Implications for research and practice are discussed.
11.
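Because electrical meters have internal resistance, measuring resistance with the conventional voltmeter-ammeter method introduces error. To reduce the measurement error, several improved methods are used; the measurement procedure is simple and the results are accurate.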
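The abstract does not say which improved methods are used, so the sketch below illustrates only the source of the error it describes: how meter internal resistance biases the reading U/I in the two standard voltmeter-ammeter connections. The function names and component values are hypothetical.

```python
def reading_voltmeter_across_resistor(r_true, r_voltmeter):
    """Voltmeter connected directly across the resistor.

    The ammeter carries the current through both the resistor and the
    voltmeter, so U/I equals the parallel combination and reads low.
    """
    return r_true * r_voltmeter / (r_true + r_voltmeter)


def reading_voltmeter_across_resistor_and_ammeter(r_true, r_ammeter):
    """Voltmeter connected across the resistor and ammeter in series.

    The voltmeter also sees the drop across the ammeter, so U/I equals
    the series combination and reads high.
    """
    return r_true + r_ammeter


# Hypothetical values in ohms (illustrative only).
r_true = 100.0
low_reading = reading_voltmeter_across_resistor(r_true, r_voltmeter=10_000.0)
high_reading = reading_voltmeter_across_resistor_and_ammeter(r_true, r_ammeter=1.0)
print(low_reading, high_reading)
```

The first connection suits resistances much smaller than the voltmeter's internal resistance, and the second suits resistances much larger than the ammeter's.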
12.
Greg Wang 《Performance Improvement Quarterly》2002,15(2):32-46
This research contributes to the methodologies in HPT program evaluation and measurement that are fairly lacking to date. First, a theoretical foundation for a control group is established based on a brief review of control group applications in various fields. Then, four types of control groups applicable to HPT program evaluation and measurement are defined and classified, and threats to internal and external validity in control group applications are explored. Lastly, four evaluation and measurement scenarios are presented for an E‐learning program to demonstrate the applicability of the control group methods for HPT program evaluation and ROI measurement.
13.
14.
15.
黄俭 《齐齐哈尔师范高等专科学校学报》2010,(2):75-76
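Raising audience ratings and increasing revenue bear directly on the survival of local television stations. Efforts should start from three aspects: improving program quality, planning program slots scientifically, and packaging television programs sensibly. In planning program slots, stations should emphasize distinctive features and limit the number of slots, and broadcast times should follow the principle of avoiding peak-hour competition. Improving program quality should begin with the overall competence of news staff, the strength of program planning, and closeness to the audience. Only with sensible packaging can television programs attract more viewers.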
16.
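Delivering high-speed broadband services over subscriber lines (twisted copper pairs) suffers from a high fault rate. This article briefly introduces a broadband testing scheme and proposes using a statistical decision method to improve the original scheme and raise the accuracy of test diagnosis.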
17.
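Under the current statistical management system and institutions, quality monitoring of statistical data in Hebei Province faces many difficulties: comprehensive monitoring has not yet been achieved, and clear control objectives and unified standards are lacking. Scientific methods for monitoring statistical data quality should be adopted, and effective countermeasures should be formulated and implemented, in order to meet the new requirements for statistical data quality monitoring in Hebei Province.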
18.
Laura Trinchera, Nicolas Marie, George A. Marcoulides 《Structural equation modeling》2018,25(6):876-887
Scales are important tools for obtaining quantitative measures of theoretical constructs. Once a set of measures to be used in a scale is selected, reliability is commonly examined in order to assess their measurement quality. To date, Cronbach’s coefficient alpha is the most commonly reported index of measurement quality for assessing scale reliability. In this paper, an asymptotic distribution of the natural estimator of coefficient alpha is derived. A new interval estimate and a statistical test on the significance of the sample estimate of the coefficient are also presented. The proposed approach is compared to four popular methods commonly used to compute confidence intervals (CI) for alpha using a Monte Carlo simulation study. An R function for implementing the proposed CI approach is also provided.
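As a point of reference, the sample estimate of coefficient alpha that the abstract refers to can be computed as sketched below. This covers only the standard point estimate; it does not reproduce the interval estimate or test proposed in the paper, nor its R function. The name `cronbach_alpha` and the scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Sample estimate of Cronbach's alpha.

    scores: one row per respondent, each row a list of k item scores.
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents, 3 items (illustrative data only).
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

With a sample this small the point estimate is highly uncertain, which is the motivation for reporting an interval alongside it.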
19.
Charles A. Melvin 《Performance Improvement Quarterly》1993,6(3):74-85
Gathering information or collecting data is the norm in school systems across the nation. Using that data to make informed decisions should necessitate the use of statistical tools. One such tool, developed by Walter A. Shewhart at Bell Laboratories in 1924, was the ‘Control Chart,’ a means of determining whether a process had been operating in a state of statistical control or operating in the presence of special causes of variation warranting corrective action. Use of control charts has long been an industry practice. As a school district interested in continuous quality improvement, Beloit Turner explored the application of control charts to a number of instructional and non-instructional areas early on in a restructuring project. This article looks at five such explorations.
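To make the idea of statistical control concrete, here is a minimal sketch of one common control chart, the individuals (XmR) chart, whose limits sit three estimated standard deviations from the center line. The abstract does not say which chart types the district used, so this choice, the function names, and the data are assumptions for illustration.

```python
def individuals_chart_limits(baseline):
    """Return (lower limit, center line, upper limit) for an individuals chart.

    Sigma is estimated from the average moving range divided by 1.128,
    the standard d2 constant for ranges of two consecutive observations.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def points_outside_limits(values, baseline):
    """Indices of values beyond the limits computed from the baseline."""
    lower, _, upper = individuals_chart_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical weekly counts, e.g. late bus arrivals (illustrative only).
baseline = [10, 12, 11, 13, 12, 11, 10, 12]
new_weeks = [11, 12, 18, 10]
lower, center, upper = individuals_chart_limits(baseline)
flagged = points_outside_limits(new_weeks, baseline)
print(round(lower, 2), center, round(upper, 2), flagged)
```

A point outside the limits signals a possible special cause worth investigating; points inside are treated as common-cause variation that does not call for corrective action.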
20.
E. Penelope Holland 《Assessment & Evaluation in Higher Education》2019,44(6):961-972
Quantitative student evaluations of teaching (SET) and assessments are widely used in higher education as a proxy for teaching quality. However, SET are a function of individual rating behaviours resulting from student background, knowledge and personalities, as well as the learning experience being rated. SET from three years of data from a science department at a Russell Group University in the UK were analysed to highlight issues of sample size in relation to variable perceptions of modules, and develop a statistical model of feedback incorporating individual rating behaviours across modules. Key results are that sample size and individual rating behaviours have the potential to significantly affect summary module ratings, especially for <20 respondents or if individuals have heterogeneous views. A new approach is suggested, to interpret and compare quantitative module ratings, acknowledging uncertainty, variability and individual rating behaviours. This has implications for the interpretation of SET in many aspects of academic life, including university league table positions, the identification of good teaching practice with respect to student satisfaction, and the weight given to SET in individual academics’ promotion applications.