1.

Nonparametric Evidence of Validity, Reliability, and Fairness for Rater‐Mediated Assessments: An Illustration Using Mokken Scale Analysis

Stefanie A. Wind 《Journal of Educational Measurement》2019,56(3):478-504

Numerous researchers have proposed methods for evaluating the quality of rater‐mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many‐facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On the other hand, popular parametric methods for evaluating rating quality are often based on measurement theories such as invariant measurement. However, these methods are based on assumptions and transformations that may not be appropriate for ordinal ratings. In this study, I show how researchers can use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, to evaluate rating quality within the framework of invariant measurement without the use of potentially inappropriate parametric techniques. I use an illustrative analysis of data from a rater‐mediated writing assessment to demonstrate how one can use numeric and graphical indicators from MSA to gather evidence of validity, reliability, and fairness. The results from the analyses suggest that MSA provides a useful framework within which to evaluate rater‐mediated assessments for evidence of validity, reliability, and fairness that can supplement existing popular methods for evaluating ratings.

2.

Coordinating market and evaluation research on the Admissions rating process

Robert Lay John Maguire 《Research in higher education》1981,14(1):71-85

A research program is suggested that integrates Admissions procedures and methods of statistical analysis to study the first stage of the Admissions selection process: the rating of applicants. To form the base for evaluation research, a systematic procedure is described that provides an index of applicant quality in the light of institutional goals. Then the rating process itself is explored using a Path Model to measure the contributions of background and achieved characteristics of applicants to their rating. How questions of bias may be raised and pursued is discussed. Applicants are profiled in segments to show how the effects of policy adjustments may be monitored. For doing marketing research, quality-by-enrollment status segments are defined. Using factor analysis models, an analysis of image variance is applied. Next, a discriminant analysis is used to isolate those institutional attributes that most influence higher quality applicants to enroll. Some specifics of a differentiated policy are given in examples. Implications of this integrated approach are discussed. Presented at the Twentieth Annual Forum of the Association for Institutional Research, Atlanta, May 1980.

3.

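A Review of Methods for Estimating the Scoring Quality of Subjective Items (total citations: 2; self-citations: 0; citations by others: 2)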

GUAN Dandan 《中国考试》2008,(10)

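In psychometric theory, the scoring quality of subjective items is a topic worth studying. This article introduces the methods that the three major measurement theories (classical test theory, generalizability theory, and item response theory) use to estimate the scoring quality of subjective items, and compares their strengths and weaknesses. Generalizability theory and item response theory have fairly clear advantages in evaluating the scoring quality of subjective items. How to combine the three theories to obtain more valuable information about scoring quality is a question that merits further exploration.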

4.

Detecting Measurement Disturbances in Rater‐Mediated Assessments

Stefanie A. Wind Randall E. Schumacker 《Educational Measurement》2017,36(4):44-51

The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start‐up, plodding, boredom, or fatigue. An understanding of the different types of measurement disturbances can lead to a more complete understanding of persons or items in terms of the construct being measured. Although measurement disturbances have been explored in several contexts, they have not been explicitly considered in the context of performance assessments. The purpose of this study is to illustrate the use of graphical methods to explore measurement disturbances related to raters within the context of a writing assessment. Graphical displays that illustrate the alignment between expected and empirical rater response functions are considered as they relate to indicators of rating quality based on the Rasch model. Results suggest that graphical displays can be used to identify measurement disturbances for raters related to specific ranges of student achievement that suggest potential rater bias. Further, results highlight the added diagnostic value of graphical displays for detecting measurement disturbances that are not captured using Rasch model–data fit statistics.

5.

Design and Implementation of a Big-Data-Based Power Quality Monitoring and Analysis System

郭晓乾 武守晓 王承栋 刘思宇 《教育技术导刊》2009,19(8):182-185

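To address the shortcomings of traditional power quality monitoring systems, namely unreliable data ingestion and insufficient capacity for storing and statistically analyzing massive data, the data scheduling and acquisition process is designed with a database-like transaction processing mechanism, and a power quality monitoring and analysis system is built on the Cloudera big data platform. The system stores and analyzes data in a distributed manner, computes monitoring-point indicators and operating-status statistics over TB-scale power quality data, and performs statistical aggregation analysis of transient events. Experiments show that the system is reliable, handles statistical processing of massive data well, and improves the scalability of data storage, offering power suppliers an effective solution for storing and analyzing massive power quality data.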

6.

Comparing Rasch measurement and factor analysis

Benjamin D. Wright 《Structural equation modeling》2013,20(1):3-24

This article illustrates how Rasch measurement is preferable to factor analysis for reducing complex data matrices to unidimensional variables. The two methods: (a) address the same kind of data, but with different interpretations of numerical status; (b) use the same estimation methods, but with different measurement models; and (c) solve the same problems, but with substantially different utility. Factor analysis is faulted for mistaking ordinally labeled stochastic observations for linear measures and for failing to construct linear measurement. The motivation and mathematical basis for Rasch measurement are introduced. How to use Rasch measurement to replace factor analysis is developed for a dichotomy and demonstrated for a rating scale.
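
As a hedged illustration of the dichotomous case mentioned in the abstract above, the sketch below uses the standard dichotomous Rasch form, in which the log-odds of a correct response equal person ability minus item difficulty. The function names and sample values are assumptions chosen for illustration and are not taken from the article.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct (1) response under the dichotomous Rasch model.

    theta is the person ability and b the item difficulty, both in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 to log-odds (logits)."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical abilities 1.5 and 0.5 compared on three hypothetical items.
    for b in (-1.0, 0.0, 2.0):
        gap = log_odds(rasch_probability(1.5, b)) - log_odds(rasch_probability(0.5, b))
        print(f"item difficulty {b:+.1f}: logit gap between persons = {gap:.3f}")
```

On the logit scale the gap between the two persons is the same for every item, which is one way to read the abstract's contrast between ordinal observations and linear measures.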

7.

Accessibility, easiness and standards

Tom Bramley 《Educational research; a review for teachers and all concerned with progress in education》2013,55(2):251-261

In setting the cut-scores on National Curriculum tests it is important to maintain standards. In the process of test development, both within and across years, changes are made to the style of the questions in order to increase their ‘accessibility’. This raises the question of whether a more accessible test should have higher cut-scores. Purely statistical definitions of equating are blind to differences between ‘accessibility’ and ‘easiness’ and cut-scores derived from statistical equating methods will be higher for a more accessible test. Arguments about the increased validity of the more accessible test are sometimes used to justify not raising the cut-scores as much as would be indicated by statistical methods. These arguments are shown to be equivalent to postulating that changing the accessibility is changing the construct measured by the test. Using a statistical measurement model can provide a rational basis for understanding accessibility and identifying types of question where accessibility issues are causing a measurement problem.

8.

Quality Control in Multilayer Ceramic Capacitor Packaging

张霞 刘红波 《深圳职业技术学院学报》2010,9(3):77-80

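Multilayer ceramic capacitors (MLCCs) are widely used in electronic information products. They withstand high voltage and high temperature, can be miniaturized, and are produced in large volumes. Many steps and factors in the production, packaging, and use of MLCCs can affect their quality, and different enterprises have their own methods for controlling MLCC quality. This article mainly introduces quality control methods and measures that can be adopted during MLCC packaging, such as using appropriate MLCC selection instruments and packaging materials and monitoring the packaging process with a manufacturing execution system. These methods can substantially reduce the quality problems that arise during packaging.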

9.

A Preliminary Study of an Information-Entropy-Based Quality Control System for Regular Undergraduate Teaching

张玉华 《大学.研究与评价》2007,(10)

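Quality control of regular undergraduate teaching should take full account of the uncertainty involved. To this end, fuzzy information entropy is introduced into the undergraduate teaching quality process, and a closed-loop process control system model for undergraduate teaching quality is constructed, which can yield a fairly accurate measurement method and support system for undergraduate teaching quality.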

10.

How invariant and accurate are domain ratings in writing assessment?

Stefanie A. Wind George Engelhard 《Assessing Writing》2013,18(4):278-299

The use of evidence to guide policy and practice in education (Cooper, Levin, & Campbell, 2009) has included an increased emphasis on constructed-response items, such as essays and portfolios. Because assessments that go beyond selected-response items and incorporate constructed-response items are rater-mediated (Engelhard, 2002, Engelhard, 2013), it is necessary to develop evidence-based indices of quality for the rating processes used to evaluate student performances. This study proposes a set of criteria for evaluating the quality of ratings based on the concepts of measurement invariance and accuracy within the context of a large-scale writing assessment. Two measurement models are used to explore indices of quality for raters and ratings: the first model provides evidence for the invariance of ratings, and the second model provides evidence for rater accuracy. Rating quality is examined within four writing domains from an analytic rubric. Further, this study explores the alignment between indices of rating quality based on these invariance and accuracy models within each of the four domains of writing. Major findings suggest that rating quality varies across analytic rubric domains, and that there is some correspondence between indices of rating quality based on the invariance and accuracy models. Implications for research and practice are discussed.

11.

Several Methods for Reducing Error When Measuring Resistance by the Voltmeter-Ammeter Method

李珏璇 曾宪彪 《河池学院学报》2011,31(5):6-9,42

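Because electrical meters have internal resistance, measuring resistance with the traditional voltmeter-ammeter method introduces error. To reduce this measurement error, several improved methods are used; the measurement procedure is simple and the results are accurate.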
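
The error source named in the abstract above can be made concrete with a small sketch. It covers only the two textbook wirings of the traditional method, not the improved methods of the article, which the abstract does not describe. The function names and the meter resistances are assumptions chosen for illustration.

```python
def measured_with_ammeter_inside(r_true: float, r_ammeter: float) -> float:
    """Voltmeter spans the resistor and the ammeter in series.

    The voltmeter reading includes the drop across the ammeter,
    so V / I equals R + R_A and overestimates R.
    """
    return r_true + r_ammeter


def measured_with_ammeter_outside(r_true: float, r_voltmeter: float) -> float:
    """Voltmeter sits directly across the resistor.

    The ammeter reading includes the current through the voltmeter,
    so V / I equals R in parallel with R_V and underestimates R.
    """
    return r_true * r_voltmeter / (r_true + r_voltmeter)


def relative_error(measured: float, r_true: float) -> float:
    """Signed relative error of a measured resistance."""
    return (measured - r_true) / r_true


if __name__ == "__main__":
    r_true = 100.0        # ohms, hypothetical resistor
    r_ammeter = 1.0       # ohms, hypothetical ammeter resistance
    r_voltmeter = 10_000  # ohms, hypothetical voltmeter resistance
    inside = measured_with_ammeter_inside(r_true, r_ammeter)
    outside = measured_with_ammeter_outside(r_true, r_voltmeter)
    print(f"ammeter inside:  {inside:.2f} ohm ({relative_error(inside, r_true):+.2%})")
    print(f"ammeter outside: {outside:.2f} ohm ({relative_error(outside, r_true):+.2%})")
```

With these assumed values both wirings are off by roughly one percent, in opposite directions, which is the kind of systematic error the improved methods aim to reduce.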

12.

Control Group Methods for HPT Program Evaluation and Measurement

Greg Wang 《Performance Improvement Quarterly》2002,15(2):32-46

This research contributes to the methodologies in HPT program evaluation and measurement that are fairly lacking to date. First, a theoretical foundation for a control group is established based on a brief review of control group applications in various fields. Then, four types of control groups applicable to HPT program evaluation and measurement are defined and classified, and threats to internal and external validity in control group applications are explored. Lastly, four evaluation and measurement scenarios are presented for an E‐learning program to demonstrate the applicability of the control group methods for HPT program evaluation and ROI measurement.

13.

The Structure and Measurement of Autonomy (total citations: 3; self-citations: 0; citations by others: 3)

夏凌翔 黄希庭 吴波 《西南师范大学学报(人文社会科学版)》2007,33(3):10-15

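Scholars have proposed a wide variety of autonomy structures, measurement instruments, measurement methods, and measurement indicators. Classifications of the structure of autonomy fall into two types: structures based on autonomy measurement, and classifications based solely on theoretical analysis. Measurement of autonomy falls into two types: self-report methods and other-report methods. Self-report methods comprise four kinds: autonomy scales, autonomy subscales within other scales, scales that measure a particular aspect of autonomy, and other methods. In other-report methods, researchers mainly collect data through observation, interviews, and similar means, and then assess an individual's autonomy using a relevant coding system or similar tools. Finally, the structure and measurement of autonomy and of self-reliance are compared.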

14.

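Construction Technology for the Anchorage Caisson of Xinghai No. 1 Bridge in Dalian (total citations: 3; self-citations: 0; citations by others: 3)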

陈昌平 于林平 刘显刚 《大连大学学报》2007,28(3):28-32

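The construction quality of a caisson foundation is an important factor in the quality of a construction project. Taking the anchorage caisson project of Xinghai No. 1 Bridge in Dalian as an example, and in view of the complex geological conditions, this article describes a series of technical measures adopted in foundation treatment, caisson fabrication, caisson sinking, and caisson bottom sealing.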

15.

Some Thoughts on How to Raise the Audience Ratings of City Television Stations

黄俭 《齐齐哈尔师范高等专科学校学报》2010,(2):75-76

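Raising audience ratings and increasing revenue bear on the survival of local television stations. Efforts should start from three aspects: improving program quality, setting up program columns scientifically, and packaging television programs appropriately. In setting up columns, stations should highlight distinctive features and limit the number of columns, and broadcast scheduling should follow the principle of avoiding peak-time clashes. Improving program quality should start with the overall competence of news professionals, the strength of program planning, and closeness to the audience. Only through appropriate packaging can television programs attract more viewers.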

16.

Research on a Broadband Test and Diagnosis System for Asymmetric Digital Subscriber Lines

陈卫 周彩铃 《连云港师范高等专科学校学报》2006,(2):88-90

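Providing high-speed broadband services over subscriber lines (twisted copper pairs) suffers from a high failure rate. This article briefly introduces a broadband testing scheme and proposes using statistical decision methods to improve the original testing scheme and raise the accuracy of test diagnosis.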

17.

Developing Methods and Countermeasures for Monitoring Provincial Statistical Data Quality: The Case of Hebei Province

项贤国 朱玲 《河北广播电视大学学报》2014,(2):62-65

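Under the current statistical management system and institutions, statistical data quality monitoring in Hebei Province faces many difficulties: monitoring is not yet comprehensive, and clear control objectives and unified standards are lacking. Scientific methods for monitoring statistical data quality should be adopted, and effective countermeasures should be drawn up and implemented, to meet the new requirements for monitoring statistical data quality in Hebei Province.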

18.

A Distribution Free Interval Estimate for Coefficient Alpha

Laura Trinchera Nicolas Marie George A. Marcoulides 《Structural equation modeling》2018,25(6):876-887

Scales are important tools for obtaining quantitative measures of theoretical constructs. Once a set of measures to be used in a scale is selected, reliability is commonly examined in order to assess their measurement quality. To date, Cronbach’s coefficient alpha is the most commonly reported index of measurement quality for assessing scale reliability. In this paper, an asymptotic distribution of the natural estimator of coefficient alpha is derived. A new interval estimate and a statistical test on the significance of the sample estimate of the coefficient are also presented. The proposed approach is compared to four popular methods commonly used to compute confidence intervals (CI) for alpha using a Monte Carlo simulation study. An R function for implementing the proposed CI approach is also provided.
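
For orientation, the sketch below computes only the usual sample estimate of coefficient alpha, k/(k-1) times one minus the ratio of summed item variances to total-score variance. It does not implement the interval estimate or the test proposed in the paper, and it is unrelated to the R function the abstract mentions. The function name and the toy data are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Sample coefficient alpha for a respondents-by-items score table.

    scores[i][j] is respondent i's score on item j. Sample variances
    (n - 1 denominator) are used for both items and total scores.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent needs a score on every item")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: 4 respondents, 3 items on a 1-5 scale.
    toy = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
    print(f"alpha = {cronbach_alpha(toy):.3f}")
```

A point estimate like this says nothing about sampling uncertainty, which is the gap the interval estimate in the abstract is meant to fill.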

19.

Application of Control Charts to an Educational System

Charles A. Melvin 《Performance Improvement Quarterly》1993,6(3):74-85

Gathering information or collecting data is the norm in school systems across the nation. Using that data to make informed decisions should necessitate the use of statistical tools. One such tool, developed by Walter A. Shewhart at Bell Laboratories in 1924, was the ‘Control Chart,’ a means of determining whether a process had been operating in a state of statistical control or operating in the presence of special causes of variation warranting corrective action. Use of control charts has long been an industry practice. As a school district interested in continuous quality improvement, Beloit Turner explored the application of control charts to a number of instructional and non-instructional areas early on in a restructuring project. This article looks at five such explorations.
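
A minimal sketch of the idea behind a control chart follows, assuming an individuals (X) chart with limits derived from the average moving range. The abstract does not say which chart types the district used, so the chart type, the function names, and the numbers are all assumptions for illustration.

```python
def individuals_chart_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) for an individuals chart.

    Limits are center +/- 2.66 * average moving range, where 2.66 is the
    conventional constant 3 / 1.128 for moving ranges of two observations.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    average_moving_range = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.66 * average_moving_range
    return center - half_width, center, center + half_width


def special_cause_points(values: list[float], baseline: list[float]) -> list[int]:
    """Indices of values falling outside the limits computed from baseline."""
    lower, _, upper = individuals_chart_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical weekly counts used to set limits, then new observations.
    baseline = [10, 12, 10, 12, 10, 12]
    new_weeks = [11, 17, 5, 16]
    print(individuals_chart_limits(baseline))
    print(special_cause_points(new_weeks, baseline))
```

Points inside the limits are treated as common-cause variation. Points outside them are candidates for the special causes of variation that, in the abstract's words, warrant corrective action.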

20.

Making sense of module feedback: accounting for individual behaviours in student evaluations of teaching

E. Penelope Holland 《Assessment & Evaluation in Higher Education》2019,44(6):961-972

Quantitative student evaluations of teaching (SET) and assessments are widely used in higher education as a proxy for teaching quality. However, SET are a function of individual rating behaviours resulting from student background, knowledge and personalities, as well as the learning experience being rated. SET from three years of data from a science department at a Russell Group University in the UK were analysed to highlight issues of sample size in relation to variable perceptions of modules, and develop a statistical model of feedback incorporating individual rating behaviours across modules. Key results are that sample size and individual rating behaviours have the potential to significantly affect summary module ratings, especially for <20 respondents or if individuals have heterogeneous views. A new approach is suggested, to interpret and compare quantitative module ratings, acknowledging uncertainty, variability and individual rating behaviours. This has implications for the interpretation of SET in many aspects of academic life, including university league table positions, the identification of good teaching practice with respect to student satisfaction, and the weight given to SET in individual academics’ promotion applications.