
1.

A Validity Framework for Evaluating the Technical Quality of Alternate Assessments

Scott F. Marion James W. Pellegrino 《Educational Measurement》2006,25(4):47-57

This article presents findings from two projects designed to improve evaluations of technical quality of alternate assessments for students with the most significant cognitive disabilities. We argue that assessment technical documents should allow for the evaluation of the construct validity of the alternate assessments following the traditions of Cronbach (1971), Messick (1989, 1995), Linn, Baker, and Dunbar (1991), and Shepard (1993). The projects used the work of Knowing What Students Know (Pellegrino, Chudowsky, & Glaser, 2001) to structure and focus the collection and evaluation of assessment information. The heuristic of the assessment triangle (Pellegrino et al., 2001) was particularly useful in emphasizing that the validity evaluation needs to consider the logical connections among the characteristics of the students tested and how they develop domain proficiency (the cognition vertex), the nature of the assessment (the observation vertex), and the ways in which the assessment results are interpreted (the interpretation vertex). This project has shown that in addition to designing more valid assessments, the growing body of knowledge about the psychology of achievement testing can be useful for structuring evaluations of technical quality.

2.

Interim Assessments as a Strategy for Improvement: Easier Said Than Done

Paul Goren 《Peabody Journal of Education》2013,88(2):125-129

This essay provides an overview of the papers contained in this issue of the Peabody Journal of Education. In it, the author highlights, especially for policymakers, issues and concerns that emerge from the use of formative assessments geared towards education improvement. While the intent of such assessments is to lead to improved overall instruction and improved outcomes, he stresses that this is not a passive act that rests solely on testing children, providing their teachers and school leaders with data, and then hoping improvement will occur.

3.

Technical Quality Criteria for Evaluating District Assessment Portfolios Used in the Nebraska STARS

Barbara S. Plake James C. Impara Chad W. Buckendahl 《Educational Measurement》2004,23(2):12-16

Nebraska districts use different strategies for measuring student performance on the state's content standards. District assessments differ in type and technical quality. Six quality criteria were endorsed by the state. These criteria cover content and curricular validity, fairness, and appropriateness of score interpretations. District assessment portfolios document how well assessments meet these criteria. Districts receive ratings on how well their assessments meet each of the quality criteria and are given a rating from Unacceptable to Exemplary. This article presents these technical quality criteria and explains how they are (a) individually rated and (b) combined for the district's overall quality rating.

4.

Moving Toward a Comprehensive Assessment System: A Framework for Considering Interim Assessments (total citations: 2; self-citations: 0; citations by others: 2)

Marianne Perie Scott Marion Brian Gong 《Educational Measurement》2009,28(3):5-13

Local assessment systems are being marketed as formative, benchmark, predictive, and a host of other terms. Many so-called formative assessments are not at all similar to the types of assessments and strategies studied by Black and Wiliam (1998) but instead are interim assessments. In this article, we clarify the definition and uses of interim assessments and argue that they can be an important piece of a comprehensive assessment system that includes formative, interim, and summative assessments. Interim assessments are given on a larger scale than formative assessments, have less flexibility, and are aggregated to the school or district level to help inform policy. Interim assessments are driven by their purposes, which fall into the categories of instructional, evaluative, or predictive. Our intent is to provide a specific definition for these "interim assessments" and to develop a framework that district and state leaders can use to evaluate these systems for purchase or development. The discussion lays out some concerns with the current state of these assessments as well as hopes for future directions and suggestions for further research.

5.

A Strategy for Evaluating District Developed Assessments for State Accountability

Chad W. Buckendahl Barbara S. Plake James C. Impara 《Educational Measurement》2004,23(2):17-25

As part of Nebraska's assessment and accountability system, districts' local assessment systems are evaluated for their psychometric quality. This article provides an overview of a two-stage evaluation strategy, discusses how it was applied in Nebraska, and presents results from the first three years of the evaluation process. Benefits of the method include an emphasis on formative evaluation and promotion of improved assessment quality at the local level. A limitation of the model is the inability to make refined comparisons of student performance across districts on the assessments. Results from the first three years suggest that greater specificity in the review criteria and additional reviewer calibration activities are needed.

6.

An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

Tie Liang Craig S. Wells Ronald K. Hambleton 《Journal of Educational Measurement》2014,51(1):1-17

As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting model misfit. The purposes of this study were to extend the use of RISE to more general and comprehensive applications by manipulating a variety of factors (e.g., test length, sample size, IRT models, ability distribution). The results from the simulation study demonstrated that RISE outperformed G² and S‐X² in that it controlled Type I error rates and provided adequate power under the studied conditions. In the empirical study, RISE detected reasonable numbers of misfitting items compared to G² and S‐X², and RISE gave a much clearer picture of the location and magnitude of misfit for each misfitting item. In addition, there was no practical consequence to classification before and after replacement of misfitting items detected by three fit statistics.
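The idea behind a RISE-type statistic can be illustrated with a minimal sketch: compare a parametric item characteristic curve with a nonparametric one and take the root of the averaged squared gap. Everything below is an assumption made for illustration, not the procedure used in the study: the 2PL form, the Gaussian kernel smoother, the evenly spaced and uniformly weighted grid, the function names, and the toy data are all hypothetical. The sketch computes only the statistic, not a significance test.

```python
import math

def icc_2pl(theta, a, b):
    """Parametric 2PL item characteristic curve (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def kernel_icc(theta, abilities, responses, bandwidth):
    """Nonparametric curve: Gaussian-kernel weighted average of 0/1 responses."""
    weights = [math.exp(-0.5 * ((theta - t) / bandwidth) ** 2) for t in abilities]
    return sum(w * y for w, y in zip(weights, responses)) / sum(weights)

def rise(parametric, nonparametric, grid):
    """Root of the mean squared gap between two curves on an evenly spaced grid."""
    squared_gaps = [(parametric(t) - nonparametric(t)) ** 2 for t in grid]
    return math.sqrt(sum(squared_gaps) / len(squared_gaps))

# Hypothetical ability estimates and item responses, for illustration only.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [0, 0, 1, 1, 1]
grid = [-2.0 + 0.5 * i for i in range(9)]  # evaluation points from -2 to 2

value = rise(
    lambda t: icc_2pl(t, a=1.2, b=-0.3),
    lambda t: kernel_icc(t, abilities, responses, bandwidth=0.8),
    grid,
)
print(f"RISE-type discrepancy: {value:.3f}")
```

A larger value indicates a larger gap between the fitted parametric curve and the smoothed data; plotting the two curves over the grid is what shows where along the ability scale the misfit occurs.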

7.

A Framework for Conceptualizing and Evaluating the Validity of Instructionally Relevant Assessments

James W. Pellegrino Louis V. DiBello Susan R. Goldman 《Educational Psychologist》2016,51(1):59-81

Assessments that function close to classroom teaching and learning can play a powerful role in fostering academic achievement. Unfortunately, however, relatively little attention has been given to discussion of the design and validation of such assessments. The present article presents a framework for conceptualizing and organizing the multiple components of validity applicable to assessments intended for use in the classroom to support ongoing processes of teaching and learning. The conceptual framework builds on existing validity concepts and focuses attention on three components: cognitive validity, instructional validity, and inferential validity. The goal in presenting the framework is to clarify the concept of validity, including key components of the interpretive argument, while considering the types and forms of evidence needed to construct a validity argument for classroom assessments. The framework's utility is illustrated by presenting an application to the analysis of the validity of assessments embedded within an elementary mathematics curriculum.

8.

Commentary: Evaluating the Validity of Formative and Interim Assessment (total citations: 1; self-citations: 0; citations by others: 1)

Lorrie A. Shepard 《Educational Measurement》2009,28(3):32-37

In many school districts, the pressure to raise test scores has created overnight celebrity status for formative assessment. Its powers to raise student achievement have been touted, however, without attending to the research on which these claims were based. Sociocultural learning theory provides theoretical grounding for understanding how formative assessment works to increase student learning. The articles in this special issue bring us back to underlying first principles by offering separate validity frameworks for evaluating formative assessment (Nichols, Meyers, & Burling) and newly-invented interim assessments (Perie, Marion, & Gong). The article by Heritage, Kim, Vendlinski, and Herman then offers the most important insight of all; that is, formative assessment is of little use if teachers don't know what to do when students are unable to grasp an important concept. While it is true that validity investigations are needed, I argue that the validity research that will tell us the most, about how formative assessment can be used to improve student learning, must be embedded in rich curriculum and must at the same time attempt to foster instructional practices consistent with learning research.

9.

The Technical Quality of Performance Assessments: Standard Errors of Percents of Pupils Reaching Standards

Wendy M. Yen 《Educational Measurement》1997,16(3):5-15

How accurate is the reported percent of students reaching standards at the school level? How are standard errors for these statistics computed? How do estimates vary with choice of generalizability model?

10.

Evaluating the Validity of Assessments: The Consequences of Use (total citations: 2; self-citations: 0; citations by others: 2)

Robert L. Linn 《Educational Measurement》1997,16(2):14-16


11.

An Experimental Approach to Evaluating the Viability of Potential Applicants for Assistant Principal Vacancies

Paul A. Winter Philip R. Partenheimer Joseph M. Petrosko 《Journal of Personnel Evaluation in Education》2003,17(4):299-315

Despite the purported national shortage of qualified applicants for administrator vacancies, there is little empirical research regarding assistant principal recruitment. This study involved an experiment to evaluate the viability of teachers as applicants for assistant principal vacancies. ANOVA results indicated administrator certification program status (admitted, not admitted) and school level (elementary, middle school, high school) explained 28% of the variance in job ratings. Teachers who were enrolled in administrator certification programs rated the job higher than did teachers who were not enrolled in administrator preparation programs. Middle school teachers rated the job higher than did high school teachers. Implications for practice and future research are discussed.

12.

On the Quality of Educational Technology Equipment

王祥明后有为《中国教育技术装备》2006,(1):34-37

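This article offers a comprehensive, systematic analysis and definition of the concept of educational technology equipment quality, and discusses in detail the different levels and factors that the concept encompasses. It concludes that, to guarantee the quality of technical equipment, educational technology equipment departments must start from technical quality, product quality, procurement quality, and supervision quality, and fully perform their functions.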

13.

A Framework for Evaluating and Planning Assessments Intended to Improve Student Achievement

Paul D. Nichols Jason L. Meyers Kelly S. Burling 《Educational Measurement》2009,28(3):14-23

Assessments labeled as formative have been offered as a means to improve student achievement. But labels can be a powerful way to miscommunicate. For an assessment use to be appropriately labeled "formative," both empirical evidence and reasoned arguments must be offered to support the claim that improvements in student achievement can be linked to the use of assessment information. Our goal in this article is to support the construction of such an argument by offering a framework within which to consider evidence-based claims that assessment information can be used to improve student achievement. We describe this framework and then illustrate its use with an example of one-on-one tutoring. Finally, we explore the framework's implications for understanding when the use of assessment information is likely to improve student achievement and for advising test developers on how to develop assessments that are intended to offer information that can be used to improve student achievement.

14.

Environmental Quality Assessment of Urban Soil

费杨陈静王颖《南通职业大学学报》2012,26(2):69-74

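Concentration data for eight heavy metal elements in different areas of a city are modeled and analyzed. The single-factor pollution index method, the Nemerow comprehensive pollution index method, and the geoaccumulation index method are applied to evaluate the degree of heavy metal pollution in urban soil from different perspectives. Factor analysis is used to study the correlations among multiple variables and to group them by their links in origin, so as to identify the sources of heavy metal pollution in the city and their distribution patterns. To reflect how heavy metal elements diffuse in soil, a parabolic partial differential equation model is adopted, and the locations of the pollution sources are fitted by an inversion method. Finally, the strengths and weaknesses of the models are summarized, and ideas for collecting further information are proposed.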
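The three indices named in the abstract have widely used textbook forms, sketched below. The abstract does not give the paper's exact formulas, reference standards, or data, so the forms shown, the function names, and all numbers are assumptions for illustration: the single-factor index is taken as P_i = C_i / S_i, the Nemerow index as sqrt((mean(P)^2 + max(P)^2) / 2), and the geoaccumulation index as log2(C_n / (1.5 * B_n)).

```python
import math

def single_factor_index(concentration, standard):
    """Single-factor pollution index P_i = C_i / S_i."""
    return concentration / standard

def nemerow_index(indices):
    """Nemerow comprehensive index: sqrt((mean(P)^2 + max(P)^2) / 2)."""
    mean_p = sum(indices) / len(indices)
    max_p = max(indices)
    return math.sqrt((mean_p ** 2 + max_p ** 2) / 2)

def geoaccumulation_index(concentration, background, k=1.5):
    """Geoaccumulation index I_geo = log2(C_n / (k * B_n)); k = 1.5 is the usual factor."""
    return math.log2(concentration / (k * background))

# Hypothetical concentrations and reference values (mg/kg), not data from the paper.
measured = {"Pb": 60.0, "Cd": 0.3, "Zn": 200.0}
reference = {"Pb": 30.0, "Cd": 0.3, "Zn": 100.0}

indices = [single_factor_index(measured[m], reference[m]) for m in measured]
print("single-factor indices:", indices)
print("Nemerow index:", round(nemerow_index(indices), 3))
print("I_geo for Pb:", round(geoaccumulation_index(measured["Pb"], reference["Pb"]), 3))
```

The Nemerow index weights the worst single element alongside the average, which is why one strongly elevated metal can raise the overall rating even when the others are at background levels.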

15.

Evaluating the Psychometric Qualities of the National Board for Professional Teaching Standards' Assessments: A Methodological Accounting (total citations: 1; self-citations: 0; citations by others: 1)

Richard M. Jaeger 《Educational Assessment, Evaluation and Accountability》1998,12(2):189-210


16.

An Investigation of Scoring Methods for Mathematics Performance-Based Assessments

《Educational Assessment》2013,18(3):195-224

Three mathematics scoring methods are being used or explored in large-scale assessment programs: item-by-item scoring, holistic scoring, and "trait" scoring. This study investigated all 3 methods of scoring on 3 mathematics performance-based assessments. Mathematics assessment tasks were selected from a pool of pilot tasks because they could be scored using all 3 methods. Results of the study suggest that holistic scoring and item-by-item scoring methods provide similar information; however, trait score for conceptual understanding and mathematics communication tapped into different aspects of student performance. Implications for the validity of scoring methods now in use for performance-based mathematics assessments are discussed.

17.

An Approach to Evaluating the Missing Data Assumptions of the Chain and Post-stratification Equating Methods for the NEAT Design

Paul W. Holland Sandip Sinharay Alina A. von Davier Ning Han 《Journal of Educational Measurement》2008,45(1):17-43

Two important types of observed score equating (OSE) methods for the non-equivalent groups with Anchor Test (NEAT) design are chain equating (CE) and post-stratification equating (PSE). CE and PSE reflect two distinctly different ways of using the information provided by the anchor test for computing OSE functions. Both types of methods include linear and nonlinear equating functions. In practical situations, it is known that the PSE and CE methods will give different results when the two groups of examinees differ on the anchor test. However, given that both types of methods are justified as OSE methods by making different assumptions about the missing data in the NEAT design, it is difficult to conclude which, if either, of the two is more correct in a particular situation. This study compares the predictions of the PSE and CE assumptions for the missing data using a special data set for which the usually missing data are available. Our results indicate that in an equating setting where the linking function is decidedly non-linear and CE and PSE ought to be different, both sets of predictions are quite similar but those for CE are slightly more accurate.
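How chain equating uses the anchor test can be shown with a minimal sketch of the linear case: scores on form X are first linked to the anchor A using the group that took X, and the result is then linked to form Y using the group that took Y. This is the standard linear chain idea only; it is not the study's analysis, it does not cover PSE or nonlinear functions, and the function names and score lists are hypothetical.

```python
from statistics import mean, pstdev

def linear_link(score, from_scores, to_scores):
    """Linear linking within one group: match the means and standard deviations."""
    slope = pstdev(to_scores) / pstdev(from_scores)
    return mean(to_scores) + slope * (score - mean(from_scores))

def chain_linear_equate(x, x_group_p, anchor_group_p, anchor_group_q, y_group_q):
    """Chain equating: X -> anchor A in group P, then A -> Y in group Q."""
    anchor_equivalent = linear_link(x, x_group_p, anchor_group_p)
    return linear_link(anchor_equivalent, anchor_group_q, y_group_q)

# Hypothetical score lists, for illustration only.
x_p = [10, 20, 30]   # form X scores, group P
a_p = [1, 2, 3]      # anchor scores, group P
a_q = [1, 2, 3]      # anchor scores, group Q
y_q = [12, 22, 32]   # form Y scores, group Q

print(chain_linear_equate(20, x_p, a_p, a_q, y_q))
```

Because each link uses only the group that actually took both tests involved, chain equating never needs the missing X scores for group Q or Y scores for group P; PSE instead makes assumptions about those missing data by conditioning on the anchor.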

18.

A Discussion of Practical Teaching in Higher Vocational and Technical Education

徐绍坤《蒙自师范高等专科学校学报》2000,2(4):41-46

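With the rapid development of science and technology and the continued deepening of the national economic system reform, society's demand for talent is becoming increasingly comprehensive and skill-oriented. The emerging field of higher vocational and technical education meets this demand by aiming to train practical, skilled personnel for the front line of production. Compared with traditional higher education, it has distinct occupational and job-specific characteristics, and the basis for highlighting these characteristics is practical teaching. This article discusses how higher vocational and technical education can carry out practical teaching, with respect to program facilities, the development of a "dual-qualified" teaching staff, the curriculum system, and teaching methods.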

19.

Causal Inference and COVID: Contrasting Methods for Evaluating Pandemic Impacts Using State Assessments

Benjamin R. Shear 《Educational Measurement》2023,42(1):99-109

In the spring of 2021, just 1 year after schools were forced to close for COVID-19, state assessments were administered at great expense to provide data about impacts of the pandemic on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the biggest threats to making valid inferences about student learning to study pandemic impacts using state assessment data: measurement artifacts affecting the comparability of scores, secular trends, and changes in the tested population. The article compares three statistical approaches (the Fair Trend, baseline student growth percentiles, and multiple regression with demographic covariates) that can support more valid inferences about student learning during the pandemic and in other scenarios in which the tested population changes over time. All three approaches lead to similar inferences about statewide student performance but can lead to very different inferences about student subgroups. Results show that controlling statistically for prepandemic demographic differences can reverse the conclusions about groups most affected by the pandemic and decisions about prioritizing resources.

20.

An Ethical Perspective on the Quality Control System for Tourism Translation

刘彤钟平《赣南师范学院学报》2013,(5):87-90

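The quality of tourism translation bears on the development of the inbound tourism economy. Because China currently has no laws that safeguard and regulate the quality of tourism translation, a quality assurance system for tourism translation should be built at the level of professional ethics, composed of elements such as client ethics, translator ethics, administrative department ethics, and proofreader ethics. The ethical elements within the system constrain one another and work together to safeguard the quality of tourism translation.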