Similar articles
1.
This article presents findings from two projects designed to improve evaluations of the technical quality of alternate assessments for students with the most significant cognitive disabilities. We argue that assessment technical documents should allow for the evaluation of the construct validity of alternate assessments, following the traditions of Cronbach (1971), Messick (1989, 1995), Linn, Baker, and Dunbar (1991), and Shepard (1993). The projects used the work of Knowing What Students Know (Pellegrino, Chudowsky, & Glaser, 2001) to structure and focus the collection and evaluation of assessment information. The heuristic of the assessment triangle (Pellegrino et al., 2001) was particularly useful in emphasizing that the validity evaluation needs to consider the logical connections among the characteristics of the students tested and how they develop domain proficiency (the cognition vertex), the nature of the assessment (the observation vertex), and the ways in which the assessment results are interpreted (the interpretation vertex). These projects have shown that, in addition to supporting the design of more valid assessments, the growing body of knowledge about the psychology of achievement testing can be useful for structuring evaluations of technical quality.

2.
This essay provides an overview of the papers contained in this issue of the Peabody Journal of Education. In it, the author highlights, especially for policymakers, issues and concerns that emerge from the use of formative assessments geared toward educational improvement. While such assessments are intended to lead to improved overall instruction and improved outcomes, he stresses that improvement is not a passive act that rests solely on testing children, providing their teachers and school leaders with data, and then hoping improvement will occur.

3.
Nebraska districts use different strategies for measuring student performance on the state's content standards, and district assessments differ in type and technical quality. The state endorsed six quality criteria, covering content and curricular validity, fairness, and the appropriateness of score interpretations. District assessment portfolios document how well assessments meet these criteria. Districts receive a rating on each quality criterion and an overall rating ranging from Unacceptable to Exemplary. This article presents these technical quality criteria and explains how they are (a) individually rated and (b) combined into the district's overall quality rating.

4.
Local assessment systems are being marketed as formative, benchmark, predictive, and a host of other terms. Many so-called formative assessments are not at all similar to the types of assessments and strategies studied by Black and Wiliam (1998) but are instead interim assessments. In this article, we clarify the definition and uses of interim assessments and argue that they can be an important piece of a comprehensive assessment system that includes formative, interim, and summative assessments. Interim assessments are given on a larger scale than formative assessments, have less flexibility, and are aggregated to the school or district level to help inform policy. Interim assessments are driven by their purposes, which fall into the categories of instructional, evaluative, or predictive. Our intent is to provide a specific definition for these "interim assessments" and to develop a framework that district and state leaders can use to evaluate these systems for purchase or development. The discussion lays out some concerns with the current state of these assessments as well as hopes for future directions and suggestions for further research.

5.
As part of Nebraska's assessment and accountability system, districts' local assessment systems are evaluated for their psychometric quality. This article provides an overview of a two-stage evaluation strategy, discusses how it was applied in Nebraska, and presents results from the first three years of the evaluation process. Benefits of the method include an emphasis on formative evaluation and promotion of improved assessment quality at the local level. A limitation of the model is the inability to make refined comparisons of student performance across districts on the assessments. Results from the first three years suggest that greater specificity in the review criteria and additional reviewer calibration activities are needed.

6.
As item response theory (IRT) has been more widely applied, investigating the fit of a parametric model has become an important part of the measurement process, yet promising solutions for detecting model misfit in IRT are lacking. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting model misfit. The purpose of this study was to extend RISE to more general and comprehensive applications by manipulating a variety of factors (e.g., test length, sample size, IRT model, ability distribution). The results of the simulation study showed that RISE outperformed G² and S-X² in that it controlled Type I error rates and provided adequate power under the studied conditions. In the empirical study, RISE detected reasonable numbers of misfitting items compared with G² and S-X², and it gave a much clearer picture of the location and magnitude of misfit for each misfitting item. In addition, replacing the misfitting items detected by the three fit statistics had no practical consequence for classification.
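RISE compares a nonparametric (kernel-smoothed) item characteristic curve with the fitted parametric curve and summarizes the gap as the square root of the squared difference integrated over the ability distribution. The following is a minimal sketch, assuming a 2PL model, a Gaussian-kernel (Nadaraya-Watson) smoother with a fixed bandwidth, and an integral approximated by a weighted sum over an ability grid. Douglas and Cohen's exact estimator and the procedure used to judge significance are not reproduced, and all function names are hypothetical.

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve (logistic form, no 1.7 scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def kernel_icc(thetas, responses, grid, bandwidth=0.3):
    """Nadaraya-Watson estimate of P(correct | theta) at each grid point.

    thetas: examinees' ability estimates; responses: their 0/1 scores on one item.
    The bandwidth of 0.3 is an illustrative assumption, not a recommended value.
    """
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((t - g) / bandwidth) ** 2) for t in thetas]
        curve.append(sum(w * r for w, r in zip(weights, responses)) / sum(weights))
    return curve

def rise(p_nonparametric, p_parametric, density_weights):
    """Root integrated squared error, with the integral replaced by a weighted sum."""
    total = sum(density_weights)
    sq = sum(w * (x - y) ** 2
             for w, x, y in zip(density_weights, p_nonparametric, p_parametric))
    return math.sqrt(sq / total)
```

For example, a grid such as `[-3 + 0.25 * i for i in range(25)]` with standard normal density values as `density_weights` approximates the integral over an N(0, 1) ability distribution; a larger RISE means a larger gap between the two curves.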

7.
Assessments that function close to classroom teaching and learning can play a powerful role in fostering academic achievement. Unfortunately, relatively little attention has been given to the design and validation of such assessments. This article presents a framework for conceptualizing and organizing the multiple components of validity applicable to assessments intended for use in the classroom to support ongoing processes of teaching and learning. The conceptual framework builds on existing validity concepts and focuses attention on three components: cognitive validity, instructional validity, and inferential validity. The goal in presenting the framework is to clarify the concept of validity, including key components of the interpretive argument, while considering the types and forms of evidence needed to construct a validity argument for classroom assessments. The framework's utility is illustrated by applying it to analyze the validity of assessments embedded within an elementary mathematics curriculum.

8.
Commentary: Evaluating the Validity of Formative and Interim Assessment
In many school districts, the pressure to raise test scores has created overnight celebrity status for formative assessment. Its powers to raise student achievement have been touted, however, without attending to the research on which these claims were based. Sociocultural learning theory provides theoretical grounding for understanding how formative assessment works to increase student learning. The articles in this special issue bring us back to underlying first principles by offering separate validity frameworks for evaluating formative assessment (Nichols, Meyers, & Burling) and newly invented interim assessments (Perie, Marion, & Gong). The article by Heritage, Kim, Vendlinski, and Herman then offers the most important insight of all: formative assessment is of little use if teachers do not know what to do when students are unable to grasp an important concept. While it is true that validity investigations are needed, I argue that the validity research that will tell us the most about how formative assessment can be used to improve student learning must be embedded in rich curriculum and must at the same time attempt to foster instructional practices consistent with learning research.

9.
How accurate is the reported percentage of students reaching standards at the school level? How are standard errors for these statistics computed? How do estimates vary with the choice of generalizability model?
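As a baseline only: under the simplest sampling model, which treats a school's tested students as a simple random sample and ignores measurement error, the standard error of a percentage reaching standard follows from the binomial variance. The generalizability models the abstract refers to can include further error sources, so their standard errors can differ from this one. The function name and the normal-approximation interval are illustrative assumptions.

```python
import math

def percent_reaching_standard(n_meeting, n_tested, z=1.96):
    """Percent reaching standard, its binomial SE, and a normal-approximation CI.

    All returned values are in percentage points. Simple random sampling is
    assumed; no measurement error or other generalizability facets are modeled.
    """
    p = n_meeting / n_tested
    se = math.sqrt(p * (1 - p) / n_tested)
    return 100 * p, 100 * se, (100 * (p - z * se), 100 * (p + z * se))
```

Even under this optimistic model, a school with 80 students tested and 50% reaching standard has a standard error of about 5.6 percentage points.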

10.
11.
Despite the purported national shortage of qualified applicants for administrator vacancies, there is little empirical research regarding assistant principal recruitment. This study involved an experiment to evaluate the viability of teachers as applicants for assistant principal vacancies. ANOVA results indicated administrator certification program status (admitted, not admitted) and school level (elementary, middle school, high school) explained 28% of the variance in job ratings. Teachers who were enrolled in administrator certification programs rated the job higher than did teachers who were not enrolled in administrator preparation programs. Middle school teachers rated the job higher than did high school teachers. Implications for practice and future research are discussed.
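The "28% of the variance" figure is a proportion-of-variance-explained effect size. As an illustration only, its one-way analogue, eta squared (SS_between / SS_total), can be computed as below. The study itself used two factors, and any ratings passed in are hypothetical.

```python
def eta_squared(groups):
    """SS_between / SS_total for a one-way layout; groups is a list of score lists."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    # Total variation of all ratings around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    # Variation attributable to differences among group means
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total
```

In a multi-factor ANOVA, the corresponding quantity is the model sum of squares divided by SS_total (the model R²).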

12.
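This paper comprehensively and systematically analyzes and defines the concept of the quality of educational technology equipment, and discusses in detail the different levels and factors the concept encompasses. It argues that, to ensure the quality of technology equipment, educational technology equipment departments must address technical quality, product quality, procurement quality, and supervision quality, and must fully perform their functions.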

13.
Assessments labeled as formative have been offered as a means to improve student achievement. But labels can be a powerful way to miscommunicate. For an assessment use to be appropriately labeled "formative," both empirical evidence and reasoned arguments must be offered to support the claim that improvements in student achievement can be linked to the use of assessment information. Our goal in this article is to support the construction of such an argument by offering a framework within which to consider evidence-based claims that assessment information can be used to improve student achievement. We describe this framework and then illustrate its use with an example of one-on-one tutoring. Finally, we explore the framework's implications for understanding when the use of assessment information is likely to improve student achievement and for advising test developers on how to develop assessments that are intended to offer information that can be used to improve student achievement.

14.
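Concentration data for 8 heavy metal elements in different areas of a city were modeled and analyzed. The single-factor pollution index, the Nemerow composite pollution index, and the geo-accumulation index were applied to evaluate the degree of heavy metal pollution in urban soil from different perspectives. Factor analysis was used to study the correlations among multiple variables and to group them according to their common origins, thereby identifying the sources and distribution patterns of urban heavy metal pollution. To account for the diffusion characteristics of heavy metals in soil, a parabolic partial differential equation model was used, and the locations of pollution sources were fitted by inversion. Finally, the strengths and weaknesses of the models are summarized, and ideas for collecting further information are proposed.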
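The three pollution indices named above have commonly used textbook forms: the single-factor index P_i = C_i / S_i, the Nemerow composite index sqrt((mean(P)^2 + max(P)^2) / 2), and Müller's geo-accumulation index I_geo = log2(C / (1.5 B)). Here is a minimal sketch of these forms. The abstract gives neither the reference standards S_i nor the background values B, so any values passed in are the user's assumptions. The function names are hypothetical, and the factor analysis and PDE inversion steps are not sketched.

```python
import math

def single_factor_index(conc, standard):
    """P_i = C_i / S_i; values above 1 mean the reference standard is exceeded."""
    return conc / standard

def nemerow_index(indices):
    """Nemerow composite index from single-factor indices: sqrt((mean^2 + max^2) / 2)."""
    mean_p = sum(indices) / len(indices)
    return math.sqrt((mean_p ** 2 + max(indices) ** 2) / 2)

def geoaccumulation_index(conc, background, k=1.5):
    """Muller's I_geo = log2(C / (k * B)); k = 1.5 allows for natural background variation."""
    return math.log2(conc / (k * background))
```

Because the Nemerow index includes the maximum single-factor index, one heavily contaminated element can raise the composite score even when the average is low.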

15.
16.
Educational Assessment, 2013, 18(3): 195-224
Three mathematics scoring methods are being used or explored in large-scale assessment programs: item-by-item scoring, holistic scoring, and "trait" scoring. This study investigated all 3 scoring methods on 3 mathematics performance-based assessments. Mathematics assessment tasks were selected from a pool of pilot tasks because they could be scored using all 3 methods. Results of the study suggest that holistic scoring and item-by-item scoring provide similar information; however, trait scores for conceptual understanding and mathematical communication tapped into different aspects of student performance. Implications for the validity of scoring methods now in use for performance-based mathematics assessments are discussed.

17.
Two important types of observed score equating (OSE) methods for the nonequivalent groups with anchor test (NEAT) design are chain equating (CE) and post-stratification equating (PSE). CE and PSE reflect two distinctly different ways of using the information provided by the anchor test for computing OSE functions. Both types of methods include linear and nonlinear equating functions. In practice, it is known that PSE and CE methods give different results when the two groups of examinees differ on the anchor test. However, because both types of methods are justified as OSE methods by making different assumptions about the missing data in the NEAT design, it is difficult to conclude which, if either, is more correct in a particular situation. This study compares the predictions of the PSE and CE assumptions for the missing data using a special data set for which the usually missing data are available. Our results indicate that, in an equating setting where the linking function is decidedly nonlinear and CE and PSE ought to differ, both sets of predictions are quite similar, but those for CE are slightly more accurate.
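To show how the two families use the anchor differently, here is a minimal sketch of their linear members: chained linear equating (CE) and Tucker linear equating, a linear post-stratification method in the form given in standard equating texts. The nonlinear (equipercentile) versions the study also covers are not shown. The `GroupStats` container, the function names, the default synthetic-population weight `w_p = 0.5`, and the use of population (not sample) moments are illustrative assumptions. Population P takes form X, population Q takes form Y, and both take anchor A.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class GroupStats:
    mean_t: float  # mean of the total test this group took (X for P, Y for Q)
    sd_t: float    # SD of that total test
    mean_a: float  # anchor mean
    sd_a: float    # anchor SD
    cov_ta: float  # covariance of total score with anchor score

def chained_linear(x, p, q):
    """CE: link X to the anchor in P, then the anchor to Y in Q."""
    a = p.mean_a + (p.sd_a / p.sd_t) * (x - p.mean_t)
    return q.mean_t + (q.sd_t / q.sd_a) * (a - q.mean_a)

def tucker_linear(x, p, q, w_p=0.5):
    """PSE (Tucker): estimate X and Y moments in a synthetic population, then link linearly."""
    w_q = 1.0 - w_p
    g_p = p.cov_ta / p.sd_a ** 2  # regression slope of X on A in P
    g_q = q.cov_ta / q.sd_a ** 2  # regression slope of Y on A in Q
    d_mu = p.mean_a - q.mean_a
    d_var = p.sd_a ** 2 - q.sd_a ** 2
    mu_xs = p.mean_t - w_q * g_p * d_mu
    mu_ys = q.mean_t + w_p * g_q * d_mu
    var_xs = p.sd_t ** 2 - w_q * g_p ** 2 * d_var + w_p * w_q * g_p ** 2 * d_mu ** 2
    var_ys = q.sd_t ** 2 + w_p * g_q ** 2 * d_var + w_p * w_q * g_q ** 2 * d_mu ** 2
    return mu_ys + sqrt(var_ys / var_xs) * (x - mu_xs)
```

When the two groups have the same anchor mean and variance, both functions reduce to the same line, mu_Y + (sd_Y / sd_X)(x - mu_X). They diverge as the groups differ on the anchor, which is the situation the study examines.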

18.
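With the rapid development of science and technology and the deepening of China's economic system reform, society's demand for talent has become increasingly comprehensive and skill-oriented. Emerging higher vocational and technical education meets this demand by aiming to cultivate practical, skilled personnel for frontline production. Compared with traditional higher education, it has distinctive occupational and job-specific characteristics, and the foundation for highlighting these features is practical teaching. This paper explores how higher vocational and technical education can carry out practical teaching in terms of specialized facility construction, building a "dual-qualified" (双师型) teaching staff, the curriculum system, and teaching methods.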

19.
In the spring of 2021, just 1 year after schools were forced to close for COVID-19, state assessments were administered at great expense to provide data about the pandemic's impacts on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the biggest threats to valid inferences about pandemic impacts on student learning drawn from state assessment data: measurement artifacts affecting the comparability of scores, secular trends, and changes in the tested population. The article compares three statistical approaches (the Fair Trend, baseline student growth percentiles, and multiple regression with demographic covariates) that can support more valid inferences about student learning during the pandemic and in other scenarios in which the tested population changes over time. All three approaches lead to similar inferences about statewide student performance but can lead to very different inferences about student subgroups. Results show that statistically controlling for prepandemic demographic differences can reverse conclusions about which groups were most affected by the pandemic, and with them decisions about prioritizing resources.
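The threat posed by a change in the tested population can be illustrated with direct standardization. In this approach, the prior year's subgroup means are reweighted to the current year's subgroup composition before the year-over-year change is computed. This is a generic sketch and does not reproduce the Fair Trend, baseline student growth percentiles, or the article's regression model. The (group, score) record format and the function names are hypothetical.

```python
from collections import defaultdict

def group_summary(records):
    """Return per-group mean scores and counts from (group, score) records."""
    sums, counts = defaultdict(float), defaultdict(int)
    for group, score in records:
        sums[group] += score
        counts[group] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return means, dict(counts)

def naive_change(before, after):
    """Difference in overall mean score, ignoring any change in who was tested."""
    def mean(recs):
        return sum(score for _, score in recs) / len(recs)
    return mean(after) - mean(before)

def reweighted_change(before, after):
    """Change relative to the prior-year mean expected under the current composition.

    Assumes every group present in `after` was also present in `before`.
    """
    before_means, _ = group_summary(before)
    _, after_counts = group_summary(after)
    n_after = sum(after_counts.values())
    expected = sum(before_means[g] * n / n_after for g, n in after_counts.items())
    actual = sum(score for _, score in after) / n_after
    return actual - expected
```

Consider a prior year in which groups A and B (means 60 and 40) are each half the tested population, and a current year in which A is 70% with unchanged means. The naive change is +4 points, but the reweighted change is 0: the apparent gain comes entirely from the shift in who was tested.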

20.
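The quality of tourism translation bears on the development of the foreign-related tourism economy. Because China currently has no laws that safeguard and regulate the quality of tourism translation, a quality assurance system for tourism translation should be built at the level of professional ethics, made up of elements such as client ethics, translator ethics, administrative-authority ethics, and reviser ethics. The ethical elements within the system constrain one another and act together to ensure the quality of tourism translation.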
