Similar literature
Found 20 similar documents; search took 62 ms
1.
The Standards for Educational and Psychological Testing have evolved in the breadth and depth of coverage of issues in educational testing and measurement since their first publication in 1954. There were a number of substantive changes in the 1999 revision that addressed validity, fairness, accommodations, and compliance with the Standards. In addition, there was nearly a 50% increase in the number of standards contained in the last revision. The next revision of the Standards may be initiated in 2007 and there are remaining concerns about access and awareness by non-measurement professionals, compliance by test publishers and users, relevance in addressing mandates for accountability, and substantive areas of educational assessment. This review of major changes to the Standards and discussion of future topics is designed to inform the next revision.

2.
Background: Validity theory has evolved significantly over the past 30 years in response to the increased use of assessments across scientific, social and educational settings. The overarching trajectory of this evolution reflects a shift from a purely quantitative, positivistic approach to a conception of validity reliant on the interpretation of multiple evidence sources integrated into validity arguments. Moreover, within contemporary validity, interpretation has been emphasised as a central process; however, despite this emphasis, there have been few explicit articulations of specific interpretive methodologies applicable to the practice of validation.

Purpose: To link contemporary theoretical foundations in validity to practical methods and structures to help guide the collection and analysis of interpretive validity evidence. By building upon existing validity theory, this paper aims to provide greater clarity on the practice of validation and contribute toward the larger developing framework for the validation of educational assessments.

Source of evidence: An interdisciplinary, integrative review of over 60 research articles and sources related to the theory and practice of educational validation and interpretive inquiry approaches. Sources include literature from the fields of educational assessment and more broadly social scientific research.

Main argument: As assessments in education increasingly aim to measure complex constructs that are value-laden and socially dependent, validity theory must keep pace and evolve in ways that address the inherent complexities associated with contemporary educational assessment. Through this paper, I assert that a greater understanding of interpretive methodologies represents one of the most promising areas for development of validation theory and practice. Specifically, I argue that dialectic, hermeneutic and transgressive forms of inquiry can be integrated within current argument-based structures for the collection, analysis and representation of validity evidence in several useful ways.

Conclusions: Interpretive inquiry processes, namely dialectic, hermeneutic and transgressive forms of interpretation, serve to expand validation practice to include diverse evidences for the generation of multiple-perspective validity arguments. The paper concludes with specific implications for future research and practice within the field of interpretive validity theory.

3.
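Using the results of developmental evaluation in college admissions is necessary. However, its use still faces several problems: the evaluation results lack comparability, their validity and reliability are low, and the approach is difficult to implement under current college admissions conditions.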

4.
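This paper briefly reviews the three stages in the development of the concept of validity in psychometrics and focuses on its most recent development, construct validity theory. The concept of validity emerges as a dynamic, continually developing one: as the content of research has grown richer, research methods have also become increasingly diverse. Construct validity at the current stage is broad enough to accommodate all evidence that might support the interpretation of scores. A complete understanding of the concept of validity helps us view the efficacy and substance of tests from a broader perspective.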

5.
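Language testing is full of variables that stand in a unity of opposites, and compromise is an effective means of resolving such conflicts. From the perspective of testing for English majors, and in response to current problems such as the emphasis on knowledge over ability and the negative washback effect of testing, this paper proposes the principle of compromising toward validity. After setting out the rationale for this principle, it discusses effective ways of putting it into practice, with the aim of providing theoretical support and feasible suggestions for the reform of testing for English majors.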

6.
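The cloze test is one of the common formats in English testing. It is mainly used to test candidates' reading comprehension of a text at different levels and their gestalt (closure) thinking ability. This format is closely related to the reliability and validity of a test; therefore, three principles should be followed when writing items: reasonable selection of materials, scientific design of items, and comprehensive coverage of test points.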

7.
Students with the most significant cognitive disabilities (SCD) are the 1% of the total student population who have a disability or multiple disabilities that significantly impact intellectual functioning and adaptive behaviors and who require individualized instruction and substantial supports. Historically, these students have received little instruction in science and the science assessments they have participated in have not included age‐appropriate science content. Guided by a theory of action for a new assessment system, an eight‐state consortium developed multidimensional alternate content standards and alternate assessments in science for students in three grade bands (3–5, 6–8, 9–12) that are linked to the Next Generation Science Standards (NGSS Lead States, 2013) and A Framework for K‐12 Science Education (Framework; National Research Council, 2012). The great variability within the population of students with SCD necessitates variability in the assessment content, which creates inherent challenges in establishing technical quality. To address this issue, a primary feature of this assessment system is the use of hypothetical cognitive models to provide a structure for variability in assessed content. System features and subsequent validity studies were guided by a theory of action that explains how the proposed claims about score interpretation and use depend on specific assumptions about the assessment, as well as precursors to the assessment. This paper describes evidence for the main claim that test scores represent what students know and can do. We present validity evidence for the assumptions about the assessment and its precursors, related to this main claim. The assessment was administered to over 21,000 students in eight states in 2015–2016. We present selected evidence from system components, procedural evidence, and validity studies. We evaluate the validity argument and demonstrate how it supports the claim about score interpretation and use.

8.
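The reliability and validity of English oral examinations are affected by many factors, including the test format, the scoring criteria and the quality of the examiners. Improving the validity and reliability of English oral tests requires keeping test format and content consistent and designing scoring criteria that are scientific, objective and workable. English oral tests with high reliability and validity have a positive washback effect on teaching.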

9.
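Validity is the most important and also the most difficult problem in human measurement activities. This paper first discusses the shortcomings of classical validity theory in three respects: the concept and classification of validity, the theoretical formula for validity, and the methods of evaluating validity. It then proposes corresponding improvements in each of these three respects, achieving, to some extent, a reconstruction of validity theory.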

10.
Beginning with a reference to living in a time of both uncertainty and opportunity, this article presents a discussion of key areas where shared understanding is needed if we are to successfully realize the design and use of high quality, valid assessments of science. The key areas discussed are: (1) assessment purpose and use, (2) the nature of assessment and the importance of research on learning, (3) assessment design processes, (4) validity arguments, (5) measurement and statistical inference, (6) affordances of technology, and (7) systems of assessment. After introducing each vital area, the article discusses how each of the five articles in the special issue is connected to the areas. Concluding comments emphasize the reminder that despite the large amount of work to be done, we are well positioned to realize the high quality, valid science education assessments that we need for K‐16 science education. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 831–841, 2012

11.
12.
In this paper, the author discusses reading tests and their validity, the demands that reading places on test takers, and the principles that determine the validity of reading tests. By analyzing the reading sections of two GET-4 model tests, he shows how to construct a highly valid test so as to ensure its accuracy and objectivity.

13.
This literature review explored whether dynamic assessment procedures in psycho-educational practice might bridge the well-known gap between diagnosis and intervention. Due to a learning phase included in the testing procedure, qualitative information about the child’s learning needs can be revealed by means of dynamic assessment. The question is, however, what the consequential validity, i.e. the extent to which assessment influences instructional and learning processes, of dynamic assessment procedures really is. The review of 31 articles that met the inclusion criteria showed that proximal consequential validity of dynamic assessment is warranted, but distal consequential validity is warranted to a lesser extent (e.g. some guidelines for practice). Furthermore, it can be noticed that motivational aspects never played an explicit role during learning phases. In order to design student-tailored interventions following dynamic assessment, there is a need for more explicitness of learning phases and types of feedback in the development of these instruments.

14.
The Devereux Early Childhood Assessment (DECA) is a social-emotional assessment widely used by early childhood educational programs to inform early identification and intervention efforts. However, its construct validity is not well-established in independent samples of children from low-income backgrounds. We examined the construct validity of the teacher report of the DECA using a series of confirmatory factor analyses, exploratory factor analyses, and the Rasch partial credit model in a large sample of culturally and linguistically diverse Head Start children (N = 5,197). Findings provided some evidence for consistency in the factor structure of the three Protective Factors subscales (Initiative, Self-Control, and Attachment); however, the factor structure of the Behavioral Concerns subscale was not replicated in our sample and demonstrated poor fit to these data. Findings suggested that the 10 items of the published Behavioral Concerns subscale did not comprise a unidimensional construct, but rather, were better represented by two factors (externalizing and internalizing behavior). The use of the total Behavioral Concerns score as a screening tool to identify emotional and behavioral problems in diverse samples of preschool children from low-income backgrounds was not supported, especially for internalizing behavior. Implications for the consequential validity of the DECA for use as a screening tool in early childhood programs serving diverse populations of children and directions for future research are discussed.

15.
Although there have been numerous studies investigating the predictive validity of early assessment, observed predictive validity coefficients across studies are not stable. A validity generalization study was conducted in order to answer the question of whether the relationship between early assessment of children and later achievement is generalizable or situation-specific. This study examined 716 predictive correlation coefficients from 44 studies using Hierarchical Linear Modeling (HLM). The findings of this study revealed that predictive validity of early assessment is not generalizable. Additional analyses indicated that predictive validity differs across assessments as a function of test type, specific construct being assessed, length of prediction, and administration procedures. The most impressive finding in this study was the variability of effect sizes across different test administration types. In particular, tests that were scored through ratings were found to be most effective. These findings suggest that instead of addressing a broad predictive validity between a test and a criterion measure, it is necessary to understand early assessment procedures as a whole system by including considerations of various variables related to testing conditions.

16.
Current educational policies rely on educational assessments. However, the technical aspects of assessments are often unknown to policy makers, which is dangerous because sound assessment policy requires knowledge of the strengths and limitations of educational tests. In this article, we discuss the importance of informing policy makers of important psychometric issues that should be considered whenever tests are proposed for specific purposes. We discuss the types of information that are important to communicate to policy makers, how to best convey this information in a manner in which it can be understood, and how to be seen as a valuable source of information to education policy makers. We end with some specific steps organizations such as NCME can take to inform policy makers and advocate for valid educational assessment policies.

17.
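Language testing research is a branch of applied linguistics, and reliability and validity are two important concepts in the field. Reliability refers to the dependability of test results; validity refers to the degree to which a test achieves its intended purpose. This paper introduces the definitions of reliability and validity, the methods for measuring them and the factors that affect them, and points out that in language testing the two are both interdependent and mutually exclusive.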

18.
Educational tests are standardized so that all examinees are tested on the same material, under the same testing conditions, and with the same scoring protocols. This uniformity is designed to provide a level “playing field” for all examinees so that the test is “the same” for everyone. Thus, standardization is designed to promote fairness in testing. In practice, the material tested, the conditions under which a test is administered, and the scoring processes, are often too rigid to provide the intended level playing field. For example, standardized testing conditions may interact with personal characteristics of examinees that affect test performance, but are not construct-relevant. Thus, more flexibility in standardization is needed to account for the diversity of experiences, talents, and handicaps of the incredibly heterogeneous populations of examinees we currently assess. Traditional standardization procedures grew out of experimental psychology and psychophysics laboratories where keeping all conditions constant was crucial. Today, accounting for and measuring what is not constant across examinees is crucial to valid construct interpretations. To meet this need I introduce the concept of understandardization, which refers to ensuring sufficient flexibility in standardized testing conditions to yield the most accurate measurement of proficiency for each examinee.

19.
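Psychological crises of college freshmen and educational countermeasures   Total citations: 1 (self-citations: 0; citations by others: 1)
This paper lists four manifestations of psychological crisis among college freshmen, analyzes the subjective and objective factors behind them, and proposes five educational countermeasures for managing such crises.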

20.
The focus of this article is to draw attention to the presence and importance of travelling ideas, knowledge, and practices in Danish history of educational testing. The article introduces and employs a spatial methodological approach in relation to the connections between the international testing community and the emerging Danish practice of intelligence testing in the interwar years. The article represents a contribution to an investigation of the social and cultural exchange of educational ideas between the Anglo-Saxon world and Scandinavia, in general, and Denmark in particular. Moreover, the article argues for the positive gains of drawing on a spatial frame of interpretation when dealing with national educational history.

