Similar documents
 20 similar documents found (search time: 187 ms)
1.
The Standards for Educational and Psychological Testing identify several strands of validity evidence that may be needed as support for particular interpretations and uses of assessments. Yet assessment validation often does not seem guided by these Standards, with validations lacking a particular strand even when it appears relevant to an assessment. Consequently, the degree to which validity evidence supports the proposed interpretation and use of the assessment may be compromised. Guided by the Standards, this article presents an independent validation of OECD's PISA assessment of mathematical self-efficacy (MSE) as an instructive example of this issue. OECD identifies MSE as one of a number of “factors” explaining student performance in mathematics, thereby serving the “policy orientation” of PISA. However, this independent validation identifies significant shortcomings in the strands of validity evidence available to support this interpretation and use of the assessment. The article therefore demonstrates how the Standards can guide the planning of a validation to ensure it generates the validity evidence relevant to an interpretive argument, particularly for an international large-scale assessment such as PISA. The implication is that assessment validation could yet benefit from the Standards as what Zumbo calls “a global force for testing”.

2.
Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high‐stakes testing arena rely on classical test theory (CTT) methods. However, advances in item response theory software have made the application of these techniques much more accessible to classroom instructors. The purpose of this research is to analyze a common medical school anatomy examination using both the traditional CTT scoring method and a Rasch measurement scoring method to determine which technique provides more robust findings, and which set of psychometric indicators will be more meaningful and useful for anatomists looking to improve the psychometric quality and functioning of their examinations. Results produced by the more robust and meaningful methodology will undergo a rigorous psychometric validation process to evaluate construct validity. Implications of these techniques and additional possibilities for advanced applications are also discussed. Anat Sci Educ 7: 450–460. © 2014 American Association of Anatomists.

3.
The AERA, APA, NCME Standards define validity as ‘the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests’. A century of disagreement about validity does not mean that there has not been substantial progress. This consensus definition brings together interpretations and use so that it is one idea, not a sequence of steps. Just as test design is framed by a particular context of use, so too must validation research focus on the adequacy of tests for specific purposes. The consensus definition also carries forward major reforms in validity theory begun in the 1970s that rejected separate types of validity evidence for different types of tests, e.g. content validity for achievement tests and predictive correlations for employment tests. When the current definition refers to both ‘evidence and theory’ the Standards are requiring not just that a test be well designed based on theory but that evidence be collected to verify that the test device is working as intended. Having taught policy-makers, citizens, and the courts to use the word validity, especially in high-stakes applications, we cannot after the fact substitute a more limited, technical definition of validity. An official definition provides clarity even for those who disagree, because it serves as a touchstone and obliges them to acknowledge when they are departing from it.

4.
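Before the 1985 publication of the Standards for Educational and Psychological Testing (5th edition), the core concept of validity research was the "criterion": validity research was viewed as a process of using a criterion to verify a test's validity and to make valid interpretations of test scores. After 1985, the core concept became "evidence": validity research was viewed as a process of accumulating evidence to support a test's validity and to make reasonable interpretations of test scores. This understanding of validity is most prominently reflected in the Standards for Educational and Psychological Testing (6th edition), published in 1999. Educational Measurement, jointly compiled by the American Council on Education and the National Council on Measurement in Education, is known in the field as "the Bible of educational measurement". Since the publication of Educational Measurement (4th edition) in 2006, the core concept of validity research has evolved into the "warrant": validity research is viewed as a process of constructing "warrant systems" and "warrant networks" to make an "argument" for validity and to arrive at plausible interpretations of test scores. Drawing on the author's own testing practice, this article introduces these new developments in the concept of validity.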

5.
The psychometric literature is replete with comprehensive discussions of test validity, test validation, and the characteristics of quality assessment programs. The most authoritative source for guidance regarding sound test development and evaluation practices is the Standards for Educational and Psychological Testing. However, the Standards are not legally binding. In this article, we review the way in which validity is conceptualized in the Standards and compare this conceptualization with validity evidence presented in specific court cases involving legal challenges to tests. Our review indicates that, in general, there is strong congruence between the Standards and how validity is viewed in the courts, and that testing agencies that conform to these guidelines are likely to withstand legal scrutiny. However, the courts have taken a more practical, less theoretical view on validity and tend to emphasize evidence based on test content and testing consequences.

6.
Measuring cognitive modifiability from the responsiveness of an individual's performance to intervention has long been viewed (e.g., Dearborne, 1921) as an alternative to traditional (static) ability measurement. Currently, dynamic testing, in which cues or instruction are presented with ability test items, is a popular method for assessing cognitive modifiability. Despite the long-standing interest, however, little data exists to support the validity of cognitive modifiability measures in any ability domain. Several special methodological difficulties have limited validity studies, including psychometric problems in measuring modifiability (i.e., as change), lack of appropriate validation criteria, and difficulty in linking modifiability to cognitive theory. In this article, relatively new developments for solving the validation problems are applied to measuring and validating spatial modifiability. Criterion-related validity for predicting learning in an applied knowledge domain, as well as construct validity, is supported.

7.
《Educational Assessment》2013,18(2):149-165
Professional measurement standards have evolved during the past 5 decades, creating a more unitary yet nebulous conception of validation. Concurrently, due to the increase of high-stakes testing in public schools, the courts have been forced to rule on the appropriateness of decisions emanating from tests. However, the courts often have failed to apply current validation theory in rendering decisions, preferring the convenience and clarity of earlier perspectives of validity. This rift between validity theory and judicial interpretation threatens to grow into a chasm as more complex views of validation prevail in the profession. Modern measurement practitioners stand astride this chasm in their efforts to implement test validation procedures that are cost effective, legally defensible, and consistent with state-of-the-art theory.

8.
A misconception exists that validity may refer only to the interpretation of test scores and not to the uses of those scores. The development and evolution of validity theory illustrate that test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test scores remains essential today. However, test scores are not interpreted and then ignored; rather, their interpretations lead to actions. Thus, a modern definition of validity needs to describe the validation of test score interpretations as a necessary, but insufficient, step en route to validating the uses of test scores for their intended purposes. To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests. The current definition of validity stipulated in the 2014 version of the Standards for Educational and Psychological Testing properly describes validity in terms of both interpretations and uses, and provides a sufficient starting point for validation.

9.
The authors present the development and validation of the Standards Performance Continuum (SPC), a measure for assessing teacher performance of the Standards for Effective Pedagogy. The authors created the SPC to serve as an instrument for research on the relationship between teachers' use of these standards and instructional effectiveness, as a guide for teacher professional development, and as a tool for education reform. The goal of this research was to construct a quantitative instrument that is practical, easily scored, and readily interpretable. The standards are discussed, and 3 studies provide evidence of interrater reliability, concurrent validity, and criterion-related validity supporting the validity of interpretations of data gathered with the SPC.

10.
11.
The Chinese Early Childhood Environment Rating Scale (trial) (CECERS) is a new instrument for measuring early childhood program quality in Chinese socio-cultural contexts, based on substantial adaptation from the Early Childhood Environment Rating Scale-Revised Edition (ECERS-R). This paper describes the development and validation process of CECERS. Empirical data were collected from a stratified random sample of 178 classrooms, from which a random sample of 1012 children was measured for child development outcomes. Guided by the framework of broad conceptualization of validity and validation as advocated by Messick (1989), evidence in a variety of forms is presented and discussed, including content validity considerations (e.g., measuring socially and culturally relevant domains), measurement reliability considerations (e.g., internal consistency reliability, inter-rater reliability), and measurement validity considerations (concurrent validity, criterion-related validity, internal structure based on exploratory factor analysis). The empirical findings for CECERS compare very favorably with the validation outcomes of ECERS-R. The body of evidence accumulated in the validation process supports the use and interpretation of CECERS scores as quality indicators of early childhood education programs in Chinese social and cultural contexts. Limitations and future directions are also discussed.

12.
The purpose of this study is to develop and validate a trait emotional intelligence (EI) measurement for Korean adults. This scale was developed because there is still a lack of EI measurements that consider the effects of culture on emotions. It was found that the scale has a three-factor structure, and this structure was confirmed in cross-sample validation. If the construct validity of the measurement is confirmed in further studies, scholars as well as practitioners will be able to use the instrument in designing and evaluating EI development programs. In addition, by using the scale within various cultural settings, the effects of culture on EI may be uncovered.

13.
Background: Validity theory has evolved significantly over the past 30 years in response to the increased use of assessments across scientific, social and educational settings. The overarching trajectory of this evolution reflects a shift from a purely quantitative, positivistic approach to a conception of validity reliant on the interpretation of multiple evidence sources integrated into validity arguments. Moreover, within contemporary validity, interpretation has been emphasised as a central process; however, despite this emphasis, there have been few explicit articulations of specific interpretive methodologies applicable to the practice of validation.

Purpose: To link contemporary theoretical foundations in validity to practical methods and structures to help guide the collection and analysis of interpretive validity evidence. By building upon existing validity theory, this paper aims to provide greater clarity on the practice of validation and contribute toward the larger developing framework for the validation of educational assessments.

Source of evidence: An interdisciplinary, integrative review of over 60 research articles and sources related to the theory and practice of educational validation and interpretive inquiry approaches. Sources include literature from the fields of educational assessment and more broadly social scientific research.

Main argument: As assessments in education increasingly aim to measure complex constructs that are value-laden and socially dependent, validity theory must keep pace and evolve in ways that address the inherent complexities associated with contemporary educational assessment. Through this paper, I assert that a greater understanding of interpretive methodologies represents one of the most promising areas for development of validation theory and practice. Specifically, I argue that dialectic, hermeneutic and transgressive forms of inquiry can be integrated within current argument-based structures for the collection, analysis and representation of validity evidence in several useful ways.

Conclusions: Interpretive inquiry processes, namely dialectic, hermeneutic and transgressive forms of interpretation, serve to expand validation practice to include diverse evidences for the generation of multiple-perspective validity arguments. The paper concludes with specific implications for future research and practice within the field of interpretive validity theory.

14.
When reliability and validity were introduced as validation criteria for empirical research in the human sciences, quantitative research methods prevailed, and theory of science relied on neopositivism (Vienna Circle) or postpositivism (scientific realism). Within this worldview, notions of reliability and validity as criteria of scientific goodness were introduced. Reliability and validity were associated with the correspondence theory of truth, which is mostly ill-suited to the needs of qualitative research. For that reason, qualitative research must look for other kinds of validation criteria. The article elaborates the problems arising when the correspondence theory of truth is used as an ultimate criterion in evaluating qualitative research and proposes Heidegger's hermeneutical or alethetical idea of truth as a more suitable approach.

15.
The Standards for Educational and Psychological Testing have evolved in the breadth and depth of coverage of issues in educational testing and measurement since their first publication in 1954. There were a number of substantive changes in the 1999 revision that addressed validity, fairness, accommodations, and compliance with the Standards. In addition, there was nearly a 50% increase in the number of standards contained in the last revision. The next revision of the Standards may be initiated in 2007 and there are remaining concerns about access and awareness by non-measurement professionals, compliance by test publishers and users, relevance in addressing mandates for accountability, and substantive areas of educational assessment. This review of major changes to the Standards and discussion of future topics is designed to inform the next revision.

16.
The scoring process is critical in the validation of tests that rely on constructed responses. Documenting that readers carry out the scoring in ways consistent with the construct and measurement goals is an important aspect of score validity. In this article, rater cognition is approached as a source of support for a validity argument for scores based on constructed responses, whether such scores are to be used on their own or as the basis for other scoring processes, for example, automated scoring.

17.
The affordances of digital portfolios provide a space for students to construct themselves in new ways, to the extent that they are creating content in an entirely new genre—a genre where the placement of content; connections among content; and the ability to communicate via image, color, movement, and sound are as important to making meaning as the alphabetic. It should not be assumed that these new choices are boundless, though with a cursory glance, they seem to be. Often, the tools instructors and students choose play a large role in dictating how students will present themselves and their content; what seems like a limitless number of options is not actually so open, because students will move toward the options that are available within a tool. Therefore, careful evaluation of the tool affordances is needed to determine if best practices within the new genre are truly promoted.

18.
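Addressing the talent-cultivation goals of colleges and universities, this article describes the position and role of laboratories in cultivating interdisciplinary talent for the new era and explores methods and approaches for strengthening laboratory construction and management, so that laboratories can play a greater role in cultivating such talent and in promoting scientific and technological development.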

19.
The Standards for Educational and Psychological Testing indicate that multiple sources of validity evidence should be used to support the interpretation of test scores. In the past decade, examinee response processes, as a source of validity evidence, have received increased attention. However, there have been relatively few methodological studies of the accuracy and consistency of examinee response processes as measured by verbal reports in the context of educational measurement. The objective of the current study was to investigate the accuracy and consistency of examinee response processes—as measured by verbal reports—as a function of varying interviewer and item variables in a think-aloud interview within an educational measurement context. Results indicate that the accuracy of responses may be undermined when students perceive the interviewer to be an expert in the domain. Further, the consistency of response processes may be undermined when items that are too easy or difficult are used to elicit reports. The implications of these results for conducting think-aloud studies are explored.

20.
Drawing on experience between 2000 and 2007 in developing a validity argument for the high-stakes Test of English as a Foreign Language™ (TOEFL®), this paper evaluates the differences between the argument-based approach to validity as presented by Kane (2006) and that described in the 1999 AERA/APA/NCME Standards for Educational and Psychological Testing. Based on an analysis of four points of comparison—framing the intended score interpretation, outlining the essential research, structuring research results into a validity argument, and challenging the validity argument—we conclude that an argument-based approach to validity introduces some new and useful concepts and practices.
