The psychometric literature is replete with comprehensive discussions of test validity, test validation, and the characteristics of quality assessment programs. The most authoritative source for guidance regarding sound test development and evaluation practices is the Standards for Educational and Psychological Testing. However, the Standards are not legally binding. In this article, we review the way in which validity is conceptualized in the Standards and compare this conceptualization with validity evidence presented in specific court cases involving legal challenges to tests. Our review indicates that, in general, there is strong congruence between the Standards and how validity is viewed in the courts, and that testing agencies that conform to these guidelines are likely to withstand legal scrutiny. However, the courts have taken a more practical, less theoretical view on validity and tend to emphasize evidence based on test content and testing consequences.

Current Concerns in Validity Theory (times cited: 3; self-citations: 0; citations by others: 3)
We are at the end of the first century of work on models of educational and psychological measurement and into a new millennium. This certainly seems like an appropriate time for looking backward and looking forward in assessment. Furthermore, a new edition of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) has been published, and the previous editions of the Standards have served as benchmarks in the development of measurement theory.
This backward glance will be just that, a glance. After a brief historical review focusing mainly on construct validity, the current state of validity theory will be summarized, with an emphasis on the role of arguments in validation. Then the application of an argument-based approach will be examined with regard to two issues in validity theory: the distinction between performance-based and theory-based interpretations, and the role of consequences in validation.

The AERA, APA, NCME Standards define validity as ‘the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests’. A century of disagreement about validity does not mean that there has not been substantial progress. This consensus definition brings together interpretations and use so that it is one idea, not a sequence of steps. Just as test design is framed by a particular context of use, so too must validation research focus on the adequacy of tests for specific purposes. The consensus definition also carries forward major reforms in validity theory begun in the 1970s that rejected separate types of validity evidence for different types of tests, e.g. content validity for achievement tests and predictive correlations for employment tests. When the current definition refers to both ‘evidence and theory’, the Standards are requiring not just that a test be well designed based on theory but that evidence be collected to verify that the test device is working as intended. Having taught policy-makers, citizens, and the courts to use the word validity, especially in high-stakes applications, we cannot after the fact substitute a more limited, technical definition of validity. An official definition provides clarity even for those who disagree, because it serves as a touchstone and obliges them to acknowledge when they are departing from it.

Any examination that involves moderate to high stakes implications for examinees should be psychometrically sound and legally defensible. Currently, there are two broad and competing families of test theories that are used to score examination data. The majority of instructors outside the high‐stakes testing arena rely on classical test theory (CTT) methods. However, advances in item response theory software have made the application of these techniques much more accessible to classroom instructors. The purpose of this research is to analyze a common medical school anatomy examination using both the traditional CTT scoring method and a Rasch measurement scoring method to determine which technique provides more robust findings, and which set of psychometric indicators will be more meaningful and useful for anatomists looking to improve the psychometric quality and functioning of their examinations. Results produced by the more robust and meaningful methodology will undergo a rigorous psychometric validation process to evaluate construct validity. Implications of these techniques and additional possibilities for advanced applications are also discussed. Anat Sci Educ 7: 450–460. © 2014 American Association of Anatomists.
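To make the contrast between the two scoring frameworks concrete, here is a minimal sketch, assuming dichotomously scored (0/1) items, of two classical item statistics commonly reported under CTT (item difficulty and corrected item-total discrimination) next to the dichotomous Rasch response function. All function names are hypothetical; this is not the procedure used in the study, and estimating Rasch person and item parameters from data requires dedicated IRT software.

```python
import math
from statistics import mean, pstdev


def item_difficulty(responses):
    """CTT difficulty (p-value) per item: proportion of examinees scoring 1.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def pearson(x, y):
    """Pearson correlation (hypothetical helper); fails if x or y has zero variance."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def corrected_item_total(responses):
    """CTT discrimination: correlate each item with the total of the OTHER items,
    so the item does not inflate its own correlation with the total."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is person ability and b is item difficulty on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

A key design difference: the CTT statistics depend on the particular sample of examinees, whereas the Rasch function places persons and items on a common logit scale, so a person whose ability equals an item's difficulty has exactly a 0.5 probability of success.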

Advances in technology are stimulating the development of complex, computerized assessments. The prevailing rationales for developing computer-based assessments are improved measurement and increased efficiency. In the midst of this measurement revolution, test developers and evaluators must revisit the notion of validity. In this article, we discuss the potential positive and negative effects computer-based testing could have on validity, review the literature regarding validation perspectives in computer-based testing, and provide suggestions regarding how to evaluate the contributions of computer-based testing to more valid measurement practices. We conclude that computer-based testing shows great promise for enhancing validity, but at this juncture, it remains equivocal whether technological innovations in assessment have led to more valid measurement.

Advances in validity theory and alacrity in validation practice have suffered because the term validity has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test uses). This article provides a brief summary of current validity theory, explication of a critical flaw in the current conceptualisation of validity, and a framework that both accommodates and differentiates validation of test score inferences and justification of test use.

Measuring cognitive modifiability from the responsiveness of an individual's performance to intervention has long been viewed (e.g., Dearborne, 1921) as an alternative to traditional (static) ability measurement. Currently, dynamic testing, in which cues or instruction are presented with ability test items, is a popular method for assessing cognitive modifiability. Despite the long-standing interest, however, little data exists to support the validity of cognitive modifiability measures in any ability domain. Several special methodological difficulties have limited validity studies, including psychometric problems in measuring modifiability (i.e., as change), lack of appropriate validation criteria, and difficulty in linking modifiability to cognitive theory. In this article, relatively new developments for solving the validation problems are applied to measuring and validating spatial modifiability. Criterion-related validity for predicting learning in an applied knowledge domain, as well as construct validity, is supported.
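One well-known source of the "psychometric problems in measuring modifiability (i.e., as change)" can be illustrated with the classical test theory formula for the reliability of a simple difference score $D = Y - X$ (posttest minus pretest), assuming $X$ and $Y$ have equal variances. This is a standard textbook result offered for illustration only; it is not the solution developed in the article.

```latex
\rho_{DD'} \;=\; \frac{\tfrac{1}{2}\left(\rho_{XX'} + \rho_{YY'}\right) - \rho_{XY}}{1 - \rho_{XY}}
```

For example, with $\rho_{XX'} = \rho_{YY'} = 0.80$ and a pretest-posttest correlation $\rho_{XY} = 0.60$, the difference score has reliability $(0.80 - 0.60)/(1 - 0.60) = 0.50$: two individually reliable measures can yield a much less reliable change score, and the problem worsens as the pretest-posttest correlation rises.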

Background: Validity theory has evolved significantly over the past 30 years in response to the increased use of assessments across scientific, social and educational settings. The overarching trajectory of this evolution reflects a shift from a purely quantitative, positivistic approach to a conception of validity reliant on the interpretation of multiple evidence sources integrated into validity arguments. Moreover, within contemporary validity, interpretation has been emphasised as a central process; however, despite this emphasis, there have been few explicit articulations of specific interpretive methodologies applicable to the practice of validation.

Purpose: To link contemporary theoretical foundations in validity to practical methods and structures to help guide the collection and analysis of interpretive validity evidence. By building upon existing validity theory, this paper aims to provide greater clarity on the practice of validation and contribute toward the larger developing framework for the validation of educational assessments.

Source of evidence: An interdisciplinary, integrative review of over 60 research articles and sources related to the theory and practice of educational validation and interpretive inquiry approaches. Sources include literature from the fields of educational assessment and more broadly social scientific research.

Main argument: As assessments in education increasingly aim to measure complex constructs that are value-laden and socially dependent, validity theory must keep pace and evolve in ways that address the inherent complexities associated with contemporary educational assessment. Through this paper, I assert that a greater understanding of interpretive methodologies represents one of the most promising areas for development of validation theory and practice. Specifically, I argue that dialectic, hermeneutic and transgressive forms of inquiry can be integrated within current argument-based structures for the collection, analysis and representation of validity evidence in several useful ways.

Conclusions: Interpretive inquiry processes, namely dialectic, hermeneutic and transgressive forms of interpretation, serve to expand validation practice to include diverse forms of evidence for the generation of multiple-perspective validity arguments. The paper concludes with specific implications for future research and practice within the field of interpretive validity theory.

The conventional focus of validity in educational measurement has been on intended interpretations and uses of test scores. Empirical studies of test use by teachers, administrators and policy-makers show that actual interpretations and uses of test scores in context are invariably shaped by local users’ questions, which frequently require attention to multiple sources of evidence about students’ learning and the factors that shape it, and depend on local capacity to use such information well. This requires a more complex theory of validity that can shift focus as needed from the intended interpretations and uses of test scores that guide test developers to local capacity to support the actual interpretations, decisions and actions that routinely serve local users’ purposes. I draw on the growing empirical literature on data use to illustrate the need for an expanded theory of validity, point to theoretical resources that might guide such an expansion, and suggest a research agenda towards these ends.

When reliability and validity were introduced as validation criteria for empirical research in the human sciences, quantitative research methods prevailed, and theory of science relied on neopositivism (Vienna Circle) or postpositivism (scientific realism). Within this worldview, notions of reliability and validity as criteria of scientific goodness were introduced. Reliability and validity were associated with the correspondence theory of truth, which is mostly ill-suited to the needs of qualitative research. For that reason, qualitative research must look for other kinds of validation criteria. The article elaborates the problems arising when the correspondence theory of truth is used as an ultimate criterion in evaluating qualitative research and proposes Heidegger's hermeneutical or alethetical idea of truth as a more suitable approach.

The Course-Faculty Instrument (CFI) demonstrates similar measurement properties with student populations at four diverse institutions. These students agree about the nature and extent to which course and instructor attributes relate to their learning. The results suggest that: (1) a perceived learning criterion may have general relevance to students, and (2) validity extension research is an economically feasible alternative to full-scale instrument development and validation efforts. Since validity extension is practical and facilitates cross-institutional comparisons, it appears to be a more viable strategy for researching and instituting student evaluation systems than is suggested by its current usage.

This study examined the empirical validity of a model of human motivation as it applies to school success and failure in 3 independent samples of 10- to 16-year-old African-American youth. Specifically, we assessed how indicators of context, self, and action relate to measures of risk and resilient outcomes in school in 3 different samples, using 3 different measurement strategies. Correlational and path analyses on the 3 data sets supported the empirical validity of the model. African-American youth's experience of their parents' school involvement predicted a composite of self-system processes, which in turn predicted the subjects' reports of their engagement in school. Engagement then predicted school performance and adjustment. The data supported a reciprocal path from action to context, suggesting that youth who show more disaffected patterns of behavior and emotion in school experience less support from their families than those reporting more engaged patterns of action. Implications for program and policy decisions are discussed.

The task of validating a teacher assessment and improvement system is similar whether the system operates in the United States or in another country. Chile has a national teacher evaluation system (NTES) that is standards based, uses multiple instruments, and is intended to serve both formative and summative purposes. For the past 6 years the authors have performed validation research on NTES using a variety of methods and data sources. This article describes our validation research agenda, the results of major validation studies, and an integration of the existing evidence, and it offers the authors' preliminary judgment about NTES's validity. The article also offers a critical reflection regarding the decisions taken while driving the long and winding validation road, and the lessons we learned during this politically and methodologically complex journey.

Traditionally, measurement specialists have provided testing accommodations for examinees with physical disabilities such as blindness or impaired mobility. Following passage of the Americans with Disabilities Act of 1990, advocates for the disabled have argued that federal law also requires testing accommodations for mental disabilities such as dyslexia and other learning disabilities. Such requested accommodations have included readers, calculators, word processors, and additional time. But these accommodations may affect test validity, requiring measurement specialists to balance the social goal of integrating the disabled against the measurement goal of accurate test score interpretation. Although the courts have provided some guidance regarding testing accommodation requirements for the disabled, they have not yet addressed the issue of where to draw the line on accommodations for mental disabilities. This article explores the measurement problems associated with granting accommodations for mental disabilities, uses existing case law to construct a legal framework for considering such accommodations, and discusses the advantages and disadvantages of alternative strategies for handling testing accommodation requests.

The Chinese Early Childhood Environment Rating Scale (trial) (CECERS) is a new instrument for measuring early childhood program quality in the Chinese socio-cultural contexts, based on substantial adaptation from the Early Childhood Environment Rating Scale-Revised Edition (ECERS-R). This paper describes the development and validation process of CECERS. Empirical data were collected from a stratified random sample of 178 classrooms, from which a random sample of 1012 children was measured for child development outcomes. Guided by the framework of broad conceptualization of validity and validation as advocated by Messick (1989), evidence in a variety of forms is presented and discussed, including content validity considerations (e.g., measuring socially and culturally relevant domains), measurement reliability considerations (e.g., internal consistency reliability, inter-rater reliability), and measurement validity considerations (concurrent validity, criterion-related validity, internal structure based on exploratory factor analysis). The empirical findings for CECERS compare very favorably with the validation outcomes of ECERS-R. The body of evidence accumulated in the validation process supports the use and interpretation of CECERS scores as quality indicators of early childhood education programs in the Chinese social and cultural contexts. Limitations and future directions are also discussed.
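Internal consistency reliability of the kind listed above is often summarized with Cronbach's alpha, although the abstract does not say which coefficient was computed. A minimal sketch, assuming a complete respondents-by-items matrix of numeric scores with no missing data; the function name is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))

    Population variances are used throughout; because alpha depends only on
    the ratio of variances, using sample variances consistently gives the
    same value.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When items covary strongly, the total-score variance greatly exceeds the sum of item variances and alpha approaches 1; when items are unrelated, the two are roughly equal and alpha approaches 0.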

Numerous researchers have proposed methods for evaluating the quality of rater‐mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many‐facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On the other hand, popular parametric methods for evaluating rating quality are often based on measurement theories such as invariant measurement. However, these methods are based on assumptions and transformations that may not be appropriate for ordinal ratings. In this study, I show how researchers can use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, to evaluate rating quality within the framework of invariant measurement without the use of potentially inappropriate parametric techniques. I use an illustrative analysis of data from a rater‐mediated writing assessment to demonstrate how one can use numeric and graphical indicators from MSA to gather evidence of validity, reliability, and fairness. The results from the analyses suggest that MSA provides a useful framework within which to evaluate rater‐mediated assessments for evidence of validity, reliability, and fairness that can supplement existing popular methods for evaluating ratings.
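For readers unfamiliar with the two kinds of indices contrasted above, here is a minimal sketch of one from each: Cohen's kappa, a common nonparametric agreement index for two raters, and Loevinger's scalability coefficient H, the central index in Mokken scale analysis. Assumptions: kappa is computed for exactly two raters scoring the same responses, and H is shown only for the simpler dichotomous (0/1) case; the ordinal ratings in the study require the polytomous generalization implemented in established MSA software such as the R package `mokken`. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' category labels on the same responses.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    comes from each rater's marginal category proportions. Undefined (division
    by zero) if both raters use a single identical category throughout.
    """
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


def loevinger_h(responses):
    """Scale-level Loevinger's H for dichotomous (0/1) items.

    H = 1 - (observed Guttman errors) / (errors expected under independence),
    summed over all item pairs. A Guttman error is failing the easier item of
    a pair while passing the harder one. Items with p = 0 or p = 1 can make
    the expected count zero, so they should be removed first.
    """
    n, k = len(responses), len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    observed = expected = 0.0
    for i, j in combinations(range(k), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        observed += sum(1 for row in responses if row[easy] == 0 and row[hard] == 1)
        expected += n * (1 - p[easy]) * p[hard]
    return 1 - observed / expected
```

A perfect Guttman pattern yields H = 1, while responses that are statistically independent of one another yield H near 0; kappa, by contrast, says nothing about whether the ratings form a scale, only whether two raters agree beyond chance.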

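Except for judgments concerning the dissolution of marriage, a legally effective judgment of a foreign court is recognized and enforced in China only if that country and China are parties to a common treaty or have concluded one with each other, or if a reciprocal relationship exists between the two countries. In judicial practice abroad, however, courts in the United States and Germany, for example, do not treat a treaty or the "principle of reciprocity" as an absolute prerequisite for recognizing and enforcing legally effective judgments of Chinese courts. Chinese courts handling such matters should therefore draw on foreign practice and exercise reasonable judgment in light of the specific circumstances.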

Academic competence beliefs have been widely studied. However, conceptual and measurement efforts have not yet been directed toward understanding perceived underachievement (feeling that one's accomplishments fall below perceived capability). We conducted two studies in order to develop and examine validity evidence for the Perceived Academic Underachievement Scale (PAUS). Participants were individuals enrolled for credit in at least one post-secondary course. In Study 1, we evaluated content validity and conducted an exploratory factor analysis. In Study 2, we conducted a confirmatory factor analysis and investigated external validity. For both samples, PAUS demonstrated good internal consistency reliability, and items loaded strongly onto a single factor. PAUS was empirically distinct from a range of related constructs. Findings represent preliminary validation evidence.

Recent decisions in the courts about the interpretation of the 1981 Education Act are one of the key factors underlying the government's proposals for amending the law on special educational needs. Jack Rabinowicz, chairman of the Education Law Association and a partner in a firm of London solicitors, summarises the main legal developments relating to the 1981 Act and comments on possible trends in the future.  相似文献   

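This paper applies modern educational measurement theory to the standardized control of mathematics tests in order to improve their validity and reliability, and thereby to make measurement in mathematics education standardized and scientific. The general principles also apply to tests in other subjects, and several issues related to examinations are discussed.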
