A misconception exists that validity may refer only to the interpretation of test scores and not to the uses of those scores. The development and evolution of validity theory illustrate that test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test scores remains essential today. However, test scores are not interpreted and then ignored; rather, their interpretations lead to actions. Thus, a modern definition of validity needs to describe the validation of test score interpretations as a necessary, but insufficient, step en route to validating the uses of test scores for their intended purposes. To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests. The current definition of validity stipulated in the 2014 version of the Standards for Educational and Psychological Testing properly describes validity in terms of both interpretations and uses, and provides a sufficient starting point for validation.

The conventional focus of validity in educational measurement has been on intended interpretations and uses of test scores. Empirical studies of test use by teachers, administrators and policy-makers show that actual interpretations and uses of test scores in context are invariably shaped by local users’ questions, which frequently require attention to multiple sources of evidence about students’ learning and the factors that shape it, and depend on local capacity to use such information well. This requires a more complex theory of validity that can shift focus as needed from the intended interpretations and uses of test scores that guide test developers to local capacity to support the actual interpretations, decisions and actions that routinely serve local users’ purposes. I draw on the growing empirical literature on data use to illustrate the need for an expanded theory of validity, point to theoretical resources that might guide such an expansion, and suggest a research agenda towards these ends.

The Standards for Educational and Psychological Testing identify several strands of validity evidence that may be needed as support for particular interpretations and uses of assessments. Yet assessment validation often does not seem to be guided by these Standards: validations may lack a particular strand even when it appears relevant to an assessment. Consequently, the degree to which validity evidence supports the proposed interpretation and use of the assessment may be compromised. Guided by the Standards, this article presents an independent validation of OECD's PISA assessment of mathematical self-efficacy (MSE) as an instructive example of this issue. OECD identifies MSE as one of a number of “factors” explaining student performance in mathematics, thereby serving the “policy orientation” of PISA. However, this independent validation identifies significant shortcomings in the strands of validity evidence available to support this interpretation and use of the assessment. The article therefore demonstrates how the Standards can guide the planning of a validation to ensure it generates the validity evidence relevant to an interpretive argument, particularly for an international large-scale assessment such as PISA. The implication is that assessment validation could yet benefit from the Standards as what Zumbo calls “a global force for testing”.

The ability to convey shared meaning with minimal ambiguity is highly desirable for technical terms within disciplines and professions. Unfortunately, there is no widespread professional consensus over the meaning of the word ‘validity’ as it pertains to educational and psychological testing. After illustrating the nature and extent of disagreement, we consider three options for reaching consensus: to eliminate its ambiguity by agreeing a precise technical definition; to embrace its ambiguity by agreeing a catchall lay usage; and to retire ‘validity’ from the testing lexicon.

Should the Standards reflect the perspective that construct validity is central to all validation efforts? Is the construct-/content-/criterion-related categorization of validity evidence now obsolete? Should the definition of validity include consideration of the consequences of test use?

In 2018, 26 states administered a college admissions test to all public school juniors. Nearly half of those states proposed to use those scores as their academic achievement indicators for federal accountability under the Every Student Succeeds Act (ESSA); many others are planning to use those scores for other accountability purposes. Accountability encompasses a number of different uses and subsumes a variety of claims. For states proposing to use summative tests for accountability, a validity argument needs to be developed, which entails delineating each specific use of test scores associated with accountability, identifying appropriate evidence, and offering rebuttals to counterclaims. The aim of this article is to support states in developing a validity argument for the use of college admissions test scores for accountability by identifying claims that are applicable across states and summarizing existing evidence as it relates to each of these claims. As outlined by the Standards for Educational and Psychological Testing, multiple sources of evidence are used to address each claim. A series of threats to the validity argument, including weaker alignment with content standards and potential influences in narrowing teaching, are reviewed. Finally, the article contrasts validity evidence, primarily from research on the ACT, with regulatory requirements from ESSA. The Standards and guidance addressing the use of a “nationally recognized high school academic assessment” (Elementary and Secondary Education Act (ESEA), Negotiated Rulemaking Committee; Department of Education) are the primary sources for the organization of validity evidence.

In setting the cut-scores on National Curriculum tests it is important to maintain standards. In the process of test development, both within and across years, changes are made to the style of the questions in order to increase their ‘accessibility’. This raises the question of whether a more accessible test should have higher cut-scores. Purely statistical definitions of equating are blind to differences between ‘accessibility’ and ‘easiness’, so cut-scores derived from statistical equating methods will be higher for a more accessible test. Arguments about the increased validity of the more accessible test are sometimes used to justify not raising the cut-scores as much as statistical methods would indicate. These arguments are shown to be equivalent to postulating that changing the accessibility changes the construct measured by the test. Using a statistical measurement model can provide a rational basis for understanding accessibility and identifying types of question where accessibility issues are causing a measurement problem.
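To make the statistical point concrete, here is a minimal sketch using linear (mean-sigma) equating, one common equating method; the abstract does not say which method is used for National Curriculum tests, and the function name and example scores are hypothetical. It assumes randomly equivalent groups sat each form, so any shift in the score distribution is attributed to the form rather than to the students.

```python
from statistics import mean, pstdev


def linear_equate_cut(cut_old, old_scores, new_scores):
    """Map a cut-score on the old form onto the new form (hypothetical helper).

    Uses linear (mean-sigma) equating and ASSUMES the two forms were taken by
    randomly equivalent groups, so distribution differences reflect the forms.
    """
    mu_old, sd_old = mean(old_scores), pstdev(old_scores)
    mu_new, sd_new = mean(new_scores), pstdev(new_scores)
    # Scores are treated as equivalent when they lie the same number of
    # standard deviations from their own form's mean.
    z = (cut_old - mu_old) / sd_old
    return mu_new + z * sd_new


old_form = [20, 25, 30, 35, 40]  # hypothetical scores on the old form
new_form = [24, 29, 34, 39, 44]  # hypothetical scores on the more accessible form
print(round(linear_equate_cut(32, old_form, new_form), 6))  # 36.0
```

With these made-up scores, equivalent groups score 4 points higher on the new form, so an old cut of 32 maps to 36. Nothing in the calculation distinguishes whether the higher scores come from ‘easiness’ or from improved ‘accessibility’, which is the distinction the abstract argues purely statistical equating cannot draw.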

The psychometric literature is replete with comprehensive discussions of test validity, test validation, and the characteristics of quality assessment programs. The most authoritative source for guidance regarding sound test development and evaluation practices is the Standards for Educational and Psychological Testing. However, the Standards are not legally binding. In this article, we review the way in which validity is conceptualized in the Standards and compare this conceptualization with validity evidence presented in specific court cases involving legal challenges to tests. Our review indicates that, in general, there is strong congruence between the Standards and how validity is viewed in the courts, and that testing agencies that conform to these guidelines are likely to withstand legal scrutiny. However, the courts have taken a more practical, less theoretical view on validity and tend to emphasize evidence based on test content and testing consequences.


The authors present the development and validation of the Standards Performance Continuum (SPC), a measure for assessing teacher performance of the Standards for Effective Pedagogy. The authors created the SPC to serve as an instrument for research on the relationship between teachers' use of these standards and instructional effectiveness, as a guide for teacher professional development, and as a tool for education reform. The goal of this research was to construct a quantitative instrument that is practical, easily scored, and readily interpretable. The standards are discussed, and three studies provide evidence of interrater reliability, concurrent validity, and criterion-related validity supporting the validity of interpretations of data gathered with the SPC.

The paper provides (1) a teacher-administered rating instrument for inattention that does not confound the rating with hyperactivity and conduct disorder, and (2) evidence that the ratings correlate with scores obtained from cognitive tests of attention. In Study I, the first objective was to investigate the construct validity and the inter-rater reliability of the Attention Checklist (ACL) by factor-analysing the ACL ratings that teachers gave to 110 Grade 4 children. The second objective was to investigate the predictive validity of the ACL by examining the relationship between participants' scores from teachers' ACL ratings and their scores on lab-type attention tests. The results of the factor analysis showed that a single factor, labelled ‘inattention’, underlies the 12 items in the ACL. On the attention tests, the children rated by teachers on the ACL as ‘low attention’ scored lower than the ‘high attention’ children. These findings were replicated in Study II, which was conducted to test further the construct validity and predictive validity of the ACL. This time, only the two tests (Auditory Attention and Visual Attention) that had shown relatively poor discrimination between the high and low attention groups in Study I were administered to another cohort of 97 Grade 4 children, as our intention was to challenge the reliability of the ACL further. Overall, the results of both studies suggest that a comprehensive assessment of attention skills should include both the ACL and objective measures of selective attention.

Several countries have been developing teaching standards for the purpose of providing recognition and more attractive career pathways to teachers who attain these standards. These initiatives aim to lift the status of teaching as a profession and to provide stronger incentives for professional learning. This article describes the work of a project at the Australian Council for Educational Research, the ACER Portfolio Project, designed to develop methods by which teachers can demonstrate how their practice meets the Australian Professional Standards for Teachers (www.aitsl.edu.au/teach/standards) at the ‘Highly Accomplished’ level, and to test these methods in schools for validity and feasibility. The article describes how the Project developed an assessment framework that provided a representative sample of evidence about a teacher's practice covering the Standards, trialed portfolio tasks in schools with volunteer teachers, and tested whether other teachers could be trained to assess the portfolio entries reliably and to set standards for highly accomplished teaching.

With the increasing use of automated scoring systems in high-stakes testing, it has become essential that test developers assess the validity of the inferences based on scores produced by these systems. In this article, we attempt to place the issues associated with computer-automated scoring within the context of current validity theory. Although it is assumed that the criteria appropriate for evaluating the validity of score interpretations are the same for tests using automated scoring procedures as for other assessments, different aspects of the validity argument may require emphasis as a function of the scoring procedure. We begin the article with a taxonomy of automated scoring procedures. The presentation of this taxonomy provides a framework for discussing threats to validity that may take on increased importance for specific approaches to automated scoring. We then present a general discussion of the process by which test-based inferences are validated, followed by a discussion of the special issues that must be considered when scoring is done by computer.

The concept of validity in theory and practice   Total citations: 1; self-citations: 1; other citations: 0
The concept of validity, as described in the literature, has changed over time to become a broad and rather complex issue. The purpose of this paper is to investigate whether practice has followed theory, or whether there is a gap between validity in theory and validity in practice. It compares the theoretical development of the concept of validity with the methodology adopted in validity studies over time. Important phases in the history of validity, and also common arguments for and against traditional and modern validity perspectives, are presented and discussed. Thereafter, three Swedish research projects aiming to validate instruments used for selection to higher education are described. The idea is to use these projects as examples of contemporary practice, and to compare their designs, research questions and outcomes with how validity was theoretically described during their specific period of time. The conclusions from these comparisons are that practice seems to have followed theory in how the validity research programmes were designed, but not in how they were then carried out. This gap between theory and practice seems to have increased with the introduction of broader and more modern validity perspectives. The scope of the research is more extensive, but results are fragmented and there is no evidence of a ‘unified’ validity argument, which has been one of the central aspects of modern validity theory. This supports the arguments that validity theory is difficult to put into practice and that there is a need for guidance on how to prioritise validity questions and interpret validity evidence.

Test-takers' interpretations of validity as related to test constructs and test use have been widely debated in large-scale language assessment. This study contributes further evidence to this debate by examining the written perspectives of 59 takers of large-scale English language tests. Participants wrote 300 to 500 words about their test-taking experiences, focusing on their perceptions of test validity and test use. A standard thematic coding process and logical cross-analysis were used to analyze these accounts. Codes were generated deductively and related to both experiential (i.e., testing conditions and consequences) and psychometric (i.e., test construction, format, and administration) aspects of testing. The findings give test-takers a voice on fundamental aspects of language assessment, with implications for test developers, test administrators, and test users. The study also demonstrates the need to obtain additional evidence from test-takers when validating large-scale language tests.

The National Assessment Program – Literacy and Numeracy (NAPLAN) in Australia is a series of literacy and numeracy tests that are used for purposes of school comparison. This paper argues that a key question for this use is whether or not it is a reasonable, or valid, use of the test data. Using Kane’s argumentative approach to validity, the paper argues that the comparisons of the quality of student achievement made available on the My School website have low validity because they disregard rates of participation in schools. By bringing together the literature that addresses the ‘new governance’ of education through testing and an approach to validity that addresses both the technical aspects of test score interpretation and the ethics of how test scores are used and applied, this study identifies validity as an important consideration in comparative analyses of student achievement data. Identifying, through the argumentative approach to validity, the need to consider participation in such comparisons highlights the article's contribution not only to the testing field but also to the critical policy literature.

Validity is the most fundamental consideration in test development. Understandably, much time, effort, and money are spent in its pursuit. Central to the modern conception of validity are the interpretations made, and uses planned, on the basis of test scores. Unfortunately, however, there is evidence that test users have difficulty understanding scores as intended. That is, although the proposed interpretations and uses of test scores might be theoretically valid, they might never be realized because the meaning of the message is lost in translation. This warrants pause. It is almost absurd that the intended interpretations and uses of test scores might fail because they are not aligned with the actual interpretations made and uses enacted by the audience. Despite this, contributions to the literature on the interpretability of score reports, the mechanisms by which scores are communicated to their audience, and their relevance to validity have appeared only recently. These contributions have focused on linking, through evidence, the intended interpretation and use with the actual interpretations being made and actions being planned by score users. This article reviews the current conception of validity, validation, and validity evidence with the goal of positioning the emerging notion of validity of usage within the current paradigm.


Jonathan Long suggests that current interpretations of values education are dominated by a number of ideas that appear to make attempts to achieve clarity and consensus extremely difficult. He argues that ‘essential to the successful development and promotion of values in a secular state educational system is a shifting of emphasis which enables us to see the context as an opportunity rather than a problem’. He goes on to suggest that schools should focus on what he describes as the ‘roots’ of values. In his view, these roots involve the questions ‘What is it to be human?’ and ‘What counts as human flourishing?’. Thus, schools must engage with issues of identity and direction.

Current Concerns in Validity Theory   Total citations: 3; self-citations: 0; other citations: 3
We are at the end of the first century of work on models of educational and psychological measurement and into a new millennium. This certainly seems like an appropriate time for looking backward and looking forward in assessment. Furthermore, a new edition of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) has been published, and the previous editions of the Standards have served as benchmarks in the development of measurement theory.
This backward glance will be just that, a glance. After a brief historical review focusing mainly on construct validity, the current state of validity theory will be summarized, with an emphasis on the role of arguments in validation. The application of an argument-based approach will then be examined with regard to two issues in validity theory: the distinction between performance-based and theory-based interpretations, and the role of consequences in validation.

The paper examines the impact of the plethora of innovations associated with the 1988 Education Reform Act (ERA) on the management of whole-school change in the primary school. Based upon qualitative data from a national sample of 50 schools in England and Wales, it documents the growing tensions between collegial and top-down managerial approaches. These are evidenced in the changing nature of working collaboratively in primary schools, in the creation of new management structures, in the use of school development plans and in the growth of quality assurance mechanisms, particularly in relation to preparation for Office for Standards in Education (OFSTED) inspections. There are conflicting interpretations of terms such as ‘collegiality’, ‘collaboration’, ‘teamwork’ and ‘whole-school approaches’, and there have been subtle shifts in their meaning and in their realisation in practice in the pre- and post-ERA context.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417