20 similar articles found (search time: 31 ms)
Score reports have one or more intended audiences: the people who use the reports to make decisions about test takers, including teachers, administrators, parents and test takers. Attention to audience when designing a score report supports assessment validity by increasing the likelihood that score users will interpret and use assessment results appropriately. Although most design guidelines focus on making score reports understandable to people who are not testing professionals, audiences should be defined by more than just their lack of statistical knowledge. This paper introduces an approach to identifying important audience characteristics for designing computer-based, interactive score reports. Through three examples, we demonstrate how an audience analysis suggests a design pattern, which guides the overall design of a report, as well as design details, such as data representations and scaffolding. We conclude with a research agenda for furthering the use of audience analysis in the design of interactive score reports.

The conventional focus of validity in educational measurement has been on intended interpretations and uses of test scores. Empirical studies of test use by teachers, administrators and policy-makers show that actual interpretations and uses of test scores in context are invariably shaped by local users’ questions, which frequently require attention to multiple sources of evidence about students’ learning and the factors that shape it, and depend on local capacity to use such information well. This requires a more complex theory of validity that can shift focus as needed from the intended interpretations and uses of test scores that guide test developers to local capacity to support the actual interpretations, decisions and actions that routinely serve local users’ purposes. I draw on the growing empirical literature on data use to illustrate the need for an expanded theory of validity, point to theoretical resources that might guide such an expansion, and suggest a research agenda towards these ends.

Advances in validity theory and alacrity in validation practice have suffered because the term validity has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test uses). This article provides a brief summary of current validity theory, explication of a critical flaw in the current conceptualisation of validity, and a framework that both accommodates and differentiates validation of test score inferences and justification of test use.

A misconception exists that validity may refer only to the interpretation of test scores and not to the uses of those scores. The development and evolution of validity theory illustrate that test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test scores remains essential today. However, test scores are not interpreted and then ignored; rather, their interpretations lead to actions. Thus, a modern definition of validity needs to describe the validation of test score interpretations as a necessary, but insufficient, step en route to validating the uses of test scores for their intended purposes. To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests. The current definition of validity stipulated in the 2014 version of the Standards for Educational and Psychological Testing properly describes validity in terms of both interpretations and uses, and provides a sufficient starting point for validation.

Evaluating the multiple characteristics of alignment has taken a prominent role in educational assessment and accountability systems given its attention in the No Child Left Behind legislation (NCLB). Leading to this rise in popularity, alignment methodologies that examined relationships among curriculum, academic content standards, instruction, and assessments were proposed as strategies to evaluate evidence of the intended uses and interpretations of test scores. In this article, we propose a framework for evaluating alignment studies based on similar concepts that have been recommended for standard setting (Kane). This framework provides guidance to practitioners about how to identify sources of validity evidence for an alignment study and make judgments about the strength of the evidence that may impact the interpretation of the results.

This article reviews the intended uses of these college‐ and career‐readiness assessments with the goal of articulating an appropriate validity argument to support such uses. These assessments differ fundamentally from today's state assessments employed for state accountability. Current assessments are used to determine if students have mastered the knowledge and skills articulated in state standards; content standards, performance levels, and student impact often differ across states. College‐ and career‐readiness assessments will be used to determine if students are prepared to succeed in postsecondary education. Do students have a high probability of academic success in college or career‐training programs? As with admissions, placement, and selection tests, the primary interpretations that will be made from test scores concern future performance. Statistical evidence between test scores and performance in postsecondary education will become an important form of evidence. A validation argument should first define the construct (college and career readiness) and then define appropriate criterion measures. This article reviews alternative definitions and measures of college and career readiness and contrasts traditional standard‐setting methods with empirically based approaches to support a validation argument.

How we choose to use a term depends on what we want to do with it. If validity is to be used to support a score interpretation, validation would require an analysis of the plausibility of that interpretation. If validity is to be used to support score uses, validation would require an analysis of the appropriateness of the proposed uses, and therefore, would require an analysis of the consequences of the uses. In each case, the evidence needed for validation would depend on the specific claims being made.

Validity is a central principle of assessment relating to the appropriateness of the uses and interpretations of test results. Usually, one of the inferences that we wish to make is that the score reflects the extent of a student’s learning in a given domain. Thus, it is important to establish that the assessment tasks elicit performances that reflect the intended constructs. This research explored the use of three methods for evaluating whether there are threats to validity in relation to the constructs elicited in international A level geography examinations: (a) Rasch analysis; (b) analysis of processes expected and apparent when students answer questions; and (c) qualitative analysis of responses to items identified as potentially problematic. The results provided strong evidence to support validity with regard to the elicitation of constructs although one question part was identified as a threat to validity. Strengths and weaknesses of the methods can be identified.

Assessment Validation in the Context of High-Stakes Assessment (cited by: 1; self-citations: 0; cited by others: 1)
Including the perspectives of stakeholder groups (e.g., teachers, parents) can improve the validity of high-stakes assessment interpretations and uses. How stakeholder groups view high-stakes assessments and their uses may differ significantly from the views of state-level policy officials. The views of these stakeholders can contribute to identifying the strengths and weaknesses of the intended assessment interpretations and uses. This article proposes a process approach to validity that addresses assessment validation in the context of high-stakes assessment. The process approach includes a test evaluator or validator who considers the perspectives of five stakeholder groups at four different stages of assessment maturity in relationship to six aspects of construct validity. The tasks of the test evaluator and how stakeholders' views might be incorporated are illustrated at each stage of assessment maturity. How the test evaluator might make judgments about the merit of high-stakes assessment interpretations and uses is discussed.

This article has three goals. The first goal is to clarify the role that the consequences of test score use play in validity judgments by reviewing the role that modern writers on validity have ascribed for consequences in supporting validity judgments. The second goal is to summarize current views on who is responsible for collecting evidence of test score use consequences by attempting to separate the responsibilities of the test developer and the test user. The last goal is to offer a framework that attempts to prescribe the conditions under which the responsibility for collecting evidence of consequences falls to the test developer or to the test user.

The rise of computer‐based testing has brought with it the capability to measure more aspects of a test event than simply the answers selected or constructed by the test taker. One behavior that has drawn much research interest is the time test takers spend responding to individual multiple‐choice items. In particular, very short response time (termed rapid guessing) has been shown to indicate disengaged test taking, regardless of whether it occurs in high‐stakes or low‐stakes testing contexts. This article examines rapid‐guessing behavior: its theoretical conceptualization and underlying assumptions, methods for identifying it, misconceptions regarding its dynamics, and the contextual requirements for its proper interpretation. It is argued that because it does not reflect what a test taker knows and can do, a rapid guess to an item represents a choice by the test taker to momentarily opt out of being measured. As a result, rapid guessing tends to negatively distort scores and thereby diminish validity. Therefore, because rapid guesses do not contribute to measurement, it makes little sense to include them in scoring.
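
The idea of identifying rapid guesses and leaving them out of scoring can be illustrated with a minimal sketch. It is not the article's method: the fixed 3-second threshold, the `Response` record, and both function names are assumptions made for illustration, and real applications may set thresholds per item.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    item_id: str
    correct: bool
    seconds: float  # time the test taker spent on the item


def flag_rapid_guesses(responses: List[Response],
                       threshold_seconds: float = 3.0) -> List[bool]:
    """Mark each response True when its time falls below the (assumed) threshold."""
    return [r.seconds < threshold_seconds for r in responses]


def engaged_proportion_correct(responses: List[Response],
                               threshold_seconds: float = 3.0) -> Optional[float]:
    """Proportion correct over responses that were not flagged as rapid guesses.

    Returns None when every response was flagged, since no engaged
    responses remain to score.
    """
    engaged = [r for r in responses if r.seconds >= threshold_seconds]
    if not engaged:
        return None
    return sum(r.correct for r in engaged) / len(engaged)


# Hypothetical response data for one test taker.
responses = [
    Response("q1", True, 42.0),
    Response("q2", False, 1.2),
    Response("q3", True, 30.5),
    Response("q4", False, 0.8),
    Response("q5", False, 25.0),
]

print(flag_rapid_guesses(responses))
print(engaged_proportion_correct(responses))
```

Here the two responses answered in about a second are flagged and dropped, so the score rests only on the three items the test taker appears to have engaged with.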

Despite the ease of accessing a wide range of measures, little attention is given to validity arguments when considering whether to use the measure for a new purpose or in a different context. Making a validity argument has historically focused on the intended interpretation and use. There has been a press to consider both the intended and actual interpretations and how users make sense of the data when constructing validity arguments, but the practice is not widespread. This paper contributes to existing research on validity by highlighting the value of attending to the actual interpretation and use of a measure aimed at supporting instructional improvement in mathematics. We describe the use of the same measure across two contexts to highlight the importance of attending to characteristics of both users and the contexts in which the measures are used when assessing the validity of inferences for the purpose of instructional improvement efforts.

Because school learning entails not just accretion of knowledge but the structuring and restructuring of knowledge and cognitive skills, the conception and construction of educational achievement measures must be cast in developmental terms. And because student characteristics as well as social and educational experiences influence current performance, the interpretation and implications of educational achievement measures must be relative to intrapersonal and situational contexts. These points imply a strategy of comprehensive assessment in context that focuses on the processes and structures involved in subject-matter competence as moderated in performance by personal and environmental influences. This article addresses in detail both the nature of developing competence and its measurement in terms of context-dependent task performance. Construct-irrelevant task difficulty that might jeopardize the meaning of test scores as well as construct-irrelevant influences that might jeopardize implications for action are taken into account via the comprehensive measurement of relevant contextual factors. Comprehensive assessment in context thus facilitates valid interpretations of the meaning and implications of ability and achievement scores in particular instances, thereby lightening the interpretive and ethical burdens on test users and enhancing the validity of test use.

The interpretability of score comparisons depends on the design and execution of a sound data collection plan and the establishment of linkings between these scores. When comparisons are made between scores from two or more assessments that are built to different specifications and are administered to different populations under different conditions, the validity of the comparisons hinges on untestable assumptions. For example, tests administered across different disability groups or tests administered to different language groups produce scores for which implicit linkings are presumed to hold. Presumed linking makes use of extreme assumptions to produce links between scores on tests in the absence of common test material or equivalent groups of test takers. These presumed linkings lead to dubious interpretations. This article suggests an approach that indirectly assesses the validity of these presumed linkings among scores on assessments that contain neither equivalent groups nor common anchor material.

Student examinees are key stakeholders in large-scale, high-stakes, public examination systems. How they perceive the purpose, comprehend the technical characteristics of testing and how they interpret scores influence their response to the system demands and their preparation for the examinations; this information relates to intended and unintended consequences of testing and is a component of an expanded notion of test validity. The research reported in this paper investigates examinees’ perceptions about the secondary school graduation and university-entrance national exams in Cyprus. Interviews with recent examinees reveal the versatility and complexity of their perceptions about the fairness and appropriateness of the system, which are influenced by design features of the exams and by the local context. There are important, mostly unintended, consequences on their in- and out-of-school experience, on school curricula and on instructional practices. Empirical evidence about consequential aspects of examinations contributes to the validity argument needed to support such programmes.

Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe test speededness, its potential threats to validity, and traditional and modern methods that can be used to assess the presence of speededness. We argue that more attention must be paid to this issue and that more research must be done to set appropriate time limits on power tests so that speed of responding does not interfere with the construct measured.

The Moral Competence Test (MCT) was designed over 30 years ago to provide a resource for educators interested in conducting cross-cultural studies of moral development and education. Since its origin, it has been translated into at least 30 languages and used in hundreds of studies. However, few studies provide evidence to support the use of the test in the US. The test’s designer identified three criteria for evaluating the construct validity of the test and its primary scores: whether correlations of stage scores reflect a simplex structure, whether ratings follow the theoretical order of stages, and whether the test differentiates preferences and structures of reasoning. We use these criteria and evidence of criterion and content validity to assess the validity of the MCT. We present results from two US samples (n = 772). Results analyzing the test author’s criteria support the semantic validity of the test; however, evidence of criterion validity raises questions about the C-score as a measure of moral competence. After controlling for stage preferences, the C-score was negatively related to democratic attitudes and positively related to dogmatism.

Scaling is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of information into a score scale, and introduces vertical scaling and its related designs and methodologies as a special type of scaling. After completion of this module, the reader should be able to understand the relationship between various types of raw scores, understand the relationship between raw scores and scale scores, construct a scale with desired properties, evaluate an existing score scale, understand how content and standards information are built into a scale, and understand how vertical scales are developed and used in practice.
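
The relationship between raw scores and scale scores can be illustrated with a minimal sketch of one simple option, a linear transformation to a chosen mean and standard deviation. This is an assumption for illustration rather than the module's procedure: the function name `linear_scale`, the 500/100 targets, and the 200 to 800 reporting range are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def linear_scale(raw_scores: Sequence[float],
                 target_mean: float = 500.0,
                 target_sd: float = 100.0,
                 lo: int = 200,
                 hi: int = 800) -> List[int]:
    """Map raw scores onto a scale with the desired mean and SD.

    Each raw score is standardized against the group's mean and
    population SD, rescaled, rounded to an integer, and clipped to the
    reporting range [lo, hi]. All defaults are illustrative assumptions.
    """
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    if s == 0:
        raise ValueError("raw scores have no spread; a linear scale is undefined")
    scaled = []
    for x in raw_scores:
        value = target_mean + target_sd * (x - m) / s
        scaled.append(min(hi, max(lo, round(value))))
    return scaled


# Hypothetical raw scores for three examinees.
print(linear_scale([10, 20, 30]))
```

A linear transformation preserves the order and relative spacing of raw scores; scales that build in content or standards information, or vertical scales spanning grades, need more than this.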

In 2018, 26 states administered a college admissions test to all public school juniors. Nearly half of those states proposed to use those scores as their academic achievement indicators for federal accountability under the Every Student Succeeds Act (ESSA); many others are planning to use those scores for other accountability purposes. Accountability encompasses a number of different uses and subsumes a variety of claims. For states proposing to use summative tests for accountability, a validity argument needs to be developed, which entails delineating each specific use of test scores associated with accountability, identifying appropriate evidence, and offering a rebuttal to counterclaims. The aim of this article is to support states in developing a validity argument for use of college admission test scores for accountability by identifying claims that are applicable across states, along with summarizing existing evidence as it relates to each of these claims. As outlined by The Standards for Educational and Psychological Testing, multiple sources of evidence are used to address each claim. A series of threats to the validity argument, including weaker alignment with content standards and potential influences in narrowing teaching, are reviewed. Finally, the article contrasts validity evidence, primarily from research on the ACT, with regulatory requirements from ESSA. The Standards and guidance addressing the use of a “nationally recognized high school academic assessment” (Elementary and Secondary Education Act (ESEA), Negotiated Rulemaking Committee; Department of Education) are the primary sources for the organization of validity evidence.

The AERA, APA, NCME Standards define validity as ‘the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests’. A century of disagreement about validity does not mean that there has not been substantial progress. This consensus definition brings together interpretations and use so that it is one idea, not a sequence of steps. Just as test design is framed by a particular context of use, so too must validation research focus on the adequacy of tests for specific purposes. The consensus definition also carries forward major reforms in validity theory begun in the 1970s that rejected separate types of validity evidence for different types of tests, e.g. content validity for achievement tests and predictive correlations for employment tests. When the current definition refers to both ‘evidence and theory’ the Standards are requiring not just that a test be well designed based on theory but that evidence be collected to verify that the test device is working as intended. Having taught policy-makers, citizens, and the courts to use the word validity, especially in high-stakes applications, we cannot after the fact substitute a more limited, technical definition of validity. An official definition provides clarity even for those who disagree, because it serves as a touchstone and obliges them to acknowledge when they are departing from it.
