Similar Literature
 20 similar documents found (search time: 31 ms)
1.
Is construct validity relevant to performance assessment? Can these assessments allow meaningful comparisons? How can we minimize validity-reducing errors?

2.
Should the Standards reflect the perspective that construct validity is central to all validation efforts? Is the construct-/content-/criterion-related categorization of validity evidence now obsolete? Should the definition of validity include consideration of the consequences of test use?

3.
How will the expansion of the concept of construct validity affect validation practice in employment testing? How does the need for consequential validity differ in educational and employment testing? How do the research bases differ for performance assessment in these settings? Are there parallel trends in policies for test use in education and industry?

4.
Current Concerns in Validity Theory   Total citations: 3, self-citations: 0, citations by others: 3
We are at the end of the first century of work on models of educational and psychological measurement and into a new millennium. This certainly seems like an appropriate time for looking backward and looking forward in assessment. Furthermore, a new edition of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) has been published, and the previous editions of the Standards have served as benchmarks in the development of measurement theory.
This backward glance will be just that, a glance. After a brief historical review focusing mainly on construct validity, the current state of validity theory will be summarized, with an emphasis on the role of arguments in validation. Then the application of an argument-based approach will be examined with regard to two issues in validity theory: the distinction between performance-based and theory-based interpretations, and the role of consequences in validation.

5.
The Standards for Educational and Psychological Testing identify several strands of validity evidence that may be needed as support for particular interpretations and uses of assessments. Yet assessment validation often does not seem guided by these Standards, with validations lacking a particular strand even when it appears relevant to an assessment. Consequently, the degree to which validity evidence supports the proposed interpretation and use of the assessment may be compromised. Guided by the Standards, this article presents an independent validation of OECD's PISA assessment of mathematical self-efficacy (MSE) as an instructive example of this issue. OECD identifies MSE as one of a number of “factors” explaining student performance in mathematics, thereby serving the “policy orientation” of PISA. However, this independent validation identifies significant shortcomings in the strands of validity evidence available to support this interpretation and use of the assessment. The article therefore demonstrates how the Standards can guide the planning of a validation to ensure it generates the validity evidence relevant to an interpretive argument, particularly for an international large-scale assessment such as PISA. The implication is that assessment validation could yet benefit from the Standards as what Zumbo calls “a global force for testing”.

6.
The Centrality of Test Use and Consequences for Test Validity   Total citations: 3, self-citations: 0, citations by others: 3
What are the origins of consequential validity? What is the role of intended test use in validation? Is the study of unintended effects part of validation? What practical problems does this pose?

7.
Can measurement specialists' current ideas about content validation be implemented with licensure examinations? Does the pressure of litigation facilitate or inhibit conducting validity studies?

8.
Conclusion: Validity theory, together with currently available and emerging standards for performance assessments, provides guidance for the developers of high-stakes performance assessments. It is imperative, however, that important aspects of validity and standards for quality and fairness of performance assessments be built into such assessments from their very inception. Specifying the target performance in terms legitimate to all of the assessment participants and creating an explicit methodology for integrating diverse points of view provide the foundation for defensible assessments. It is only through painstaking analyses and field work, however, that many validity-related aspects of the assessments can be satisfactorily resolved. Perhaps, with the passage of time, a cycle can be established in which these experiences from the field can inform further development of standards of performance assessment, which can then be used to raise the standard of assessment development practice. Only then can the full promise of modern validity theory be fulfilled.

9.
How can performance assessments be used as part of regular instruction? Will this raise student performance on external achievement measures? What aspects of examinee performance improve on the assessment exercises?

10.
Research Findings: Data that serve to establish the convergent and discriminant construct validity of a new behavior rating scale for use with the early childhood preschool population, the Preschool and Kindergarten Behavior Scales (PKBS), are presented. The results of four different studies are presented wherein PKBS ratings of preschool or kindergarten age children were correlated with established comparison measures: the Social Skills Rating System, Matson Evaluation of Social Skills with Youngsters, Conners Teacher Rating Scale, and School Social Behavior Scales. Correlations were in the desired direction for demonstrating convergent and discriminant construct validity of the PKBS. Practice Implications: The PKBS appears to adequately measure the constructs of social skills and both internalizing and externalizing problem behavior in early childhood. Although additional validation research for this instrument is needed, the PKBS appears to show promise as a research tool, screening device, and assessment instrument for assessing social-emotional behavior of children ages 3–6. Given the increasing importance of early detection of social-emotional problems as part of a comprehensive system of prevention and early intervention, future efforts at linking assessment tools to specific and effective intervention techniques appear to be a much needed and significant endeavor.

11.
Research Findings: Data that serve to establish the convergent and discriminant construct validity of a new behavior rating scale for use with the early childhood preschool population, the Preschool and Kindergarten Behavior Scales (PKBS), are presented. The results of four different studies are presented wherein PKBS ratings of preschool or kindergarten age children were correlated with established comparison measures: the Social Skills Rating System, Matson Evaluation of Social Skills with Youngsters, Conners Teacher Rating Scale, and School Social Behavior Scales. Correlations were in the desired direction for demonstrating convergent and discriminant construct validity of the PKBS. Practice Implications: The PKBS appears to adequately measure the constructs of social skills and both internalizing and externalizing problem behavior in early childhood. Although additional validation research for this instrument is needed, the PKBS appears to show promise as a research tool, screening device, and assessment instrument for assessing social-emotional behavior of children ages 3–6. Given the increasing importance of early detection of social-emotional problems as part of a comprehensive system of prevention and early intervention, future efforts at linking assessment tools to specific and effective intervention techniques appear to be a much needed and significant endeavor.

12.
Developing a Strong Program of Construct Validation: A Test Anxiety Example   Total citations: 2, self-citations: 0, citations by others: 2
What would a strong program of construct validation look like for the concept of test anxiety? What are the components of strong validation programs? In particular, how does structural equation modeling fit into such a program?

13.
Assessment Validation in the Context of High-Stakes Assessment   Total citations: 1, self-citations: 0, citations by others: 1
Including the perspectives of stakeholder groups (e.g., teachers, parents) can improve the validity of high-stakes assessment interpretations and uses. How stakeholder groups view high-stakes assessments and their uses may differ significantly from the views of state-level policy officials. The views of these stakeholders can contribute to identifying the strengths and weaknesses of the intended assessment interpretations and uses. This article proposes a process approach to validity that addresses assessment validation in the context of high-stakes assessment. The process approach includes a test evaluator or validator who considers the perspectives of five stakeholder groups at four different stages of assessment maturity in relation to six aspects of construct validity. The tasks of the test evaluator and how stakeholders' views might be incorporated are illustrated at each stage of assessment maturity. How the test evaluator might make judgments about the merit of high-stakes assessment interpretations and uses is discussed.

14.
What criteria should be applied to the evaluation of performance measures? How consistent are the results from performance measures and norm-referenced achievement tests? How can we ensure fairness and credibility in performance measurement?

15.
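Through qualitative and quantitative research, this study analyzes the validity of fixed-ratio deletion in traditional cloze tests and of C-tests in English examinations in China. The comparison finds that C-tests are superior to traditional cloze tests in item writing, scoring, and reliability: they are simple, economical, objective, and highly reliable. Traditional cloze tests, however, show higher validity than C-tests, and it is recommended that they be used as an alternative item type in tests of comprehensive English proficiency. C-tests can be used as a form of vocabulary exercise in classroom practice or in vocabulary testing.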

16.
Teaching for the Test: Validity, Fairness, and Moral Action   Total citations: 1, self-citations: 0, citations by others: 1
In response to heightened levels of assessment activity at the K-12 level to meet requirements of the No Child Left Behind Act of 2001, measurement professionals are called to focus greater attention on four fundamental areas of measurement research and practice: (a) improving the research infrastructure for validation methods involving judgments of test content; (b) expanding the psychometric definition of fairness in achievement testing; (c) developing guidelines for validation studies of test use consequences; and (d) preparing teachers for new roles in instruction and assessment practice. Illustrative strategies for accomplishing these goals are outlined.

17.
The authors present the development and validation of the Standards Performance Continuum (SPC), a measure for assessing teacher performance of the Standards for Effective Pedagogy. The authors created the SPC to serve as an instrument for research on the relationship between teachers' use of these standards and instructional effectiveness, as a guide for teacher professional development, and as a tool for education reform. The goal of this research was to construct a quantitative instrument that is practical, easily scored, and readily interpretable. The standards are discussed, and 3 studies provide evidence of interrater reliability, concurrent validity, and criterion-related validity supporting the validity of interpretations of data gathered with the SPC.

18.
This paper addresses the construct as well as the criterion validity of the Differential Aptitude Test (DAT) for the assessment of secondary school minority group students (N = 111) as compared to majority group students (N = 318) in the Netherlands. Comparison of the test dimensions with the structural equation modelling program EQS showed that construct validity was good for both groups. With one exception, the subtests of the DAT measured the cognitive abilities of minority and majority group students equally well. The estimate of g as computed with the DAT showed strong predictive validity with little bias for various school subjects and achievement tests for mathematics and Dutch. Although some criteria revealed prediction bias to the disadvantage of the minority group, these differences concerned very small changes in R². Conversely, the predictive value decreased substantially when an estimate of g was used excluding subtests that measure aspects of crystallised intelligence. Spearman's hypothesis tested with DAT subtest scores and criterion scores showed that g explained most of the group differences. Professional test users can safely draw conclusions from the DAT regardless of the students' ethnicity.

19.
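Combining the Delphi method, literature review, and the analytic hierarchy process, this study constructs a performance evaluation indicator system for special environmental protection funds, consisting of 2 first-level indicators (including management performance), 7 second-level indicators (including project approval), and 20 main observation points (including the basis for project approval), and determines the indicator weights. On this basis, a comprehensive evaluation system based on the fuzzy comprehensive evaluation method is built, and an empirical study is conducted on the 2009 performance evaluation of special environmental protection funds in one city. The findings confirm that the constructed comprehensive evaluation system is effective and practicable.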

20.
The scoring process is critical in the validation of tests that rely on constructed responses. Documenting that readers carry out the scoring in ways consistent with the construct and measurement goals is an important aspect of score validity. In this article, rater cognition is approached as a source of support for a validity argument for scores based on constructed responses, whether such scores are to be used on their own or as the basis for other scoring processes, for example, automated scoring.
