Similar literature
1.
Several benefits of using scoring rubrics in performance assessments have been proposed, such as increased consistency of scoring, the possibility of facilitating valid judgment of complex competencies, and promotion of learning. This paper investigates whether evidence for these claims can be found in the research literature. Several databases were searched for empirical research on rubrics, resulting in a total of 75 studies relevant for this review. Conclusions are that: (1) the reliable scoring of performance assessments can be enhanced by the use of rubrics, especially if they are analytic, topic-specific, and complemented with exemplars and/or rater training; (2) rubrics do not facilitate valid judgment of performance assessments per se. However, valid assessment could be facilitated by using a more comprehensive framework of validity when validating the rubric; (3) rubrics seem to have the potential of promoting learning and/or improving instruction. The main reason for this potential lies in the fact that rubrics make expectations and criteria explicit, which also facilitates feedback and self-assessment.

2.
How can performance assessments be used as part of regular instruction? Will this raise student performance on external achievement measures? What aspects of examinee performance improve on the assessment exercises?

3.
Researchers have documented the impact of rater effects, or raters’ tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers’ achievement estimates given their response patterns, has not been investigated. In rater-mediated assessments, person fit reflects the reasonableness of rater judgments of individual test-takers’ achievement over components of the assessment. This study illustrates an approach to visualizing and evaluating person fit in assessments that involve rater judgment using rater-mediated person response functions (rm-PRFs). The rm-PRF approach allows analysts to consider the impact of rater effects on person fit in order to identify individual test-takers for whom the assessment results may not have a straightforward interpretation. A simulation study is used to evaluate the impact of rater effects on person fit. Results indicate that rater effects can compromise the interpretation and use of performance assessment results for individual test-takers. Recommendations are presented that call on researchers and practitioners to supplement routine psychometric analyses for performance assessments (e.g., rater reliability checks) with rm-PRFs to identify students whose ratings may have compromised interpretations as a result of rater effects, person misfit, or both.

4.
This article addresses the rhetoric of performance assessment with research on important claims about science performance assessments. We found the following: (a) Concepts and terminology used to refer to performance assessments often were not consistent within and across researchers, educators, and policy-makers. (b) Performance assessments are highly sensitive not only to the tasks and the occasions sampled, but also to the method (e.g., hands-on, computer simulation) used to measure performance. (c) Performance assessments do not necessarily tap higher-order thinking, especially when they are poorly designed. (d) Performance assessments are expensive to develop and use: technology is needed for developing these assessments in an efficient way. (e) Performance assessments do not necessarily have the expected positive impact on teachers' teaching and students' understanding. (f) If teachers are to use performance assessments in their classrooms, they need professional development to help them construct the necessary knowledge and skills. This article attempts to address some of these realities by presenting a conceptual framework that might guide the development and the evaluation of performance assessments, as well as steps that might be taken to create a performance assessment technology and develop teacher inservice programs. © 1996 John Wiley & Sons, Inc.

5.
This article presents a conceptual framework for trust in standardised assessments. Standardised assessments play an important role in many education systems as they inform decisions about students' future schooling career or entry to the labour market. Also, standardised assessments are often used for teacher performance reviews and school accountability, or to monitor learning outcomes on the national level. Various stakeholders rely on the accuracy of assessment outcomes when making decisions about students' competences, or seek to improve the quality of education. Such reliance implies a need for trust in those who design and administer standardised assessments and make decisions on the basis of the outcomes. The framework presented in this article describes the type of relational and macro-level trust that is relevant for three types of assessment systems: national, quasi-market and commercial systems. Throughout the analysis presented, examples are provided to illustrate the ways in which relational and macro-level trust can vary by who is tested and by whom they are assessed; and how trust in evaluations varies by the purpose and consequences of testing, as well as the individual agency of students, their teachers and school leaders.

6.
Performance assessments typically require expert judges to individually rate each performance. This limits the use of such assessments because the rating process may be extremely time-consuming. This article describes a scoring algorithm that is based on expert judgments but requires the rating of only a sample of performances. A regression-based policy-capturing procedure was implemented to model the judgment policies of experts. The data set was a seven-case performance assessment of physician patient management skills. The assessment used a computer-based simulation of the patient care environment. The results showed a substantial improvement in correspondence between scores produced using the algorithm and actual ratings, when compared to raw scores. Scores based on the algorithm were also shown to be superior to raw scores and equal to expert ratings for making pass/fail decisions which agreed with those made by an independent committee of experts.
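
As an illustration only, regression-based policy capturing can be sketched as follows: experts rate a sample of performances, a linear model is fitted from observable cues to those ratings, and the fitted weights then score the performances no expert rated. The abstract does not describe the cues or the model form, so the two cues, the function names `fit_policy` and `score`, and the plain least-squares fit are all assumptions made for this sketch.

```python
def fit_policy(cues, ratings):
    """Fit expert ratings on cue values by ordinary least squares.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns [intercept, weight_1, ..., weight_k].
    """
    rows = [[1.0] + list(c) for c in cues]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, ratings))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("cues are collinear; the policy cannot be fitted")
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def score(weights, cue_row):
    """Apply the captured policy to a performance that no expert rated."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], cue_row))


# Hypothetical rated sample: (beneficial actions, risky actions) -> expert rating
rated_cues = [(0, 0), (1, 0), (0, 1), (2, 3), (4, 1)]
expert_ratings = [1.0, 3.0, 0.5, 3.5, 8.5]

policy = fit_policy(rated_cues, expert_ratings)
predicted = score(policy, (3, 2))  # score for an unrated performance
```

In practice the rated sample would be much larger than the number of cues, and the fitted scores would be checked against held-out expert ratings, as the abstract reports doing.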

7.
Public speaking and oral assessments are common in higher education, and they can be a major cause of anxiety and stress for students. This study was designed to measure the student experience of public speaking assessment tasks in a mandatory first-year course at a regional Australian university. The research conducted was an instrumental case study, with a student-centred focus. Surveys were designed to elicit student perceptions of their emotions and experience before and after engaging in public speaking skill development exercises and a public assessment task. After undertaking public speaking desensitisation and assessment, students experienced increased satisfaction and decreased fear, indecision and confusion. However, students’ perceptions of their confidence to control nerves, maintain eye contact, use gestures and comfortably speak in front of 25 people reduced, an unexpected outcome of the research. The reasons for this remain unclear, which provides a window for further research. Public speaking assessment tasks should be aligned with learning activities, and opportunities to minimise the impact of barriers to students engaging in the learning activities or tasks should be incorporated into the curriculum.

8.
Current educational policies rely on educational assessments. However, the technical aspects of assessments are often unknown to policy makers, which is dangerous because sound assessment policy requires knowledge of the strengths and limitations of educational tests. In this article, we discuss the importance of informing policy makers of important psychometric issues that should be considered whenever tests are proposed for specific purposes. We discuss the types of information that are important to communicate to policy makers, how to best convey this information in a manner in which it can be understood, and how to be seen as a valuable source of information to education policy makers. We end with some specific steps organizations such as NCME can take to inform policy makers and advocate for valid educational assessment policies.

9.
A national survey investigating the use of dynamic assessment and other nontraditional assessment techniques among school psychologists (N = 226) was conducted. Results of the survey indicated that 42% of respondents were at least “somewhat familiar” with dynamic assessment. However, of those familiar with dynamic assessment, only 39% reported using the techniques once a year or more. The most frequently endorsed reasons for not using dynamic assessment (if familiar with it) were lack of knowledge and time constraints. Learning disabled students were the population of students most often evaluated using dynamic assessment, and dynamic assessment was most often used to determine processing strengths and weaknesses. The majority of those familiar with dynamic assessment became so through independent reading. Only 10% reported learning about dynamic assessment through course work. In response to questions regarding assessment techniques most often used with minority students, the majority of respondents reported using traditional assessment tools including the WISC‐III (Wechsler Intelligence Scale for Children–Third Edition), BINET IV (Stanford‐Binet Intelligence Scale–Fourth Edition), or KABC (Kaufman Assessment Battery for Children). Overall, the results of the survey suggest that although the population is becoming increasingly diverse and changes in PL94‐142 (Public Law 94‐142) demand functional assessments, school psychologists continue to rely heavily upon traditional assessment techniques to address referral concerns of all students. This may in large part be due to weaknesses in graduate training programs. © 1999 John Wiley & Sons, Inc.

10.
Rater‐mediated assessments exhibit scoring challenges due to the involvement of human raters. The quality of human ratings largely determines the reliability, validity, and fairness of the assessment process. Our research recommends that the evaluation of ratings should be based on two aspects: a theoretical model of human judgment and an appropriate measurement model for evaluating these judgments. In rater‐mediated assessments, the underlying constructs and response processes may require the use of different rater judgment models and the application of different measurement models. We describe the use of Brunswik's lens model as an organizing theme for conceptualizing human judgments in rater‐mediated assessments. The constructs vary depending on which distal variables are identified in the lens models for the underlying rater‐mediated assessment. For example, one lens model can be developed to emphasize the measurement of student proficiency, while another lens model can stress the evaluation of rater accuracy. Next, we describe two measurement models that reflect different response processes (cumulative and unfolding) from raters: Rasch and hyperbolic cosine models. Future directions for the development and evaluation of rater‐mediated assessments are suggested.
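
The difference between a cumulative and an unfolding response process can be made concrete with a small sketch. The dichotomous Rasch form below is standard; the hyperbolic cosine form follows the commonly cited dichotomous parameterization with a unit parameter, which is an assumption here because the abstract gives no formulas. The function and parameter names are illustrative only.

```python
import math


def rasch_prob(theta, delta):
    """Cumulative process: P(positive response) rises steadily with theta - delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def hcm_prob(theta, delta, unit=0.0):
    """Unfolding process: P(positive response) peaks where theta equals delta
    and falls off symmetrically on either side (hyperbolic cosine form)."""
    return math.exp(unit) / (math.exp(unit) + 2.0 * math.cosh(theta - delta))


# Response probabilities along the latent continuum for a location of 0.0
locations = [-2.0, -1.0, 0.0, 1.0, 2.0]
cumulative = [rasch_prob(t, 0.0) for t in locations]
unfolding = [hcm_prob(t, 0.0) for t in locations]
```

Under the cumulative model the probabilities increase monotonically across `locations`, whereas under the unfolding model they are single-peaked at the matching location. That difference is why the choice of model depends on the response process assumed for raters.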

11.
Current engineering courses are not structured to develop real problem-solving skills in their students. They rely on a bottom-up approach to learning, where the first three years are spent mostly on theory, with almost no practice at problem definition. Instead, the students spend most of their time solving carefully designed exercises. Real-world problems are not as neatly packaged as these exercises, and, as a consequence, graduate engineers often lack the problem-definition and problem-recognition skills that are essential if the theory they have learned is to be useful to them. In contrast, a problem-oriented course requires the students to develop those problem-recognition skills. It is also intended to develop student-directed learning, and group and communication skills. A problem-oriented approach was used in 1991 in two second-year courses in civil engineering: surveying and computing. The courses were well received by the students, and the average exam result for surveying showed a noticeable improvement, while the average exam result for computing showed a marginal improvement. (There were, however, other encouraging signs in the computing course.) The author believes that the difference in response between the two subjects is due to the difference between working in groups and working individually, and a course change for the computing subject for 1992 is proposed.

12.
This article calls attention to the overreliance on research about the Performance Assessment for California Teachers (PACT), often labeled edTPA's predecessor, as justification for the edTPA. The article argues that the distinctions between the assessments are too vast to rely on PACT data to support the edTPA, given the localized nature of PACT and the way in which it is scored.

13.
In order to investigate what issues might be important for experimental training research, a group of experienced remedial teachers was asked to evaluate the potential effectiveness of various spelling exercises. After addressing some general questions about spelling exercises for Dutch poor spellers, they made rankings of several sets of exercises on the basis of the expected effectiveness. The teachers had to give their responses based on their own experiences and with a specific child with poor spelling in mind. The results show that the teachers emphasize the importance of providing rules in spelling exercises, but also agree that poor spellers often have serious difficulties in applying these rules in spelling. Furthermore, the rankings show that exercises with a combination of rule-based strategies and showing the whole orthographic pattern of the word are considered to be most effective. Learning to memorize the word without showing the spelling of the word was considered to be the least effective. Surprisingly, individual characteristics of the children did not seem to have any influence on the ranking of the exercises. It is concluded that exploiting the experience and knowledge of teachers may be good, but is only the first step for further research on the effectiveness of exercises for poor spellers.

14.
Immigrant students, one of the fastest-growing populations in US public schools, have been linguistically and culturally disadvantaged by accountability policies that rely only on standardized tests. Recent changes to these policies allow for the use of performance-based assessment tasks (PBATs) as an assessment indicator to supplement standardized tests. In this article, we explore how one highly successful high school that works exclusively with recently arrived immigrant teenagers has incorporated PBATs into its curriculum. We find that school leaders, teachers, and students agree that the use of rigorous performance assessments accomplishes language learning, content mastery, and test preparation simultaneously.

15.
Assessments such as ranking exercises arguably level the playing field for stakeholders. Quality assurance may remain a challenge for consumers, but ‘public’ assessments do provide a nominal element of independence or autonomy. This article outlines, from a German perspective, the way in which research assessments are frequently subject to influence from a variety of sources. It offers some developmental perspectives on assessment as an organic work‐in‐progress for the scientific and research community.

16.
The purpose of the study was to examine the psychometric quality of the National Board for Professional Teaching Standards (NBPTS) Early Childhood/Generalist assessment system. Rating data from the 1997–1998 assessment were analyzed using the FACETS (Linacre, 1998) computer program. The assessment system's 10 exercises work together to define a unidimensional accomplished teaching variable. There is little evidence of multidimensionality in the data, and none of the exercises appear to function in a redundant manner. The exercise set succeeds in defining several statistically distinct levels of accomplished teaching among the candidates. Overall, most candidate rating profiles show consistent performance across the exercises. While assessors differed somewhat in severity, most used the 12-point rating scale consistently. Although the range in assessor variability tends to be moderate, assessor severity does influence the number of candidates that can bank individual exercises. (If a candidate's total scaled score does not meet or exceed the NBPTS performance standard of 275, the candidate can bank scores on exercises in which the candidate received a final score of 2.75 or greater. If the candidate reapplies for certification in the next three years, the candidate does not have to retake banked exercises.)
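
The banking rule stated in the parenthetical can be written out as a short sketch. The two thresholds come from the abstract; the function name, the dictionary layout, and the exercise labels are hypothetical and used only for illustration.

```python
PERFORMANCE_STANDARD = 275  # total scaled score needed to certify (from the abstract)
BANK_THRESHOLD = 2.75       # minimum final exercise score that can be banked


def bankable_exercises(total_scaled_score, exercise_scores):
    """Return the exercises a candidate may bank.

    Banking only applies when the total scaled score falls short of the
    performance standard; a candidate who meets it has nothing to bank.
    """
    if total_scaled_score >= PERFORMANCE_STANDARD:
        return []
    return [
        name
        for name, final_score in exercise_scores.items()
        if final_score >= BANK_THRESHOLD
    ]


# Hypothetical candidate who fell short of the standard
scores = {"exercise_1": 3.0, "exercise_2": 2.5, "exercise_3": 2.75}
banked = bankable_exercises(260, scores)
```

Because banking depends on each exercise clearing a fixed cut score, an assessor who rates slightly more severely can push a borderline exercise below 2.75. This is the mechanism by which assessor severity affects how many candidates can bank an exercise.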

17.
R. M. Hare has argued for and defended a ‘two-level’ view of moral agency. He argues that moral agents ought to rely on the rules of ‘intuitive moral thinking’ for their ‘everyday’ moral judgments. When these rules conflict or when we do not have a rule at hand, we ought to ascend to the act-utilitarian, ‘critical’ level of moral thinking. I argue that since the rules at the intuitive level of moral thinking necessarily conflict much more often than Hare supposes, and since we often do not have ready-made rules for our moral judgments, we must necessarily use critical moral thinking very frequently. However, act-utilitarian judgments at this level will sharply conflict with our strongly held ‘intuitive’ moral convictions. I show that Hare's attempt to balance these two aspects of moral judgment requires us to simultaneously adopt two conflicting sets of moral standards, and thus an attempt to inculcate such standards constitutes a ‘schizophrenic’ moral education. Finally, I briefly outline an alternative conception of moral education, based on Aristotelian phronesis.

18.
Assessing the degree to which interventions are implemented in school settings is critical to making decisions about student outcomes. School psychologists may not be available to regularly conduct observations of intervention implementation; however, their data may be used alongside other methods for multi-informant assessment. Teacher self-report is a commonly used and feasible assessment method. Students have been trained to implement interventions with their peers in instances where traditional adult interventionists were unavailable. This exploratory study investigated the accuracy with which classroom teachers and middle and high school students assessed implementation of the Good Behavior Game and the impact of performance feedback on their accuracy. Results indicated that most students and teachers were able to provide accurate assessments of treatment integrity compared to researcher direct observation; however, some required performance feedback to do so. These findings suggest that multi-informant assessment may be a feasible and accurate way for school psychologists to collect formative treatment-integrity data in the classroom. Limitations and future directions are discussed.

19.
Assessment for learning approaches, such as peer review exercises, may improve student performance in summative assessments and increase their satisfaction with assessment practices. We conducted a mixed methods study to evaluate the effectiveness of an oral peer review exercise among postgraduate students. We examined: (1) final assessment grades among students who did and did not take part in the peer review exercise; (2) student perceptions of the impact of the peer review exercise; and (3) student understanding of, and satisfaction with, this new assessment practice. We found that students who took part in the exercise had a significantly higher mean grade in a subsequent summative oral presentation assessment than students who did not take part in the exercise. Students gained a better understanding of assessment and marking criteria and expressed increased confidence and decreased anxiety about completing the subsequent summative assessment. Assessment for learning improves academic attainment and the learning experience in postgraduate students.

20.
Performance assessments have been touted for their multidimensional and ‘real-world’, or authentic, appearance, yet this complexity is at the heart of the most serious problems in the use of performance assessments. If performance assessments are more multidimensional and situational, then perhaps performance assessment scores represent other things besides the construct of interest. This study empirically explored that possibility, namely that scores on a performance assessment reflect a motivational variable (perception of control), in addition to the construct intended, while an objective test does not. Data from high school Spanish students who took an objective test, a performance assessment and a measure of perceptions of control suggest that perceptions of control indeed predict performance assessment scores but not objective test scores.
