Similar literature
Found 20 similar articles (search time: 125 ms)
1.
Teacher evaluation systems commonly rely on observation of teaching practice (OTP) by school principals. However, the value of OTP as evidence of teacher effectiveness depends on its psychometric quality. In this study, we address a key aspect of the psychometric quality of principals’ OTP ratings. Specifically, we investigate the degree to which rating scale categories have a consistent interpretation across teaching episodes and practices. Results suggest that the 1,324 principals’ use of the rating scale categories functioned as intended overall. However, we also found that the midpoint category is underutilized and that rating categories do not always reflect similar levels of teaching effectiveness across teaching episodes and practices. When such discrepancies occur, we cannot assume principals’ ratings reflect a consistent level of teacher effectiveness within and across classrooms. This is a critical component of validity evidence that can inform the interpretation of OTP ratings and point to areas for improvement in both the rubrics and in principals’ training for classroom observations.

2.
This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings are in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used.

3.
Teacher evaluation commonly includes classroom observations conducted by principals. Despite widespread use, little is known about the quality of principal ratings. We investigated 1,324 principals’ rating accuracy of six teaching practices at the conclusion of training within an authentic teacher evaluation system. Data are from a video-based exam of four 10-minute classroom observations. Many-Facet Rasch modeling revealed that (1) overall principals had high accuracy, but individuals varied substantially, and (2) some teaching episodes and practices were easier to rate accurately. For example, promotes critical thinking was rated more accurately than uses formative assessment. Because Many-Facet Rasch modeling estimates individuals’ accuracy patterns across teaching episodes and practices, it is a useful tool for identifying areas that individual principals, or groups, may need additional training (e.g., evaluating formative assessment). Implications for improving training of principals to conduct classroom observations for teacher evaluation are discussed.

4.
Werdelin, I. (1969). Relationship between Teacher Ratings, Peer Ratings, and Self Ratings of Behavior in School. Scand. J. Educ. Res., 13, 147‐169. A rating scale of student behavior was given to teachers and a self‐rating scale to the students themselves, who also judged the behavior of their classmates. The teacher rating scale was factor analyzed, separately for each sex, and so was the self‐rating scale. The teacher rating scale, the self‐rating scale, and the peer rating scale were also factor analyzed jointly. Relations with other behavior variables and intellectual variables were found. Close connection appeared between teacher ratings (including school marks) and peer ratings, while self‐ratings differed considerably from these.

This is a report from the School Mathematics Project at the School of Education, Malmö, Sweden. The data treated were collected during the author's stay in the United States, and he wants to thank Dr. Max Beberman for kind support.

5.
Two hundred and thirty-six teachers were independently rated by their principals and supervisors on twenty-three scales of teacher competence. Each teacher received forty-six ratings (23 from a principal and 23 from a supervisor). The rating scales were intercorrelated and the resulting matrix factor analyzed. Two correlated factors emerged, one corresponding to principals’ ratings and the other to supervisors’ ratings. The results were interpreted to mean that the rating scales generated data that were more a reflection of the rater’s point of view than of a teacher’s actual classroom behavior.

6.
The Department of Education is moving to change accountability for teacher preparation institutions to include surveys of the graduates and their supervising principal following paid employment. This study describes one of a number of quantitative studies that examine the validity and usefulness of such follow-up surveys. Using multiple years of data, the authors examined the effect of teacher socioeconomic status and ethnicity on principals' evaluation of the teachers' preparation. The results indicated that there was no difference in ratings based on graduates' parent education, family income, or ethnicity. Post hoc evaluation showed that Latino teachers were rated better prepared to work with diversity in the classroom and to teach English learners. Bias does not appear to be part of principal evaluation. However, because principals are prone to rating teachers on a binary, satisfactory/unsatisfactory basis, follow-up surveys may not be the most useful tool for assessing some nuances of teacher preparation.

7.
This paper studies differences between girls' and boys' perceptions of mathematical and scientific higher-order thinking, ways of identifying when higher-order thinking occurs, and methods of mathematical and scientific inquiry that assist in developing higher-level thinking in both young students and pre-service teachers. Participants included 17 pre-service teacher candidates (16 female, 1 male) enrolled in an integrated elementary mathematics and science methods course, and 102 elementary students from large, metropolitan schools (52 females and 50 males from lower-middle- and high-middle-class homes). A 15-item Likert-style rating scale instrument was used. Qualitative measures including observations, interviews and reflections were completed in conjunction with the more quantitative rating scale measure to triangulate the design. Pre-service teacher candidates rated the significance of children's responses and reflected on findings. Results revealed similar ratings between genders and significance on items relating to perceptions of what science and mathematics are, whether girls should be scientists, and objects/manipulatives versus paper/pencil tasks in mathematics.

8.
The major purpose of this study was to assess the presence of evaluation system components that assist principals in responding to incompetent teachers. According to Virginia principals, 5 per cent of the teachers in their schools were incompetent; however, only 2.65 per cent were documented formally as being incompetent. The typical principal with a staff of 100 teachers identifies 1.53 incompetent tenured teachers per year and remediates 0.68 teacher, encourages 0.37 teacher to resign or retire, reassigns 0.29 teacher, and recommends dismissal for 0.10 teacher. The four evaluation system components of remedial procedures, evaluation criteria, evaluator training, and organizational commitment were found to predict 69 per cent of the variance in the principals' effectiveness rating of their evaluation systems, but none of the evaluation system components were found to predict administrative responsiveness to incompetence. Such findings suggest that principals and school systems are avoiding a serious problem that undermines the education of millions of children, staff morale, and the public's perception of education.

9.
Whether screening tests or teacher ratings best predict children at risk for reading failure continues to be an area of disagreement in the early identification literature. Our early studies confirmed low positive identification rates (30%) when kindergarten teachers were asked to predict future reading achievement using a traditional rating scale, while a project-developed, theory-based screening battery correctly identified 81% of poor readers. Construction of a teacher rating scale of current skill levels on research-validated precursors to reading improved prediction in the current study, although results were still inferior to the screening test (64% and 80% valid positives, respectively). Combining test results and teacher ratings resulted in 88% identification of those who failed in first, second, or third grade, suggesting that both teacher ratings and screening tests should be used to identify the largest number of those who will later fail in reading. © 1998 John Wiley & Sons, Inc.

10.
In most U.S. schools, teachers are evaluated using observation of teaching practice (OTP). This study investigates rater effects on OTP ratings among 421 principals in an authentic teacher evaluation system. Many-facet Rasch analysis (MFR) using a block of shared ratings revealed that principals generally (a) differentiated between more and less effective teachers, (b) rated their teachers with leniency (i.e., overused higher rating categories), and (c) differentiated between teaching practices (e.g., Cognitive Engagement vs. Classroom Management) with minimal halo effect. Individual principals varied significantly in degree of leniency, and approximately 12% of principals exhibited severe rater bias. Implications for use of OTP ratings for evaluating teachers’ effectiveness are discussed. Strengths and limitations of MFR to analyze rater effects in OTP are also discussed.

11.
Results from a sample of 1,013 Georgia principals who rated 12,617 teachers are used to compare holistic and analytic principal judgments with indicators of student growth central to the state’s teacher evaluation system. Holistic principal judgments were compared to mean student growth percentiles (MGPs) and analytic judgments from a formal observation protocol. The correlations of a holistic principal rating with teacher MGPs and observation protocol scores were 0.22 and 0.32. Teachers selected as most successful at increasing student achievement had a mean MGP that was a full SD higher than did teachers selected as least successful, and a mean observation protocol score that was 1.35 SDs higher. Holistic principal judgments appear to be much more strongly influenced by observations of teachers’ classroom practices than they were by evidence of growth in student achievement.

12.
Evaluation of teaching effectiveness by different sources is a well established practice. It is generally carried out in the form of student evaluation using rating scales. This article describes one such system designed for the College of Engineering at King Abdul Aziz University, Jeddah, Saudi Arabia. It describes the operation of the system in detail, including a rating scale questionnaire about both teacher and course. The author proposes a method of analysing the responses so as to make them effective for feedback. In order to demonstrate the efficiency of this proposed method of analysis, he presents a sample analysis of a two-semester period involving eight teachers, six courses (multi-section), and 16 responses. The results show that:
1 The average response for the same teacher from different courses was fairly consistent.
2 A norm can be developed to compare the average responses for different teachers.
3 There was a good correlation between the average rating of a teacher and the percentage of students wanting to take another course with him.
4 The students' responses about the content of a multi-section course were fairly consistent from different sections of that course.
5 The overall ratings of different courses were compared with each other, and showed good consistency with the nature of the courses.
Since the results were based on a two-semester period, a semester-to-semester comparison of either the teacher ratings or course ratings could not be made, but as more data is gathered this may become possible in future.

13.
Behavior rating scales are indirect measures of emotional and social functioning used for assessment purposes. Rater bias is systematic error that may compromise the validity of behavior rating scale scores. Teacher bias in ratings of behavior has been investigated in multiple studies, but not yet assessed in a research synthesis that focuses on the role of ethnicity and culture. Teacher bias in ratings of student behavior was investigated through a comprehensive literature review that only included studies with a defensible criterion of true behavior against which to compare rating scores. A final total of 13 studies of teacher bias suggested mixed evidence for bias due to student ethnicity and strong evidence of bias due to teacher culture, particularly when positive stereotypes were violated. Limitations and future directions of research are discussed.

14.
15.
As managers of teachers, principals must cope with a variety of educational mistakes in their schools. These cumulative mistakes lead to a state of borderline competency in the teacher. From interviews with 30 elementary school principals in a large urban school system, interviews and observations of 19 borderline competent teachers, and review of principals' teacher evaluation records, five states of coping with teachers' educational mistakes emerged: (1) deployment: enlisting the teacher's colleagues to watch over him or her and report back to the principal on the teacher's behavior; (2) detente: bringing the troubled teacher within the society of peers and rallying forces to help solve his or her problems; (3) determination: deciding that the range of the teacher's deviations exceeds the boundaries of normative behavior and there is cause for dismissal; (4) evaluation: assigning an unsatisfactory efficiency rating with extensive documentation and recordkeeping for teachers who have been identified as borderline competent; and (5) formal dismissal: taking action to remove a teacher from the school.

16.
The effects of rating scale format (behaviorally anchored vs. Likert) and rater training on leniency and halo in student ratings of instruction were investigated. The subjects (N=269) were students enrolled in required courses at a graduate theological seminary in the Southwest United States. A repeated measures design controlling for teacher and course was used. Findings indicated: (a) training was effective in reducing leniency and halo in ratings from both instruments; (b) trained raters exhibited less leniency on two rating dimensions when using behaviorally anchored rating scales (BARS) than when using the Likert scale; and (c) trained raters exhibited less halo when using the Likert than when using the BARS. The findings demonstrate the importance of focusing efforts to improve quality of ratings on the students rather than on the format of the instrument. Presented at the Twenty-Eighth Annual Forum of the Association for Institutional Research, Phoenix, Ariz., May 1988.

17.
Parents and teachers of twenty-one well-adjusted and twenty-one poorly-adjusted kindergarten children rated them on a semantic differential scale consisting of fifty bipolar adjectives. The hypothesis of a greater discrepancy in ratings between father and mother of poorly-adjusted children as compared with well-adjusted children was supported at the .05 level. Further, greater discrepancies were found between mother and teacher and between father and teacher of the former group as compared with the latter. The items showing significantly greater discrepancies for the poorly-adjusted children as compared with those rated as well-adjusted were identified.

18.
Students in part‐time courses were interviewed about their perceptions of good teaching and tutoring. The perceptions differed markedly between those with reproductive conceptions of learning and students holding self‐determining ones. The former preferred didactic teaching but disliked interaction, whereas the latter had almost diametrically opposite perspectives by finding student‐centred approaches consistent with their conceptions of learning. The findings have implications for the evaluation of teaching, as ratings are likely to be influenced by the predominant conceptions of learning of a class. It is common for individual instructors to be regularly evaluated by teacher evaluation questionnaires, which often have a teacher‐centred bias, and for the ratings to be used for appraisal. It is argued that this leads to conservatism as teachers fear that students with reproductive conceptions of learning will reduce their ratings if they innovate in their teaching. As the degree of bias from this ratings‐lowering phenomenon may be quite large, the findings are a caution against the common practice of using absolute rating values from both teacher evaluation questionnaires and programme‐level evaluation by instruments such as the Course Experience Questionnaire. Results need to be interpreted together with other evidence and take into account contextual factors including students' conceptions of learning.

19.
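By analyzing the ADDIE model and a practical framework for research on the design of primary and secondary school teacher training programs, this article explores and constructs a process model suited to teacher training, explains the model in detail, and verifies its effectiveness in practice. Its main difference from the ADDIE model is that it refines the evaluation phase, dividing it into three types (validation evaluation, formative evaluation, and overall evaluation), and re-examines the relationships among the model's phases.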

20.
The article examines the tensions one superintendent in the USA experienced as he evaluated principals in a high-stakes environment that had undergone numerous transformations at the central office. Using qualitative methods, primarily shadowing techniques, observations and debriefing, the following tensions emerged and were examined in light of the work of the superintendent evaluating principal performance: (1) discrepancies between principal performance when compared to performance data, (2) length of time in the principalship compared to results, (3) finding the right balance between student achievement data and other indicators of principal performance, (4) what types of achievement data are important and when these data are made available, (5) credence paid to complaints about structural changes implemented by the principal, (6) balancing the principal self-evaluation rating scores with the final evaluator scores and (7) accounting for personal factors such as relationship to principals and knowledge about principal capabilities. Each of these tensions contributes to the difficulty a superintendent may feel when conducting the principal evaluation process.
