Similar Literature
1.
Teacher evaluation systems commonly rely on observation of teaching practice (OTP) by school principals. However, the value of OTP as evidence of teacher effectiveness depends on its psychometric quality. In this study, we address a key aspect of the psychometric quality of principals' OTP ratings: the degree to which rating scale categories have a consistent interpretation across teaching episodes and practices. Results for 1,324 principals suggest that, overall, their use of the rating scale categories functioned as intended. However, we also found that the midpoint category is underutilized and that rating categories do not always reflect similar levels of teaching effectiveness across teaching episodes and practices. When such discrepancies occur, we cannot assume that principals' ratings reflect a consistent level of teacher effectiveness within and across classrooms. Consistent category interpretation is therefore a critical component of validity evidence: it informs the interpretation of OTP ratings and points to areas for improvement in both the rubrics and principals' training for classroom observations.
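
The abstract describes two category checks: whether the scale functions as intended overall, and whether the midpoint is underused. The minimal Python sketch below illustrates both. It assumes a hypothetical five-category rubric, hypothetical ratings, and step thresholds already estimated by some Rasch-type model. The 10% cutoff for flagging an "underused" category is an arbitrary illustrative choice, not a criterion taken from the study.

```python
from collections import Counter

def category_diagnostics(ratings, categories, thresholds, min_share=0.10):
    """Summarize how a rating scale's categories are used.

    ratings    -- observed integer ratings (hypothetical data)
    categories -- ordered list of valid categories, e.g. [1, 2, 3, 4, 5]
    thresholds -- estimated step thresholds in logits, one per adjacent
                  pair of categories (assumed to come from a fitted model)
    min_share  -- illustrative cutoff below which a category is flagged
    """
    if len(thresholds) != len(categories) - 1:
        raise ValueError("need one threshold per adjacent category pair")
    counts = Counter(ratings)
    total = len(ratings)
    shares = {c: counts.get(c, 0) / total for c in categories}
    underused = [c for c in categories if shares[c] < min_share]
    # If categories work as intended, thresholds should increase monotonically.
    ordered = all(lo < hi for lo, hi in zip(thresholds, thresholds[1:]))
    return {"shares": shares, "underused": underused, "thresholds_ordered": ordered}

ratings = [1, 2, 2, 4, 4, 4, 5, 5, 4, 2]
print(category_diagnostics(ratings, [1, 2, 3, 4, 5], [-2.0, -1.0, 0.5, 1.5]))
```

Frequency and threshold order are only descriptive signals. Whether categories mean the same thing across episodes and practices requires the model-based comparisons the study reports.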

2.
Range restrictions, or raters' tendency to limit their ratings to a subset of the available rating scale categories, are well documented in large-scale teacher evaluation systems based on principal observations. Such restrictions are less common, however, in the teacher performances used to establish links (anchor ratings) in otherwise disconnected assessment systems. As a result, principals' category use may differ between anchor ratings and operational ratings. The purpose of this study is to explore the consequences of discrepancies in rating scale category use between operational and anchor ratings within teacher evaluation systems based on principal observations. First, we used real data to illustrate the presence of range restriction in operational ratings and its effect on connectivity. Then, we used simulated data to examine these effects under experimental manipulation. Results suggested that discrepancies in range restriction between anchor and operational ratings do not systematically affect the precision of teacher, principal, and teaching practice estimates. We discuss the implications of these results for research and practice in teacher evaluation systems.
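
One way to screen for range restriction of this kind is to count how many distinct categories each principal uses in operational ratings versus anchor ratings. The Python sketch below assumes ratings are available as hypothetical (rater_id, rating) pairs. It is a descriptive check only and does not reproduce the study's connectivity or simulation analyses.

```python
from collections import defaultdict

def categories_used(records):
    """Map each rater to the set of rating categories they used.

    records -- iterable of (rater_id, rating) pairs (hypothetical format)
    """
    used = defaultdict(set)
    for rater, rating in records:
        used[rater].add(rating)
    return dict(used)

def range_gap(operational, anchor):
    """Extra categories each rater used in anchor vs. operational ratings.

    Only raters present in both sets are compared; a positive value means
    the rater's operational ratings were more range-restricted.
    """
    op = categories_used(operational)
    an = categories_used(anchor)
    return {r: len(an[r]) - len(op[r]) for r in sorted(op.keys() & an.keys())}

operational = [("p1", 3), ("p1", 4), ("p1", 4), ("p2", 2), ("p2", 3)]
anchor = [("p1", 1), ("p1", 2), ("p1", 3), ("p1", 4), ("p2", 2), ("p2", 3)]
print(range_gap(operational, anchor))
```

Counting distinct categories ignores how often each one is used. A frequency-based comparison would be a natural refinement.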

3.
Teacher evaluation commonly includes classroom observations conducted by principals. Despite their widespread use, little is known about the quality of principals' ratings. We investigated the accuracy of 1,324 principals' ratings of six teaching practices at the conclusion of training within an authentic teacher evaluation system. Data come from a video-based exam consisting of four 10-minute classroom observations. Many-Facet Rasch modeling revealed that (1) overall, principals rated with high accuracy, but individuals varied substantially, and (2) some teaching episodes and practices were easier to rate accurately than others; for example, promotes critical thinking was rated more accurately than uses formative assessment. Because Many-Facet Rasch modeling estimates individuals' accuracy patterns across teaching episodes and practices, it is a useful tool for identifying areas in which individual principals, or groups of principals, may need additional training (e.g., evaluating formative assessment). Implications for improving the training of principals to conduct classroom observations for teacher evaluation are discussed.
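
A Many-Facet Rasch model for rating accuracy can be written in dichotomous form. Each rating is scored accurate or inaccurate against a master rating, and the log-odds of an accurate rating equal principal ability minus episode difficulty minus practice difficulty. The Python sketch below assumes that form. The facet structure and all parameter values are illustrative and are not taken from the study; the practice difficulties are simply chosen to mirror the direction reported in the abstract.

```python
import math

def p_accurate(ability, episode_difficulty, practice_difficulty):
    """Probability that a principal's rating matches the master rating.

    Dichotomous many-facet Rasch form on the logit scale:
        log(P / (1 - P)) = ability - episode_difficulty - practice_difficulty
    """
    logit = ability - episode_difficulty - practice_difficulty
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical logit difficulties: a higher value means harder to rate accurately.
practices = {"promotes critical thinking": -0.5, "uses formative assessment": 0.8}
for name, difficulty in practices.items():
    print(f"{name}: {p_accurate(1.0, 0.0, difficulty):.2f}")
```

Because all facets share one logit scale, a principal whose ability is well below a practice's difficulty is flagged by a low predicted accuracy on that practice. That is the kind of pattern the abstract suggests could target additional training.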

4.
We examine the relationships among observational ratings of teacher performance, principals' evaluations of teachers' cognitive and non-cognitive skills, and test-score-based measures of teachers' productivity. We find that principals can distinguish between high- and low-performing teachers, but the overall correlation between principals' ratings of teachers and teachers' value-added contribution to student achievement is modest. The metrics diverge in part because they capture different traits. While past teacher value-added predicts future value-added, principals' subjective ratings can provide additional information, particularly when prior value-added measures are based on a single year of teacher performance.

5.
The roles and responsibilities of university supervisors in the practicum have changed over the last ten years. Previously, the routine tasks of observing student teaching performance and giving feedback were the main responsibilities associated with school visits. Lecturers in this role also determined, together with school personnel, the student teacher's assessment rating. Today, the part played by university supervisors of the practicum varies considerably. Against this background, this paper examines the actual and ideal characteristics of university supervisors from their own viewpoint and from the perspectives of teacher supervisors, principals, and student teachers. The reasons for the discrepancies between actual and ideal characteristics are discussed, and suggestions are made for addressing them.

6.
In most U.S. schools, teachers are evaluated using observation of teaching practice (OTP). This study investigates rater effects on OTP ratings among 421 principals in an authentic teacher evaluation system. Many-facet Rasch (MFR) analysis using a block of shared ratings revealed that principals generally (a) differentiated between more and less effective teachers, (b) rated their teachers leniently (i.e., overused the higher rating categories), and (c) differentiated between teaching practices (e.g., Cognitive Engagement vs. Classroom Management) with minimal halo effect. Individual principals varied significantly in their degree of leniency, and approximately 12% of principals exhibited severe rater bias. Implications for the use of OTP ratings in evaluating teachers' effectiveness are discussed, as are the strengths and limitations of MFR for analyzing rater effects in OTP.
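
For polytomous rubric ratings, one common specification is the rating-scale many-facet Rasch model. In this model, the log-odds of a rating in category k rather than k - 1 equal teacher effectiveness minus principal severity minus practice difficulty minus the k-th step threshold. Leniency then appears as negative severity, which raises the expected rating. The Python sketch below computes category probabilities under that assumed specification. All parameter values are hypothetical, and the study's actual model may differ.

```python
import math

def mfr_category_probs(theta, severity, difficulty, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    log(P_k / P_{k-1}) = theta - severity - difficulty - thresholds[k-1]
    theta      -- teacher effectiveness (logits)
    severity   -- principal severity; negative values mean leniency
    difficulty -- teaching-practice difficulty
    thresholds -- step thresholds shared by all raters and practices
    Categories are indexed 0 .. len(thresholds).
    """
    eta = theta - severity - difficulty
    # Category 0 has an empty sum; category k adds (eta - tau_h) for h = 1..k.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (eta - tau))
    top = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_rating(probs):
    """Mean category index implied by a probability vector."""
    return sum(k * p for k, p in enumerate(probs))

steps = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 4-category rubric
lenient = expected_rating(mfr_category_probs(0.0, -1.0, 0.0, steps))
severe = expected_rating(mfr_category_probs(0.0, 1.0, 0.0, steps))
print(f"lenient principal: {lenient:.2f}, severe principal: {severe:.2f}")
```

The same teacher receives a higher expected rating from the lenient principal. Estimating severity as a separate facet is what lets MFR separate rater leniency from teacher effectiveness, provided the design links raters, for example through a block of shared ratings.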

7.
A multilevel analysis approach was used to analyse students' evaluations of teaching (SET). The low inter-rater reliability indicates that no solid conclusions about teaching can be drawn from a single piece of feedback. To assess a teacher's general teaching effectiveness, four randomly chosen course implementations need to be evaluated. When one course is evaluated, two implementations are needed, and if only one implementation is evaluated, up to 15 feedback responses are needed. The stability of students' ratings is very high, which reflects students' stable rating criteria. This produces a rating paradox: from the student's point of view, each rating is very precise, stable and justifiable, but from the teacher's point of view a single piece of feedback reflects the quality of teaching only to a moderate extent. Cross-hierarchical analysis reveals large discrepancies in how students use the rating scales; some students are systematically more lenient in their ratings, whereas others are systematically more severe. The study also reveals that some courses are generally rated more favourably and that some courses are better suited to certain teachers. Managers can thus improve the quality of teaching by finding the most suitable courses for each teacher.
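
The abstract's point that several feedback responses or course implementations are needed follows the usual logic of averaging: the reliability of a mean of k ratings grows with k. One standard way to express this is the Spearman-Brown formula, which can be inverted to give the number of ratings needed for a target reliability. The Python sketch below applies it. The single-rating reliability of 0.3 and the target of 0.8 are hypothetical values. The study's own figures come from its multilevel analysis, which this sketch does not reproduce.

```python
import math

def spearman_brown(single_rel, k):
    """Reliability of the mean of k parallel ratings."""
    return k * single_rel / (1 + (k - 1) * single_rel)

def ratings_needed(single_rel, target_rel):
    """Smallest k whose averaged reliability reaches target_rel."""
    k = target_rel * (1 - single_rel) / (single_rel * (1 - target_rel))
    # Small tolerance so floating-point noise does not push an exact integer up.
    return math.ceil(k - 1e-9)

k = ratings_needed(0.3, 0.8)
print(k, round(spearman_brown(0.3, k), 3))
```

The formula assumes parallel, interchangeable ratings. Systematic leniency or severity differences between students, which the abstract reports, are one reason a multilevel model may give different numbers.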

8.
9.
10.
The Vanderbilt Assessment of Leadership in Education (VAL-ED) is a 360-degree principal evaluation tool, focused on learning-centered leadership behaviors, that includes ratings from the principal, supervisors, and teachers. The current study assesses the test-retest reliability of the VAL-ED in a sample of seven school districts, as part of multiple validity and reliability assessments based on various samples of actual VAL-ED users. We administered the VAL-ED twice and examined the correlations and mean differences between time 1 and time 2. We find that principal and teacher ratings at time 1 and time 2 have large, positive, and significant correlations. Additionally, at both time points, principals are rated as at least satisfactorily effective. Principals rate themselves slightly higher at time 2, while teachers rate principals slightly higher at time 1.

11.
Principals' implementation of new teacher evaluation policies in a suburban and rural area of the southeastern United States was examined over a five-year period. This study reports findings on two of eleven interview questions, which examined changes over time in principals' perceptions of policy concerns and benefits. First, findings indicate that, while initially overwhelmed, principals eventually managed the time demands of implementation and later focused on the benefits of evaluation. Second, principals quickly integrated the instructional rubric criteria into classroom observations and professional development work. Third, doubts increasingly emerged regarding the inconsistent application of the rubric criteria, the inclusion of student test scores in teacher evaluation, and the calculation of teacher effectiveness ratings. The authors conclude that mandating rigorous evaluation policy will not sufficiently address teacher effectiveness and may complicate principals' instructional leadership. They argue that policy-makers must consider the long-term effects of implementation before substantial change in teacher evaluation can result.

12.
13.
This article highlights the perspectives and actions of urban public school K-12 principals who are noted for prioritising instructional leadership. Grounded in the conceptual framework of agency, I examined the work experiences of 18 New York City public school principals, nominated by supervisors, colleagues, trained educational consultants, parents, and students, through a four-phase qualitative study consisting of interviews, time surveys, document review, and observations of participants. Analysis highlighted that, to uphold instructional leadership, participants assumed agency by developing perspectives and actions that viewed instructional leadership as grounded in learning, influenced by teachers and staff, requiring time and planning for principals and teachers/staff, and calling for teacher/staff empowerment.

14.
15.
16.
17.
18.
Results from a sample of 1,013 Georgia principals who rated 12,617 teachers are used to compare holistic and analytic principal judgments with indicators of student growth central to the state's teacher evaluation system. Holistic principal judgments were compared with mean student growth percentiles (MGPs) and with analytic judgments from a formal observation protocol. The correlations of a holistic principal rating with teacher MGPs and with observation protocol scores were 0.22 and 0.32, respectively. Teachers selected as most successful at increasing student achievement had a mean MGP a full SD higher than that of teachers selected as least successful, and a mean observation protocol score 1.35 SDs higher. Holistic principal judgments appear to be much more strongly influenced by observations of teachers' classroom practices than by evidence of growth in student achievement.
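
Statements such as "a full SD higher" compare two group means on a standardized scale. The Python sketch below assumes the difference is scaled by the sample standard deviation of the full distribution of teacher scores. The abstract does not say which SD was used, and the MGP values here are hypothetical.

```python
from statistics import mean, stdev

def mean_difference_in_sd(group_a, group_b, reference):
    """(mean of group_a - mean of group_b) divided by the SD of reference.

    reference -- scores for all teachers (sample SD is used here; this is an
                 assumption, since other standardizers are also common)
    """
    return (mean(group_a) - mean(group_b)) / stdev(reference)

all_mgps = [35, 42, 48, 50, 52, 58, 65]  # hypothetical teacher MGPs
most_successful = [58, 65]
least_successful = [35, 42]
print(round(mean_difference_in_sd(most_successful, least_successful, all_mgps), 2))
```

Expressing both gaps in SD units makes them comparable even though MGPs and observation protocol scores are on different raw scales. That is how the abstract can compare a 1 SD gap in MGP with a 1.35 SD gap in observation scores.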

19.
This study, using student ratings of lecturers, examines the perceived effect of the lecturer's ability to communicate effectively. The relationship between the standard question 'The lecturer was able to communicate ideas and information clearly' and the global rating question 'Overall, the lecturer is an effective teacher' was investigated in 7,072 undergraduate standard teaching surveys from one university, using the lecturer's language background as a factor. The results show that overall student ratings of English as a second language (ESL) lecturers are, on average, 0.4 points lower on a five-point scale than student ratings of native-English-speaking lecturers. This average difference interacts strongly with the lecturer's faculty, ranging from little difference in arts (humanities and social sciences) to a 0.6-point difference in science. The study also found that, of the four categorical questions in the university's standard teaching survey, the 'communication' question had the highest correlation with the 'overall' question. This correlation (R = 0.96) suggests that the standard teaching survey is overly influenced by students' perception of this one aspect of teaching, reflecting a transmission model. The rating difference between ESL and native-English-speaking lecturers is briefly explored. In addition, the paper considers the implications of these findings for teacher development and for student expectations against a background of a growing ESL student population.

20.
Behavior rating scales are indirect measures of emotional and social functioning used for assessment purposes. Rater bias is systematic error that may compromise the validity of behavior rating scale scores. Teacher bias in ratings of behavior has been investigated in multiple studies but has not yet been assessed in a research synthesis focused on the role of ethnicity and culture. Teacher bias in ratings of student behavior was investigated through a comprehensive literature review that included only studies with a defensible criterion of true behavior against which to compare rating scores. The final set of 13 studies of teacher bias provided mixed evidence of bias due to student ethnicity and strong evidence of bias due to teacher culture, particularly when positive stereotypes were violated. Limitations and future directions for research are discussed.

