Similar literature
1.
A total of 103 academic department heads in four universities rated a set of 15 administrative activities as to their importance. Faculty members in these departments (total N = 1,333) used the same set of activities to rate both the importance they should be given by the department head and the effectiveness with which the head performed each activity during the previous 12 months. Tests of reliability revealed that faculty ratings of both importance and performance were made with reasonable internal consistency. Three tests of construct validity showed that each of the three types of ratings was made with at least minimal validity. A principal components analysis of faculty ratings of performance suggested that the department head has three major types of responsibility: personnel management; departmental planning and development; and building the department's reputation.

2.
In this paper, assessments of faculty performance for the determination of salary increases are analyzed to estimate interrater reliability. Using the independent ratings by six elected members of the faculty, correlations between the ratings are calculated and estimates of the reliability of the composite (group) ratings are generated. Average intercorrelations are found to range from 0.603 for teaching, to 0.850 for research. The average intercorrelation for the overall faculty ratings is 0.794. Using these correlations, the reliability of the six-person group (the composite reliability) is estimated to be over 0.900 for each of the three areas and 0.959 for the overall faculty rating. Furthermore, little correlation is found between the ratings of performance levels of individual faculty members in the three areas of research, teaching, and service. The high intercorrelations and, consequently, the high composite reliabilities suggest that a reduction in the number of raters would have relatively small effects on reliability. The findings are discussed in terms of their relationship to issues of validity as well as to other questions of faculty assessment.
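
The abstract does not name the formula behind its composite reliabilities. The sketch below assumes the standard Spearman-Brown prophecy formula, which turns an average pairwise correlation among raters into the reliability of their pooled rating. Under that assumption the reported figures are reproduced (0.794 across six raters gives about 0.959). The function name and the rater counts in the second loop are illustrative, not taken from the paper.

```python
def spearman_brown(mean_r: float, k: int) -> float:
    """Reliability of the pooled rating of k raters, given the
    average pairwise correlation mean_r between single raters."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Average intercorrelations reported in the abstract.
reported = {"teaching": 0.603, "research": 0.850, "overall": 0.794}

# Composite reliability for the six-person committee.
for area, r in reported.items():
    print(f"{area}: {spearman_brown(r, 6):.3f}")

# Illustrative only: how the overall composite reliability would change
# with fewer raters, if the average intercorrelation stayed at 0.794.
for k in (6, 4, 3, 2):
    print(f"{k} raters: {spearman_brown(0.794, k):.3f}")
```

Because the formula flattens out as the average correlation grows, dropping a rater or two changes the composite only slightly, which is in line with the abstract's remark about reducing the number of raters.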

3.
In a research project into the effectiveness of mathematics teaching in the first year of secondary education, external observers and students rated teachers' behaviour. The reliability and validity of both methods were established. The results show that teacher behaviour is assessed well when student ratings are aggregated at the classroom level. The quality of aggregated student ratings is as good as the quality of data from external observers. The predictive validity of aggregated student ratings is higher than the predictive validity of external observations when subject motivation is taken as a dependent variable.

4.
The paper provides (1) a teacher-administered rating instrument for inattention without confounding the rating with hyperactivity and conduct disorder, and (2) evidence that the ratings correlate with the scores obtained from cognitive tests of attention. In Study I, the first objective was to investigate the construct validity and the inter-rater reliability of the Attention Checklist (ACL) by factor analysing the teacher ratings of 110 Grade 4 children, obtained by using the ACL. The second objective was to investigate the predictive validity of the ACL by examining the relationship between the scores obtained for the participants from teachers' ratings using the ACL and the scores obtained by participants in the lab-type attention tests. The results of factor analysis showed that a single factor labelled ‘inattention’ underlies the 12 items in the ACL. Examining the differences in performance on attention tests, the ‘low attention’ children as rated by the teachers on the ACL scored lower than the ‘high attention’ children on the objective tests of attention. These findings were replicated in Study II, which was conducted to test further the construct validity and predictive validity of the ACL. This time, only those two tests (Auditory Attention and Visual Attention) that had shown relatively poor discrimination between the high and low attention groups in Study I were, again, administered to another cohort of 97 Grade 4 children, as it was our intention to further challenge the reliability of the ACL. Overall, the results of both studies suggest that comprehensive assessment of attention skills should include both ACL and objective measures of selective attention.

5.
The purpose of this study was to examine the validity and reliability of Curriculum-Based Measures in writing for English learners. Participants were 36 high school English learners with moderate to high levels of English language proficiency. Predictor variables were type of writing prompt (picture, narrative, and expository), time (3, 5, and 7 min), and scoring procedure (words written, words spelled correctly, correct word sequences, correct minus incorrect word sequences). Criterion variables were teacher ratings of writing performance and student performance on the Test of Written Language-III, the writing subtest of the Test of Emerging Academic English, and the Minnesota state writing test. Results supported the validity and reliability of a 5 to 7-min writing sample written in response to a narrative or picture prompt and scored for percent of correct word sequences, correct minus incorrect word sequences, or words written plus correct minus incorrect word sequences.

6.
Peer and self‐ratings have been strongly recommended as the means to adjust individual contributions to group work. To evaluate the quality of student ratings, previous research has primarily explored the validity of these ratings, as indicated by the degree of agreement between student and teacher ratings. This research describes a Generalizability Theory framework to evaluate the reliability of student ratings in terms of the degree of consistency among students themselves, as well as group and rater effects. Ratings from two group projects are analyzed to illustrate how this method can be applied. The reliability of student ratings differs for the two group projects considered in this research. While a strong group effect is present in both projects, the rater effect is different. Implications of this research for classroom assessment practice are discussed.

7.
Surveys of student opinion of tertiary courses often constitute a major source of information for prospective students. Yet the reliability and validity of such surveys have not previously been investigated. In this paper, data from a survey of some 20,000 individual ratings of 224 undergraduate courses were analysed. The purpose was to explore the relationship between students’ ratings and the characteristics of their courses. It was found that these ratings were stable and significantly related to the academic area of the course, the size of class, the percentage of full‐time students, the academic year of the course, and the grades awarded in the course. Implications for the validity of such ratings are then discussed.

8.
Construct validity of peer assessment (PA) is important for PA application, yet difficult to achieve. The present study investigated the impact of an assessment rubric and friendship between the assessor and assessee on construct validity of PA. Two hundred nine bachelor students participated: half of them assessed a peer's concept map with a rubric whereas the other half did not use a rubric. The results revealed substantial reliability and construct validity for PA. All students over-scored their peers’ performance, but the ratings of students using a rubric were more valid. Moreover, when using a rubric, a high level of friendship between assessor and assessee resulted in more over-scoring. Use of a rubric resulted in higher quality concept maps for peer and expert ratings.

9.
Various approaches to assessing instructional quality have emerged in educational research. In this article, we present two studies that apply the thin slices procedure, investigating the reliability and validity of the ratings of three dimensions of instructional quality based solely on the first impressions of untrained social observers. Thirty undergraduate students rated 30-s clips from English lessons (Study 1) and Math lessons (Study 2) regarding three quality dimensions. The findings suggest high reliability in these ratings. Multilevel confirmatory analyses suggested construct validity in terms of differentiation between the three dimensions of instructional quality. Finally, we found some overlap between the thin slices ratings of classroom management and constructive support with ratings of trained raters based on observations of full lessons, as well as students’ ratings of these dimensions. We discuss these results with respect to the potential of first impressions of untrained observers to measure instructional quality.

10.
A syllabus analysis instrument was developed to assist program evaluators, administrators and faculty in the identification of skills that students use as they complete their college coursework. While this instrument can be tailored for use with a variety of learning domains, we used it to assess students' use of and exposure to computer technology skills. The reliability and validity of the instrument were examined through an analysis of 88 syllabi from courses within the teacher education program and the core curriculum at a private Midwest US university. Results indicate that the instrument has good inter‐rater reliability, and ratings by and interviews with faculty and students provide evidence of construct validity. The use and limitations of the instrument in educational program evaluation are discussed.

11.
Faculty Perspectives on Course and Teacher Evaluations (total citations: 2; self-citations: 0; citations by others: 2)
Student ratings of instruction have been the subject of numerous studies with much of the research focusing on the validity and reliability of the ratings themselves. Comparatively little empirical investigation has been devoted to the perceptions of the individuals who are the subjects of the ratings, that is, the faculty. The current study explored faculty perspectives on the usefulness of student ratings for formative and summative purposes, and the actual use of student ratings for summative purposes. Contrary to what might have been deduced from the anecdotal literature, the results of this study do not portray a great deal of resistance to student ratings in general or to their use for formative and summative evaluation. It was also found that student ratings are actually being used for the latter purpose. The usefulness of the student feedback was viewed differentially by the faculty, with feedback on their interaction with students seen as most useful, followed by feedback on their grading practices, global ratings of instructor and course, and finally structural issues of the course.

12.
Many personnel committees at colleges and universities in the USA use student evaluation of faculty instruction to make decisions regarding tenure, promotion, merit pay or faculty professional development. This study examines the construct validity and internal consistency reliability of the student evaluation of instruction (SEI) used at a large mid‐western university in the USA for both administrative and instructional purposes. The sample consisted of 73,500 completed SEIs for undergraduate students who self‐reported as freshman, sophomore, junior or senior. Confirmatory factor analysis via structural equation modelling was used to explore the construct validity of the SEI instrument. The internal consistency of students' ratings was reported to provide reliability evidence. The results of this study showed that the model fits the data for the sample. The significance of this study as well as areas for further research are discussed.
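
The abstract reports internal consistency without naming the coefficient. A common choice for rating forms is Cronbach's alpha, and the sketch below assumes that coefficient purely for illustration. The function name and the small response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, where each
    respondent is a list of scores on the same k items."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five students rating three SEI-style items (1-5).
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
]

print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Alpha rises toward 1 as the items vary together across respondents. It is undefined when every respondent has the same total score, so a production version would need to guard against a zero total variance.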

13.
In an era of globalization, global competence (GC) is a crucial competence for graduate students to possess; thus, graduate education should prepare students with GC to compete globally. However, no instrument has been designed to measure graduate students' GC, and the theoretical structure of GC has not been empirically examined. To fill these gaps, first, we developed the Global Competence Scale for graduate students (GCSG) based on a three-dimensional theoretical framework (knowledge, skills, and attitudes). Second, we administered the GCSG to Chinese graduate students sampled from five universities in Beijing. Third, we tested the theoretical framework and examined the reliability and validity of the scale. Finally, we described the Chinese graduate student sample’s GC by using the instrument. The results supported the theoretical model and provided evidence for the reliability and validity of the instrument. We also found that the sample showed higher ratings in knowledge and attitudes but lower ratings in communication skills.

14.
In an essay rating study multiple ratings may be obtained by having different raters judge essays or by having the same rater(s) repeat the judging of essays. An important question in the analysis of essay ratings is whether multiple ratings, however obtained, may be assumed to represent the same true scores. When different raters judge the same essays only once, it is impossible to answer this question. In this study 16 raters judged 105 essays on two occasions; hence, it was possible to test assumptions about true scores within the framework of linear structural equation models. It emerged that the ratings of a given rater on the two occasions represented the same true scores. However, the ratings of different raters did not represent the same true scores. The estimated intercorrelations of the true scores of different raters ranged from .415 to .910. Parameters of the best fitting model were used to compute coefficients of reliability, validity, and invalidity. The implications of these coefficients are discussed.

15.
A Bayesian IRT-model approach was used to investigate the validity and reliability of student perceptions of teaching quality. Furthermore, the student perceptions were compared with ratings of teaching quality by external observers. Grade 4 students (n = 675) filled out a questionnaire that was used to measure their opinions about the lessons of their teachers. Three lessons of 39 teachers were recorded and rated by 4 raters. The analyses showed that student perception and lesson observation scales fit best in an 11-dimensional model, which was an indication of construct validity and discriminant validity. Student perception scales were reliable, although not all items contributed to the scales to the same extent. Student ratings and lesson observations scores generally correlated moderately (ranging from r = .18 to r = .50). Higher correlations were found for scales with a similar content; however, no clear pattern was apparent. Suggestions for future research are presented.

16.
Student assessment of teaching in higher education (total citations: 1; self-citations: 0; citations by others: 1)
Plans to introduce campus-wide assessments of college or university teaching which are largely dependent on student ratings are seen as a threat to academic freedom in those institutions with little or no experience of this form of evaluation. While regular student evaluations of teaching are very common in North America, their introduction is only now being considered in colleges and universities in a number of other countries. Research on the reliability and validity of student ratings indicates that they are capable of providing valuable information about the quality of teaching. Depending on the survey used, this type of evaluation may be used to provide evidence of teaching ability to staffing committees or to suggest ways of improving teaching. The paper concludes with a set of recommendations for higher education institutions which are considering the regular assessment of all teachers by their students.

17.
Student ratings, a critical component in policy efforts to assess and improve teaching, are often collected using questionnaires, and inferences about teachers are then based on aggregated student survey responses. While considerable attention has been paid to the reliability and validity of these aggregates, much less attention has been paid to within-classroom consensus, and what that consensus can reveal about classrooms. This study used data from the Measures of Effective Teaching Project to investigate how the consensus among student ratings in a classroom can enhance our understanding of the learning environment, and potentially could be used to understand features of instructional practice. The results suggest that consensus is related to teacher effectiveness, the questioning strategies used by teachers, and the demographic heterogeneity of students. The possibility of instructional subclimates and the implications for the use of overall averages in teacher appraisal are discussed together with directions for future research.

18.
19.
20.
The peer rating system used here advances the quantitative literacy goals outlined in the social sciences. We instituted a mid-semester intervention to teach rating skills and used an index to track longitudinal changes of skill mastery over the course of the semester. Seventy-four students in five advanced research classes followed the procedure of the existing peer rating system by completing reading assignments, writing reflections online, engaging in class discussions, rating their peers’ reflections and receiving feedback on their group effort. Peer ratings were then compared with each other and also with the instructor ratings to derive individualised indices of reliability and validity. These technical indicators enabled two rounds of assessment before and after a class-wide intervention. An omnibus test across the five classes showed a significant improvement in rating quality due to the intervention. Our courses not only met a quantitative learning outcome but also promised vocational competence.
