Found 20 similar documents (search time: 156 ms)
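At present, a considerable number of colleges and universities have students evaluate the quality of their teachers' teaching (hereafter "student evaluation of teaching"). Some scholars point out that, in practice, the results of this much-anticipated method fall short on reliability and validity, and that it even shows a tendency toward bias and distortion. I. Distortions caused by abnormal student evaluation of teaching and their effects. Student evaluation of teaching, as the name suggests, means that students evaluate a teacher's teaching effectiveness: an anonymous survey is administered in the classes taught by the teacher under evaluation, and the participating students rate that teacher according to an evaluation scale and their own opinions.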

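The traditional classroom testing model for oral tourism English suffers from problems of test reliability and validity, as well as low student motivation. Teachers should therefore make targeted reforms to the design of classroom activities, the classroom testing methods, and the scoring criteria, in order to improve classroom teaching and the reliability and validity of oral tests. The paper proposes a "test-every-session" model of oral classroom testing. This model is a concrete application of formative assessment in teaching practice and effectively addresses the problems of test reliability, validity, and student motivation.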

On Student-Centered Evaluation of Teaching Quality in Higher Education   Total citations: 1, self-citations: 0, citations by others: 1
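Student-centered evaluation of teaching quality in higher education means that students score or grade their course instructors according to prescribed items and procedures. This form of evaluation rests on reasonable assumptions from pedagogy and psychology and has considerable reliability, but several problems remain to be solved, such as the design of evaluation indicators and items, the choice of the evaluation set, the sample of participating students, the assessment of teachers' potential for teaching development, and the influence of certain social trends on students' values. Corresponding measures should be taken to address them.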

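As a new form of assessment, peer assessment not only meets the requirements that the measurement of core competencies places on assessment tools but also serves a learning function. Its development has passed through several stages: detachment from the field of education, an assessment tool focused on learning outcomes, an assessment tool focused on the learning process, and a tool through which students take part in learning. Implementing peer assessment requires the support of an operational model. The latest operational model comprises 12 steps, functions as both formative assessment and a learning tool, and is an important guarantee that teachers can implement peer assessment in full and realize its functions. Future research on peer assessment will focus on cultivating core competencies and further emphasize its learning function; optimize the operational model and broaden the contexts in which peer assessment is used; and add empirical studies to deepen understanding of its reliability and validity.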

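Peer assessment has a degree of effectiveness in college English speaking classes in China. By surveying Chinese university students about using peer assessment to evaluate their classmates' spoken English, we compared the similarities and differences between teacher assessment and peer assessment of English speaking ability. The results show that most students view this form of assessment favorably, and that peer scores and teacher scores are very close. Peer assessment can stimulate students' interest in learning, their motivation, and their sense of responsibility; it fosters a friendly and lively learning atmosphere and frees teachers to devote more energy to classroom teaching and related research. It is a positive and effective approach.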

Research on Several Issues in Competency-Based Assessment   Total citations: 4, self-citations: 0, citations by others: 4
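This article examines the nature, implementation procedures, reliability and validity, and strengths and weaknesses of competency-based assessment. Although competency-based assessment has many advantages over paper-and-pencil assessment, it also has many problems of its own. Some of these are inherent in the very concept of competency-based assessment, which shows that this assessment model has its limitations too.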

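An oral English test is the most direct way to assess students' overall English ability, and it is also the hardest of all language ability tests to administer. The reliability and validity of an oral test determine whether it fairly and reasonably reflects candidates' true speaking ability. Analyzing the reliability and validity of the oral English test in the college entrance examination (gaokao), and ensuring that item design and test administration are scientific and that scoring is feasible and reliable, therefore directly determines whether the gaokao oral test succeeds and whether it has a positive washback effect on oral English teaching in senior high schools.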

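The method currently used to assess student achievement in higher vocational accounting programs, the paper-and-pencil test, produces results that correlate very weakly with students' practical accounting skills and lacks adequate reliability and validity. In assessing the academic achievement of accounting students at higher vocational colleges, the traditional paper-and-pencil test should therefore be replaced by work-sample assessment, so that the assessment objectively and truthfully reflects students' practical accounting skills and professional competence.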

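Using the Questionnaire for Evaluating University Teachers' Teaching Effectiveness (student version), this study examines how teacher and student gender and cognitive style affect ratings of teaching effectiveness. The results show that gender has a strong effect on evaluations among science students and liberal arts teachers: male students give higher scores than female students, and male teachers score higher than female teachers on the total score and on four factors, including perceived learning value. The different cognitive styles of science students and liberal arts teachers have significant main effects on teaching evaluation: field-dependent teachers receive the lowest scores and field-independent teachers the highest; field-intermediate students give teachers the highest scores and field-independent students the lowest.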

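Grounded in adaptive classroom teaching evaluation and taking a teacher-centered approach, this study had teachers design their own classroom teaching evaluation criteria and indicators. A total of 94 teacher evaluation records were collected across two open lessons to test the reliability and validity of the evaluation scale. The results show that the scale has high internal-consistency reliability and construct validity.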

Motivating students to perform well on assessment tests is difficult when students know the results have no academic consequence. The present study evaluates the influence of assessment context (graded vs. non-graded) on the reliability of an assessment measure. Results indicate that the graded condition produces higher reliability (r = .71) than the non-graded condition (r = .29), whose reliability is unacceptably low. Moreover, the graded condition produces significantly higher scores (M = 64%) than the non-graded condition (M = 43%). Only students in the graded condition (41%) obtained passing scores of 70% or above.

Peer assessment has been studied in various situations and actively pursued as a means by which students are given more control over their learning and assessment achievement. This study investigated the reliability of staff and student assessments in two oral presentations with limited feedback for a school-based thesis course in engineering where the different disciplines ran their own assessment. Staff scores were generally, though not consistently, found to be more reliable than student scores. The engineering disciplines displayed widely varying differences between the staff and student scores. Reliability also varied widely across disciplines, which made it difficult to reconcile the marks with a standard school-based overall grading scheme for the course when the reliability of marking was low. The results also showed that the average scores for oral presentations are generally much higher than for written examinations, based on the grade point average. Future research will need to investigate how best to develop a consistent assessment framework for oral presentations and moderation of scores for consistency, particularly for large classes where assessments are done by different groups of staff and students and collated into a single set of marks for the class.

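Assessment bias concerns the fairness of examinations and results from the combined effect of economic, educational, technical, and other factors. Types of assessment bias include bias arising from differences in mean scores, differential item functioning, misinterpretation of scores, gender- or race-related content, differences in content and experience, the statistical models used in selection decisions, and flawed criterion measures. Assessment bias not only affects individual test takers' opportunities for development but also harms social fairness. Assessment bias can be identified by judgmental methods and by empirical methods. Studying assessment bias in the United States can help improve examinations in China and promote their fairness.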

The consensual assessment technique (CAT) is a measurement tool for creativity research in which appropriate experts evaluate creative products [Amabile, T. M. (1996). Creativity in context: Update to the social psychology of creativity. Boulder, CO: Westview]. However, the CAT is hampered by the time-consuming nature of the products (asking participants to write stories or draw pictures) and the ratings (getting appropriate experts). This study examined the reliability of ratings of sentence captions. Specifically, four raters evaluated 12 captions written by 81 undergraduates. The purpose of the study was to see whether the CAT could provide reliable ratings of captions across raters and across multiple captions and, if so, how many captions and how many judges would be needed to generate reliable scores. Using generalizability theory, we found that captions appear to be a useful way of measuring creativity with a reasonable level of reliability in the frame of CAT.

The purpose of this study was to compare the effects of two peer assessment methods on university students' academic writing performance and their satisfaction with peer assessment. This study also examined the validity and reliability of student-generated assessment scores. Two hundred and thirty-two predominantly undergraduate students were selected by convenience sampling during the fall semester of 2007. The results indicate that students in the experimental group demonstrated greater improvement in their writing than those in the comparison group, and that they exhibited higher levels of satisfaction with the peer assessment method, in both peer assessment structure and peer feedback, than those in the comparison group. Additionally, the findings indicate that the validity and reliability of student-generated rating scores were extremely high. Using Wiki interactive software and providing an online collaborative learning environment to facilitate peer assessment added value to peer assessment.

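As the final stage of teaching, course assessment plays an important guiding and decisive role in improving overall teaching quality. Many university courses are currently assessed through a major assignment, with the grade based mainly on the teacher's subjective judgment, which leaves objectivity lacking. This paper proposes a multi-rater scoring method that is teacher-led, with students taking part in peer rating, and that factors the reliability of the multiple raters' scores into the scoring weights. This reduces the one-sidedness and subjectivity of traditional assessment, which relies mainly on the teacher's evaluation, and makes the evaluation of course learning outcomes more objective.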


The authors address the reliability of scores obtained on the summative performance assessments during the pilot year of our research. In contrast to classical test theory, we discuss the advantages of using generalizability theory for estimating the reliability of scores for summative performance assessments. Generalizability theory was used as the framework because of the flexibility this approach provides for examining sources of inconsistency within a complex assessment. Two major sources of inconsistency in scores considered in this study were raters and agencies (teachers' rating vs. researchers' rating). Overall, results showed that the inconsistency in scores attributable to raters and agencies was relatively small. Suggestions for improving consistency in the subsequent years of our research are provided.

This paper forms part of an exploration of assessment on one part-time higher education (HE) course: an in-service, professional qualification for teachers and trainers in the learning and skills sector which is delivered on a franchise basis across a network of further education colleges in the north of England. This paper proposes that the validity and reliability of portfolio-based assessment, a key component of many HE programmes in addition to the course being researched here, is contestable. Analysis of the processes of compiling portfolios for assessment, through the conceptual framework of the New Literacy Studies, suggests that the ways in which portfolios are assessed and the ways in which the crucial requisites of validity and reliability are assigned to them, mask complexities and contradictions in their creation by the student. This paper argues for a new, critical analysis of portfolio production and raises a number of questions about the validity, reliability and authenticity of the assessment process that the portfolios reify.

Peer assessment of long written tasks poses particular problems as these tasks typically involve complex learning and solving ill-structured problems which require divergent responses. Marking reliability of this kind of writing task is difficult to achieve. The author illustrates this through an evaluation of two implementations of peer assessment, involving 81 students, in a UK university. In these implementations, all peer assessor grades were returned to students (not just mean grades). In this way students were exposed to subjectivity in marking. The implementations were evaluated through questionnaires, focus groups, observations of lectures and tutor interview. While students reported a better understanding of quality in student writing as a result of their experience, many complained that peer assessors' marks were not 'fair'. The article draws on recent research on the reliability of tutor marking to argue that marking judgements are subjective and that peer assessment offers the opportunity to explore subjectivity in marking, creating an opportunity for dialogue between tutors and students.

This article describes the development of a Web-based instrument that is part of a strategic planning initiative in technology in K-12 schools in Nebraska. The instrument provides rubrics for self-assessment of the essential conditions necessary for integrating and adopting technology. Essential conditions were defined by an extended panel of educators from across the state. The rubric examines the areas of (a) technology administration and support, (b) technology capacity, (c) educator competencies and professional development, (d) learners and learning, and (e) accountability. Each area is assessed by four to seven items that are rated using explicitly described criteria. The Web-based system allows schools to complete this rubric as part of the needs assessment process and make comparisons on their profile from year to year and relative to a statewide composite profile. Based on data from 2005 and 2006, reliability scores (Cronbach's alpha) for subscales ranged from .68 to .82. Reliability for the entire scale was .92. Examination of data over the first two years of implementation showed significant year-to-year positive mean differences in subscale scores, indicating that the instrument was sensitive to changing conditions. Effect sizes were small but acceptable.
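
For readers unfamiliar with the reliability statistic reported in the abstract above, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from a respondents-by-items score table. It is not taken from the article: the function name `cronbach_alpha` and the sample ratings are hypothetical and serve only to illustrate the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of ratings.

    scores: list of rows, one row per respondent, each row holding that
    respondent's ratings on the same k items (hypothetical input layout).
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every row must contain the same k >= 2 items")

    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: four respondents, three rubric items each.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(cronbach_alpha(ratings), 3))
```

Values closer to 1 indicate that the items vary together across respondents, which is the usual reading of internal-consistency figures such as the .68 to .82 subscale range quoted above.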
