20 similar documents found.
1.
On Rating Error in Oral English Tests   Total citations: 1 (self-citations: 0; citations by others: 1)
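Rating an oral test is a cognitive process in which raters apply a rating scale to candidates' language output, and its purpose is to account for score variance among candidates. The variables that explain score variance include construct-relevant variables and construct-irrelevant variables. When construct-irrelevant variables come into play, rating error results. Test error can be divided into systematic error and random error, and random error is the main target of rating-error control. The main forms of rating error in oral tests are individual differences among raters, a tendency toward regression to the mean, and a pseudo-normal score distribution. The magnitude of rating error in an oral test can be checked with statistical means such as the distribution of score differences and regression coefficients. The article also discusses the goals, principles, and methods of controlling rating error in oral tests. The goal of error control is to maximize the effect of construct-relevant variables and minimize the influence of construct-irrelevant variables; this requires raters to adhere to three basic principles during rating: consistency, completeness, and independence. In terms of means, rating-error control in oral tests relies mainly on administrative, technical, and statistical measures.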
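The abstract says the size of rating error can be checked through the distribution of score differences and through regression coefficients, but it gives no procedure. The sketch below is one possible reading, not the author's method. It assumes each rater's scores can be compared with hypothetical reference scores for the same candidates (for example, an expert consensus), and it treats the mean difference as systematic leniency or severity, the spread of the differences as random error, and an ordinary least-squares slope below 1 as a sign of regression toward the mean. The function name and data are illustrative.

```python
from statistics import mean, pstdev


def rater_error_summary(rater, reference):
    """Compare one rater's scores with reference scores for the same candidates.

    Assumption (not from the article): `reference` holds benchmark scores,
    e.g. an expert consensus, aligned candidate by candidate with `rater`.
    """
    if len(rater) != len(reference) or len(rater) < 2:
        raise ValueError("need two aligned lists with at least two scores")
    diffs = [r - s for r, s in zip(rater, reference)]
    ref_mean, rat_mean = mean(reference), mean(rater)
    sxx = sum((s - ref_mean) ** 2 for s in reference)
    if sxx == 0:
        raise ValueError("reference scores must vary")
    sxy = sum((s - ref_mean) * (r - rat_mean) for r, s in zip(rater, reference))
    return {
        # > 0: lenient on average, < 0: severe (systematic error)
        "mean_difference": mean(diffs),
        # spread of the disagreements (random error)
        "sd_difference": pstdev(diffs),
        # OLS slope of rater on reference; < 1 squeezes scores toward the middle
        "slope": sxy / sxx,
    }


summary = rater_error_summary(rater=[2, 2.5, 3, 3.5, 4], reference=[1, 2, 3, 4, 5])
print(summary)
```

In this made-up case the rater is neither lenient nor severe on average (mean difference 0.0), yet the slope of 0.5 shows scores compressed toward the middle of the scale, the pattern the abstract calls regression to the mean.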
2.
3.
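Using think-aloud protocols supplemented by stimulated recall, the study collected verbal-report data from four raters with different personality orientations while they scored a paired oral test. Qualitative analysis shows that, in actual rating, raters differed considerably in how they understood and used the rating scale: (1) extraverted raters were more lenient than introverted raters; (2) introverted raters paid more attention to the specific descriptors and criteria in the rating scale, whereas extraverted raters emphasized task completion and the comparison, communication, and interaction between candidates; (3) extraverted raters relied less on the rating scale than introverted raters and drew more on non-linguistic features. The findings have implications for revising rating criteria and for rater training.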
4.
Yan Suping (闫素萍) 《忻州师范学院学报》 (Journal of Xinzhou Teachers University) 2005,21(6):99-104
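Drawing on modern testing theory and relevant assessment standards, the author surveyed and analyzed a teacher-made end-of-term achievement test for an English reading course taken by first-year university students. A detailed description and comparative study of the students' test scores led to the conclusion that the test has shortcomings in its overall design, its components, its difficulty coefficients, and its correlation coefficients. To reflect actual teaching outcomes more accurately and to make use of the positive washback of language testing, the author offers some suggestions, in the hope that teachers will ensure high reliability and validity in the test items they develop in the future.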
5.
Fouad Abd-El-Khalick; Ryan Summers; Jeanne L. Brunner; Jeremy Belarmino; John Myers 《Journal of Research in Science Teaching》2024,61(7):1641-1688
We report on the development of a rubric to reliably qualify and score responses to the Views of Nature of Science Questionnaire (VNOS): The VNOS Analysis and Scoring Rubric (VAScoR). The VAScoR is designed to (a) provide systematic guidance for the qualitative analysis, and score assignment to nuanced categories, of VNOS responses, (b) explicitly scaffold qualitative inferencing and standardize score assignment to substantially lessen the burden of, and variance in, analyzing and scoring the VNOS, and (c) improve the viability and meaningfulness of cross-study comparisons drawing on VNOS data. The rubric adopted the VNOS's consensus NOS framework and further delineated core and related elements across 10 target NOS aspects. The VAScoR's reliability was examined in two studies that drew on VNOS questionnaires completed by 185 preservice secondary science teachers (58% female; 126 undergraduate and 59 graduate students) enrolled over several years in a combined undergraduate and graduate licensure program in a large U.S. Midwestern university. In Study I, VAScoR analyses of 86 VNOS questionnaires undertaken by a single author were used to examine the rubric's intra-rater reliability, which resulted in a robust Cronbach's alpha value of 0.81. In Study II, analyses by four authors of a randomly generated, overlapping set of 18 questionnaires were used to examine inter-rater reliability, which was supported with substantial consensus among raters as indicated by a Cohen's kappa of 0.71. Further evidence for the VAScoR's inter-rater reliability was indicated by moderate to strong consistency among four raters with an overall Pearson's correlation coefficient of 0.82, and coefficient values ranging from 0.77 to 0.89 for six possible rater pairings.
6.
An assumption that is fundamental to the scoring of student-constructed responses (e.g., essays) is the ability of raters to focus on the response characteristics of interest rather than on other features. A common example, and the focus of this study, is the ability of raters to score a response based on the content achievement it demonstrates independent of the quality with which it is expressed. Previously scored responses from a large-scale assessment in which trained scorers rated exclusively constructed-response formats were altered to enhance or degrade the quality of the writing, and scores that resulted from the altered responses were compared with the original scores. Statistically significant differences in favor of the better-writing condition were found in all six content areas. However, the effect sizes were very small for items in mathematics, reading, science, and social studies. They were relatively large for items in writing and language usage (mechanics). It was concluded from the last two content areas that the manipulation was successful and from the first four that trained scorers are reasonably well able to differentiate writing quality from other achievement constructs in rating student responses.
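The abstract reports effect sizes for the gap between scores in the better-writing and original conditions but does not name the index. One widely used choice, assumed here, is Cohen's d with a pooled standard deviation; the function name and score lists are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Standardized mean difference (group1 - group2) using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd


better_writing = [3, 4, 5]   # hypothetical scores after improving the writing
original = [2, 3, 4]         # hypothetical original scores
print(cohens_d(better_writing, original))  # 1.0
```

Because each response was scored in both conditions, a paired index (the mean of the per-response differences divided by their standard deviation) would also be defensible; the abstract does not say which one the study used.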
7.
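The study adapted into Chinese the Learning Disability Evaluation Scale (School Version) as revised by Stephen B. McCarney in 1996. The Chinese version has 85 items in seven subscales: listening, thinking, speaking, reading, handwriting/writing, spelling, and mathematical calculations. Administration to 416 students in grades 2 to 5 of primary school showed that (1) the item response patterns were reasonable; (2) the scale has high internal consistency and test-retest reliability coefficients; and (3) the scale has good construct validity, criterion-related validity, and content validity.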
8.
The undergraduate electrical engineering program at The Johns Hopkins University has undergone extensive revision. The most striking revision has been in the laboratory program. Laboratory courses which are distinct from the lecture courses have been developed. These laboratory courses embrace the fields of basic and advanced electrical measurements, transducers, passive circuits, active networks, communications, microwaves, materials, computers, servomechanisms, and energy conversion. The experiments in each one of these fields are designed to give the student insight into both the basic and advanced concepts involved. The sequence of presentation of the experiments is chosen to cover each subject as completely as possible, based on the order in which the electrical engineering lecture courses are taken. The use of laboratory manuals, notebooks, reports, and examinations has been given careful thought, and some significant ideas have evolved regarding their role in establishing a successful laboratory course.
9.
Zhang Kun (张坤) 《南阳师范学院学报》 (Journal of Nanyang Normal University) 2012,11(4):94-96
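《幽兰逢春》 ("Orchid Meets Spring") is one of the most important works of Zhao Songting, a master of the Zhejiang school of dizi (Chinese bamboo flute) playing, and among the many dizi learners it is the piece most often played and studied. Prompted by a remark in which Premier Zhou Enlai praised Kunqu opera, the composer reflected on his own artistic career and life experience, and used the image of an orchid meeting spring to express his remembrance of the Premier and his longing for a good life. Taking the Kunqu tune 《二郎神》 ("Erlang Shen") as the musical motive, he developed it in terms of form, mode and harmony, thematic material, and melody, creating a deeply moving piece of music.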
10.
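Using the test-theory concepts of reliability and validity, this article analyzes whether the listening and speaking assessment in "Marine Engineering English" (轮机英语) is effective. The analysis shows that the assessment combines reliability and validity and is reasonably scientific and practicable, but it also has clear shortcomings that call for effective measures to improve it.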
11.
12.
13.
A Brief Analysis of Olympism   Total citations: 1 (self-citations: 0; citations by others: 1)
Tian Luye (田路也) 《青岛职业技术学院学报》 (Journal of Qingdao Technical College) 2004,17(2):26-29
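The modern Olympic Movement has today become almost the greatest of social forces, and the Olympic Games have become a global event with a large number of participating countries and regions and with enormous appeal, reach, and cohesive power. What are the origins, core tenets, and essence of Olympism, the guiding philosophy of the Olympic Movement and the Olympic Games? This article offers a brief analysis of these questions.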
14.
《Journal of College Student Psychotherapy》2013,27(4):261-283
This study provides information about students seeking counseling (N = 3,844) at 9 institutions of higher education. The K-PIRS, an empirically validated measure, was used to assess 7 problem areas (mood difficulties, learning problems, food concerns, interpersonal conflicts, career uncertainties, self-harm indicators, and addiction issues). Forty-two percent of students presented with multiple problems, and most reported that their concerns interfered with their academic (87%) and social (90%) functioning. A majority of students (61%) were in a stage of contemplation when seeking counseling. Only 24% were in a stage of action. There were small differences in problem scores by participants' gender, ethnicity, year in school, type of residence, work status, previous treatment, and use of psychiatric medication. Implications are discussed for counseling practitioners working with college students.
15.
In writing assessment, inconsistency in teachers' scoring is among the most frequently reported concerns regarding the validity and reliability of assessment. The study aimed to find out to what extent participating in a community of assessment practice (CAP) can affect the discrepancies among raters' scores. Adopting a one-group pretest-posttest design, the study explored patterns in the teachers' scoring judgments based on both quantitative and qualitative data. The results indicate a significant increase in the degree of agreement in the teachers' differential scoring, reflecting changes in their severity tendencies, for the structural variety, lexical accuracy, organization, and mechanics criteria, whereas their scoring judgments on the structural accuracy, task achievement, and lexical variety criteria showed low levels of agreement.
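The abstract speaks of the degree of agreement among teachers before and after joining the CAP without giving the index it used. A simpler, commonly reported stand-in, assumed here, is the share of rater-pair comparisons with identical scores (exact agreement) or scores within one point (adjacent agreement) on a given criterion; computing it on pretest and posttest ratings shows whether agreement rose. The function name and ratings are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(ratings, tolerance=0):
    """Share of rater-pair comparisons whose scores differ by <= tolerance.

    ratings: one list per rater, aligned essay by essay.
    tolerance=0 gives exact agreement; tolerance=1 gives adjacent agreement.
    """
    hits = total = 0
    for first, second in combinations(ratings, 2):
        for x, y in zip(first, second):
            hits += abs(x - y) <= tolerance
            total += 1
    return hits / total


pretest = [[3, 4, 5], [3, 4, 4], [2, 4, 5]]   # three teachers, three essays
posttest = [[3, 4, 5], [3, 4, 5], [3, 4, 4]]
print(pairwise_agreement(pretest), pairwise_agreement(posttest))
```

Here exact agreement rises from 5 of 9 comparisons to 7 of 9, while adjacent agreement on the pretest is already 1.0, which is why the two indices are usually reported together.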
16.
Yu Fangmin (余方敏) 《宁波大学学报(教育科学版)》 (Journal of Ningbo University, Educational Science Edition) 2006,28(2):77-79,94
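The key to scoring oral English tests is ensuring scoring reliability. The article summarizes several reasons why the intuition-based rating scales now in use have been criticized, and focuses on two empirically derived rating scales developed outside China: Fulcher's fluency rating scale and Upshur and Turner's empirically derived, binary-choice, boundary-definition scales (EBBs). It compares Fulcher's fluency rating scale with the rating scale of the oral test of the Test for English Majors, Band 4 (TEM-4 SET), and discusses the feasibility of applying it in large-scale oral tests in China. Finally, it sets out the advantages of adopting empirically derived rating scales in oral English teaching and testing for English majors in China.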
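Upshur and Turner's EBB scales replace descriptor bands with a short sequence of yes/no questions, each answer narrowing the range of possible scores until one score remains. The sketch below shows only that binary-tree mechanism; the class and function names, the questions, and the 1-4 score range are hypothetical placeholders, not the published scale or the TEM-4 criteria.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Question:
    text: str
    yes: Union["Question", int]  # next question, or a final score
    no: Union["Question", int]


def ebb_score(node, answer: Callable[[str], bool]) -> int:
    """Follow yes/no answers from the root until a score (int) is reached."""
    while isinstance(node, Question):
        node = node.yes if answer(node.text) else node.no
    return node


# Hypothetical four-point scale: two binary decisions per candidate.
scale = Question(
    "Is the message conveyed without serious breakdowns?",
    yes=Question("Is speech mostly fluent, with few long pauses?", yes=4, no=3),
    no=Question("Does the candidate produce any connected speech?", yes=2, no=1),
)

# A rater who answers "yes" to the first question and "no" to the second.
answers = {
    "Is the message conveyed without serious breakdowns?": True,
    "Is speech mostly fluent, with few long pauses?": False,
}
print(ebb_score(scale, answers.get))  # 3
```

The rater never compares a performance against a whole band description; each decision concerns a single boundary, which is the property the article credits with improving scoring reliability.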
17.
Liu Dongyan (刘东燕) 《齐齐哈尔师范高等专科学校学报》 (Journal of Qiqihar Junior Teachers' College) 2015,(1)
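This article introduces e-scorer (易格), a multi-dimensional essay-scoring program built on the Experiencing English writing teaching resource platform. Through four levels of evaluation, namely "initial scoring", "structure scoring", "topic scoring", and "overall scoring", e-scorer helps college English teachers carry out semi-automated marking of very large numbers of essays. Experimental results show that e-scorer's scores correlate highly with human scores and that its scoring reliability is high.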
18.
19.
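Effects of Scoring Method on Rater Severity in College English Writing: A Comparison of Holistic and Analytic Scoring   Total citations: 1 (self-citations: 0; citations by others: 1)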
He Manzu (贺满足) 《湖南第一师范学报》 (Journal of Hunan First Normal College) 2006,6(4):59-61,66
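Rating criteria are very important in writing tests, and different scoring methods affect raters' scoring behavior. The study shows that although both holistic and analytic scoring of English writing are reliable, rater severity and candidates' writing scores change considerably between the two methods. Overall, under holistic scoring, rater severity tends to converge and is close to the ideal value; under analytic scoring, candidates' writing scores are higher, and rater severity also differs significantly across raters. Holistic scoring is therefore recommended for high-stakes tests that determine candidates' futures.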
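Studies of rater severity typically estimate it with many-facet Rasch measurement, which the abstract does not mention. As a rough proxy only, assumed here, each rater's severity can be taken as how far that rater's mean falls below the grand mean when all raters score the same scripts; a wider spread of these values under one method mirrors the larger severity differences the study reports for analytic scoring. The function name and scores are invented for illustration.

```python
from statistics import mean, pstdev


def severity_proxy(scores_by_rater):
    """Grand mean minus each rater's mean: positive = harsher than average.

    Assumes every rater scored the same set of scripts, so differences in
    means reflect the raters rather than the scripts.
    """
    grand = mean(s for scores in scores_by_rater.values() for s in scores)
    return {rater: grand - mean(scores) for rater, scores in scores_by_rater.items()}


holistic = {"A": [10, 12, 11], "B": [11, 12, 10], "C": [10, 11, 12]}
analytic = {"A": [12, 14, 13], "B": [9, 10, 9], "C": [13, 14, 15]}

for method, data in [("holistic", holistic), ("analytic", analytic)]:
    sev = severity_proxy(data)
    print(method, sev, "spread:", round(pstdev(list(sev.values())), 3))
```

Unlike a Rasch model, this proxy cannot separate rater severity from script difficulty when raters score different scripts, so it only makes sense for fully crossed designs like the toy data above.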
20.