Similar Documents
1.
Building on a review and analysis of the scoring criteria used in independent, integrated, and other types of English writing tests at home and abroad, this paper explores general principles for constructing writing scoring criteria and proposes a model for designing rating scales for English writing tests. To examine the model's validity and operability, a rubric for summary writing was designed according to the model and validated using a many-facet Rasch model. The results show that the rubric has good discrimination and validity, but that significant bias interactions exist between raters and the rating criteria, and that misfit appears in the use of a few score bands. The rubric was then revised in light of the validation results. Overall, the proposed rubric model shows good validity and a degree of generalizability.
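A minimal sketch of the kind of many-facet Rasch analysis mentioned above, reduced to the dichotomous case for brevity: person ability, criterion difficulty, and rater severity are estimated jointly by maximum likelihood. The simulated data, facet sizes, and the joint-ML estimation are assumptions made for illustration; the study itself would have used the full polytomous MFRM in dedicated software.

```python
# Dichotomous many-facet Rasch sketch: logit P(x=1) = theta_person - delta_criterion - alpha_rater.
# Data are simulated; identification is handled loosely (rater severities are centred for display).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n_persons, n_criteria, n_raters = 50, 4, 3

# Simulate hypothetical pass/fail judgements for every person x criterion x rater cell
true_theta = rng.normal(0, 1, n_persons)
true_delta = rng.normal(0, 0.5, n_criteria)
true_alpha = rng.normal(0, 0.5, n_raters)
p = expit(true_theta[:, None, None] - true_delta[None, :, None] - true_alpha[None, None, :])
X = rng.binomial(1, p)

def neg_log_lik(params):
    theta = params[:n_persons]
    delta = params[n_persons:n_persons + n_criteria]
    alpha = params[n_persons + n_criteria:]
    eta = theta[:, None, None] - delta[None, :, None] - alpha[None, None, :]
    # Bernoulli log-likelihood summed over all cells
    return -(X * eta - np.logaddexp(0, eta)).sum()

res = minimize(neg_log_lik, np.zeros(n_persons + n_criteria + n_raters), method="L-BFGS-B")
alpha_hat = res.x[n_persons + n_criteria:]
print("Estimated rater severities:", np.round(alpha_hat - alpha_hat.mean(), 2))
```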

2.
Scoring criteria exert positive or negative washback on teaching. This article draws on the detailed band descriptors of the IELTS writing test to remedy the shortcomings of domestic essay-scoring criteria. Through a close reading of the IELTS rubric, it analyzes the demands the test implicitly places on writers and, on that basis, offers several broadly feasible suggestions for college English writing instruction.

3.
Within the framework of an assessment use argument, this paper examines validity issues in the TEM-8 translation and writing tasks from three angles: the influence of the linguistic appropriateness of translations on translation scoring, the new writing rubric and validity research, and fairness and validity in the writing test. The discussion is of considerable importance for improving the validity of the test.

4.
《现代教育技术》2015,(8):60-66
Researchers have noted both the positive effects and the shortcomings of analytic scoring rubrics in English writing instruction, and which scoring dimensions to use in diagnostic assessment remains a major open question. Drawing on ratings of college students' writing ability from two diagnostic writing-assessment websites, this study compares the scores of 27 experimental-group students and 90 control-group students who used different analytic rubrics. Based on the data collected and analyzed, and informed by theories of writing instruction and writing assessment, the study finds that analytic rubrics are suitable for diagnostic assessment in college English writing instruction, but that how the rubric and the resulting scores are presented affects the improvement of writing ability; moreover, the effect of analytic rubrics on different aspects of learners' writing ability depends on the duration of writing instruction.

5.
房雅琨 《现代英语》2022,(2):107-110
A writing rating scale, sometimes also called a writing scoring rubric, embodies the construct of a writing test and establishes an important link between learners' writing performance and their language ability. This article reviews and discusses several key issues concerning English writing rating scales: who uses the scales and for what purposes and functions; the four types of rating scales and their respective characteristics; and scale-design methods together with typical examples, with the aim of providing a reference for research on and the practice of rating scales for English writing tests in China.

6.
This paper uses a comparative experiment to analyze the effect of embedded scoring criteria on test takers' writing behavior, applying independent-samples t-tests in SPSS 13.0. The results show that embedded scoring criteria help test takers grasp the item writers' intentions and produce essays that meet the task requirements, but only for students whose language proficiency falls within a certain range. The finding enriches Bachman and Palmer's schema of the factors and pathways that influence test performance, makes communication between item writers and test takers more concrete and direct, and makes writing tests more considerate of candidates.
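For readers without SPSS, the same kind of independent-samples t-test can be run in a few lines of Python; the score vectors below are hypothetical stand-ins rather than the study's data, and Welch's correction is used here instead of assuming equal variances.

```python
# Independent-samples t-test comparing essay scores with and without embedded criteria (hypothetical data)
from scipy import stats

with_rubric = [78, 82, 75, 80, 85, 79, 83, 77, 81, 84]      # essays written with embedded criteria
without_rubric = [72, 76, 70, 74, 78, 71, 75, 69, 73, 77]   # essays written without them

t_stat, p_value = stats.ttest_ind(with_rubric, without_rubric, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```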

7.
Testing is an important link in teaching: a good test supervises and supports instruction, and item design shapes a test's washback on teaching. For writing tests, whose scoring is always somewhat subjective, a scientific scoring rubric is equally important, since it directly affects the next stage or round of teaching; it therefore deserves attention and study.

8.
The key to scoring English speaking tests is ensuring rating reliability. This article summarizes the main criticisms leveled at current intuition-based rating scales and focuses on two empirically grounded scales developed abroad: Fulcher's fluency rating scale and Upshur and Turner's binary-choice, boundary-definition scales (EBBs). It compares Fulcher's fluency scale with the rating criteria of the TEM-4 Spoken English Test (TEM-4 SET), discusses the feasibility of applying such scales in large-scale domestic speaking tests, and finally points out the advantages of implementing empirically grounded rating scales in the teaching and testing of spoken English for English majors in China.
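To make the EBB idea concrete, here is a toy sketch of how a binary-choice, boundary-definition scale walks a rater through yes/no questions to a band; the questions and the five bands are invented for illustration and are not Upshur and Turner's actual scale.

```python
# Toy EBB-style band assignment: each yes/no question defines a boundary between score bands.
def ebb_band(task_fulfilled: bool, mostly_fluent: bool, errors_impede: bool) -> int:
    if not task_fulfilled:       # first boundary question splits the scale
        return 1 if errors_impede else 2
    if not mostly_fluent:
        return 3
    return 4 if errors_impede else 5

print(ebb_band(task_fulfilled=True, mostly_fluent=True, errors_impede=False))  # -> 5
```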

9.
As foreign language teaching and assessment methods continue to develop, the current criteria for assessing Russian writing need further refinement. Comparing domestic and foreign rating criteria for Russian writing can help improve domestic scoring methods, so that Russian writing is evaluated more scientifically and comprehensively and teaching methods become more diverse.

10.
Improving the consistency and fairness of scoring open-ended items is a difficult problem in educational testing. This article compares the scoring of open-ended items in PISA with that in China's college entrance examination in terms of rubric structure and scoring methods, rubric development procedures, basic scoring principles, and scoring quality control, and on that basis offers suggestions on rubrics and scoring methods for open-ended items.

11.
To address the shortcomings of the single-rating method currently used to determine essay scores in online marking, this article proposes applying a triple-rating method to essay scoring. The results show that under single rating, inter-rater consistency is unsatisfactory and significant differences exist among raters, whereas triple rating reduces scoring error to some extent and safeguards marking quality. When implementing the method, however, care must be taken to guard against the three raters' tendency to play it safe with middling scores, so that the method is applied scientifically and reasonably. Whether it can be deployed in large-scale online essay scoring requires further research.
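One plausible way to operationalize the triple-rating idea is sketched below: keep the two closest of the three scores when they agree within a tolerance and escalate otherwise. The tolerance and the combination rule are assumptions, not the procedure the study followed.

```python
# Combine three independent ratings: average the closest pair, or flag for adjudication.
def combine_three_ratings(r1: float, r2: float, r3: float, tol: float = 2.0):
    ratings = sorted([r1, r2, r3])
    # pick the pair with the smallest gap
    if ratings[1] - ratings[0] <= ratings[2] - ratings[1]:
        closest = (ratings[0], ratings[1])
    else:
        closest = (ratings[1], ratings[2])
    if closest[1] - closest[0] > tol:
        return None  # no pair agrees within tolerance: send to a senior rater
    return sum(closest) / 2

print(combine_three_ratings(12, 13, 18))  # -> 12.5
```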

12.
Martin 《Assessing Writing》2009,14(2):88-115
The demand for valid and reliable methods of assessing second and foreign language writing has grown in significance in recent years. One such method is the timed writing test, which has a central place in many testing contexts internationally. The reliability of this test method is heavily influenced by the scoring procedures, including the rating scale to be used and the success with which raters can apply the scale. Reliability is crucial because important decisions and inferences about test takers are often made on the basis of test scores. Determining the reliability of the scoring procedure frequently involves examining the consistency with which raters assign scores. This article presents an analysis of the rating of two sets of timed tests written by intermediate-level learners of German as a foreign language (n = 47) by two independent raters who used a newly developed detailed scoring rubric containing several categories. The article discusses how the rubric was developed to reflect a particular construct of writing proficiency. Implications for the reliability of the scoring procedure are explored, and considerations for more extensive cross-language research are discussed.
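A minimal illustration of the consistency checks such a study typically reports for two independent raters, here a Pearson correlation and an exact-agreement rate; the scores are hypothetical rather than the German-writing data (n = 47) described above.

```python
# Inter-rater consistency between two raters on the same set of scripts (hypothetical scores)
import numpy as np

rater_a = np.array([4, 3, 5, 2, 4, 3, 5, 4, 2, 3])
rater_b = np.array([4, 3, 4, 2, 5, 3, 5, 4, 3, 3])

r = np.corrcoef(rater_a, rater_b)[0, 1]
exact_agreement = np.mean(rater_a == rater_b)
print(f"Pearson r = {r:.2f}, exact agreement = {exact_agreement:.0%}")
```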

13.
A two-stage process by which a holistic rubric is applied to the assessment of open-ended items, such as writing samples, is defined. The first stage involves scoring a performance by the assignment of an integer rating that is congruent with the proficiency level exhibited in the performance. The second stage is the subsequent assignment by the rater of an augmentation that indicates whether the writing competency reflected in the paper is somewhat higher or lower than the competency level reflected in the benchmark paper for the given proficiency level. If the rater feels that the paper represents benchmark proficiency for the given level, no augmentation is assigned to the rating. The results of this study indicate that the use of rating augmentation can improve the inter-rater reliability of holistic assessments, as indicated by generalizability phi coefficients, correlation coefficients, and percent agreement indices. Implications and suggestions for follow-up research are discussed.
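A small sketch of how augmented holistic ratings such as "3-", "3", and "3+" might be mapped to numbers before computing agreement statistics; the one-third step for the plus/minus augmentation is an assumption made for illustration, not the coding used in the study.

```python
# Map augmented holistic ratings onto a numeric scale and compare two raters (hypothetical data)
def augmented_to_numeric(rating: str) -> float:
    base = int(rating.rstrip("+-"))
    if rating.endswith("+"):
        return base + 1 / 3
    if rating.endswith("-"):
        return base - 1 / 3
    return float(base)

rater_a = ["3", "4-", "2+", "3+"]
rater_b = ["3+", "4", "2", "3"]
diffs = [abs(augmented_to_numeric(a) - augmented_to_numeric(b)) for a, b in zip(rater_a, rater_b)]
print("Mean absolute difference:", round(sum(diffs) / len(diffs), 2))
```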

14.
Subjective tests are scored by human raters, and because rating consistency is often low and reliability limited, the measurement community has long sought ways to improve the reliability of subjective scoring. This paper reports an empirical check, using Longford's method, for aberrant raters among those who scored the essay section of the HSK (Advanced). The results confirm that the method is indeed effective for detecting rater differences in large-scale standardized subjective tests.

15.
Research on a Content Rating Standard for Online Education
Research on a content rating standard for online education is one of the tracking research projects of the Educational Technology Subcommittee of the National Information Technology Standardization Technical Committee. Based on a comparison of international film and television rating standards with Internet content rating standards, the author proposes a content rating standard for online education suited to China's conditions (CHERS: Chinese e-learning content rating standard) and elaborates on its two-dimensional structure.

16.
The use of evidence to guide policy and practice in education (Cooper, Levin, & Campbell, 2009) has included an increased emphasis on constructed-response items, such as essays and portfolios. Because assessments that go beyond selected-response items and incorporate constructed-response items are rater-mediated (Engelhard, 2002, 2013), it is necessary to develop evidence-based indices of quality for the rating processes used to evaluate student performances. This study proposes a set of criteria for evaluating the quality of ratings based on the concepts of measurement invariance and accuracy within the context of a large-scale writing assessment. Two measurement models are used to explore indices of quality for raters and ratings: the first model provides evidence for the invariance of ratings, and the second model provides evidence for rater accuracy. Rating quality is examined within four writing domains from an analytic rubric. Further, this study explores the alignment between indices of rating quality based on these invariance and accuracy models within each of the four domains of writing. Major findings suggest that rating quality varies across analytic rubric domains, and that there is some correspondence between indices of rating quality based on the invariance and accuracy models. Implications for research and practice are discussed.
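A toy example of a rater-accuracy index in the spirit of the accuracy model described above: operational ratings are compared against expert (criterion) scores on the same performances. The scores and the exact/adjacent-agreement indices are illustrative assumptions, not the study's measurement models.

```python
# Simple rater-accuracy check: agreement between an operational rater and expert criterion scores
import numpy as np

expert = np.array([3, 4, 2, 5, 3, 4, 2, 3])
rater = np.array([3, 4, 3, 5, 2, 4, 2, 4])

exact = np.mean(rater == expert)
adjacent = np.mean(np.abs(rater - expert) <= 1)
print(f"Exact agreement with expert scores: {exact:.0%}, within one point: {adjacent:.0%}")
```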

17.
Research on Scoring Criteria for Subjective Items
Taking the scoring criteria for the essay question in the politics paper of the 2006 Shanghai college entrance examination as an example, this paper examines how to judge the quality of scoring criteria for subjective items from three aspects: whether each scoring component is relatively independent; whether results on the scoring components can be used to infer candidates' overall ability to construct an argument; and whether the grade divisions within each component are reasonable. Factor analysis shows that the item's four scoring components are unidimensional and that the single factor can be interpreted as candidates' overall argumentation ability. Correlation analysis shows that the four components are relatively independent and contribute separately to inferring that ability. Analysis with the Rasch rating scale model indicates that the grade divisions are basically reasonable, although a few grades carry insufficient information. On this basis, several suggestions for improving the scoring criteria are put forward.
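As a rough illustration of the unidimensionality check reported above, the sketch below runs a principal component analysis on the correlation matrix of four simulated scoring components; a single dominant eigenvalue is consistent with one underlying factor. The data are simulated, not the 2006 Shanghai examination scores.

```python
# PCA on the correlation matrix of four scoring components (simulated data)
import numpy as np

rng = np.random.default_rng(1)
ability = rng.normal(size=200)
scores = np.column_stack([ability + rng.normal(scale=0.8, size=200) for _ in range(4)])

corr = np.corrcoef(scores, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
print("Eigenvalues:", np.round(eigenvalues, 2))  # one dominant eigenvalue suggests a single factor
```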

18.
The paper provides (1) a teacher-administered rating instrument for inattention without confounding the rating with hyperactivity and conduct disorder, and (2) evidence that the ratings correlate with the scores obtained from cognitive tests of attention. In Study I, the first objective was to investigate the construct validity and the inter-rater reliability of the Attention Checklist (ACL) by factor analysing the teacher ratings of 110 Grade 4 children, obtained by using the ACL. The second objective was to investigate the predictive validity of the ACL by examining the relationship between the scores obtained for the participants from teachers' ratings using the ACL and the scores obtained by participants in the lab-type attention tests. The results of factor analysis showed that a single factor labelled 'inattention' underlies the 12 items in the ACL. Examining the differences in performance on attention tests, the 'low attention' children as rated by the teachers on the ACL scored lower than the 'high attention' children on the objective tests of attention. These findings were replicated in Study II, which was conducted to test further the construct validity and predictive validity of the ACL. This time, only those two tests (Auditory Attention and Visual Attention) that had shown relatively poor discrimination between the high and low attention groups in Study I were, again, administered to another cohort of 97 Grade 4 children, as it was our intention to further challenge the reliability of the ACL. Overall, the results of both studies suggest that comprehensive assessment of attention skills should include both ACL and objective measures of selective attention.

19.
As the new curriculum reform deepens, teaching philosophies are being updated and students' English proficiency is gradually improving, yet the scoring criteria for written expression in the college entrance English examination, in use for many years, have not kept pace and no longer fully meet the requirements of English teaching reform. The author argues that, measured against the Curriculum Standards, the current criteria set the bar for students' written expression too low, and that, compared with the writing rubrics of tests such as TOEFL, their holistic scoring involves relatively large uncertainty and their analytic descriptors are not entirely reasonable. To address these problems, this article proposes a 'modified holistic scoring method'.

20.
A Comparative Study of Domestic and Foreign Writing Rating Scales
陈睿 《考试研究》2011,(6):59-67
Foreign testing programs usually score writing holistically with small rating scales (few score bands), whereas domestic tests use holistic or analytic scoring with large scales (wide score ranges). Foreign writing rating scales provide concrete, detailed descriptors with clear levels, so that differences between adjacent bands are identifiable and raters find them easy to apply. Compared with small scales, raters working with large scales cannot use the full score range and tend to award scores clustered around the middle, and inter-rater consistency is poorer. The study therefore concludes that holistic scoring on a small scale that combines an overall descriptor with specific analytic descriptors is more accurate than holistic scoring on a large scale, is easier for raters to master, yields higher marking efficiency and smaller scoring error, and better safeguards test fairness.
