Similar literature
1.
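Scoring criteria are very important in writing tests, and different scoring methods affect raters' scoring behavior. The study shows that although both the holistic and the analytic method of scoring English writing are reliable, rater severity and test takers' writing scores change considerably between the two. Overall, under holistic scoring, rater severity tends to be consistent and close to the ideal value; under analytic scoring, test takers' writing scores are higher, and rater severity also differs significantly. Holistic scoring is therefore preferred in high-stakes examinations that determine test takers' futures.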

2.
Martin. Assessing Writing, 2009, 14(2): 88-115
The demand for valid and reliable methods of assessing second and foreign language writing has grown in significance in recent years. One such method is the timed writing test which has a central place in many testing contexts internationally. The reliability of this test method is heavily influenced by the scoring procedures, including the rating scale to be used and the success with which raters can apply the scale. Reliability is crucial because important decisions and inferences about test takers are often made on the basis of test scores. Determining the reliability of the scoring procedure frequently involves examining the consistency with which raters assign scores. This article presents an analysis of the rating of two sets of timed tests written by intermediate level learners of German as a foreign language (n = 47) by two independent raters who used a newly developed detailed scoring rubric containing several categories. The article discusses how the rubric was developed to reflect a particular construct of writing proficiency. Implications for the reliability of the scoring procedure are explored, and considerations for more extensive cross-language research are discussed.

3.
4.
Many personnel committees at colleges and universities in the USA use student evaluation of faculty instruction to make decisions regarding tenure, promotion, merit pay or faculty professional development. This study examines the construct validity and internal consistency reliability of the student evaluation of instruction (SEI) used at a large mid‐western university in the USA for both administrative and instructional purposes. The sample consisted of 73,500 completed SEIs for undergraduate students who self‐reported as freshman, sophomore, junior or senior. Confirmatory factor analysis via structural equation modelling was used to explore the construct validity of the SEI instrument. The internal consistency of students' ratings was reported to provide reliability evidence. The results of this study showed that the model fits the data for the sample. The significance of this study as well as areas for further research are discussed.

5.
This paper forms part of an exploration of assessment on one part‐time higher education (HE) course: an in‐service, professional qualification for teachers and trainers in the learning and skills sector which is delivered on a franchise basis across a network of further education colleges in the north of England. This paper proposes that the validity and reliability of portfolio‐based assessment, a key component of many HE programmes in addition to the course being researched here, is contestable. Analysis of the processes of compiling portfolios for assessment, through the conceptual framework of the New Literacy Studies, suggests that the ways in which portfolios are assessed and the ways in which the crucial requisites of validity and reliability are assigned to them, mask complexities and contradictions in their creation by the student. This paper argues for a new, critical analysis of portfolio production and raises a number of questions about the validity, reliability and authenticity of the assessment process that the portfolios reify.

6.
Several benefits of using scoring rubrics in performance assessments have been proposed, such as increased consistency of scoring, the possibility to facilitate valid judgment of complex competencies, and promotion of learning. This paper investigates whether evidence for these claims can be found in the research literature. Several databases were searched for empirical research on rubrics, resulting in a total of 75 studies relevant for this review. Conclusions are that: (1) the reliable scoring of performance assessments can be enhanced by the use of rubrics, especially if they are analytic, topic-specific, and complemented with exemplars and/or rater training; (2) rubrics do not facilitate valid judgment of performance assessments per se. However, valid assessment could be facilitated by using a more comprehensive framework of validity when validating the rubric; (3) rubrics seem to have the potential of promoting learning and/or improving instruction. The main reason for this potential lies in the fact that rubrics make expectations and criteria explicit, which also facilitates feedback and self-assessment.

7.
Indeterminacy in the use of preset criteria for assessment and grading (cited by 1: self-citations 1, citations by others 0)
When assessment tasks are set for students in universities and colleges, a common practice is to advise them of the criteria that will be used for grading their responses. Various schemes for using multiple criteria have been widely advocated in the literature. Each scheme is designed to offer clear benefits for students. Breaking down holistic judgments into more manageable parts is seen as a way to increase openness for students and achieve more objectivity in grading. However, such approaches do not adequately represent the full complexity of multi‐criterion qualitative judgments, and can lead to distorted grading decisions. Six anomalies in the ways assessors approach the grading task are identified, together with several likely contributing factors. Overall, the conclusion is that explicit grading models do not have as strong a theoretical foundation as is commonly supposed, and that holistic appraisal merits further investigation.

8.
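To understand how well an intelligent essay-scoring system works in teaching and learning, this paper surveyed by questionnaire teachers and students who had used the system for one year. The survey aimed to gauge the system's overall effectiveness in online sentence-by-sentence evaluation, essay scoring, learning diagnosis and early warning, and use by teachers and students, so as to provide a practical basis for later analysis of the problems encountered in teaching applications and of possible countermeasures. To improve the system's reliability and validity, teachers and students must be trained to use it effectively.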

9.
Using generalizability (G-) theory and rater interviews as research methods, this study examined the impact of the current scoring system of the CET-4 (College English Test Band 4, a high-stakes national standardized EFL assessment in China) writing on its score variability and reliability. One hundred and twenty CET-4 essays written by 60 non-English major undergraduate students at one Chinese university were scored holistically by 35 experienced CET-4 raters using the authentic CET-4 scoring rubric. Ten purposively selected raters were further interviewed for their views on how the current scoring system could impact its score variability and reliability. The G-theory results indicated that the current single-task and single-rater holistic scoring system would not be able to yield acceptable generalizability and dependability coefficients. The rater interview results supported the quantitative findings. Important implications for the CET-4 writing assessment policy in China are discussed.

10.
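Automated scoring of English essays and a discussion of its validity, reliability and practicability (cited by 2: self-citations 0, citations by others 2)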
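This paper reviews automated essay scoring systems in China and abroad and analyzes them in terms of reliability, validity and practicability in English essay testing. It discusses the development of automated English essay evaluation systems in China, acknowledges their strengths, identifies and analyzes their problems and shortcomings, and proposes corresponding countermeasures, with a view to informing the research and development of such systems in China.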

11.
This paper assesses the reliability and validity of the teacher-completed Pupil Behaviour Questionnaire (PBQ), by comparing it to the already extensively validated teacher-completed Strengths and Difficulties Questionnaire (SDQ). Participants included 2074 primary school children participating in a universal school-based trial and 41 vulnerable children who were taking part in a study exploring the impact of exclusion from school. Exploratory factor analysis results (first factor accounts for 80.8% of the variation in the items) and the high Cronbach’s alpha value of 0.85 indicate that the PBQ consists of one substantive factor/dimension. Strong correlations between the total PBQ score and the conduct sub-scale (Spearman’s correlation coefficient (rs) = 0.67) and total difficulties score (rs = 0.59) of the SDQ indicate convergent validity. This study suggests that the PBQ is a reliable measure, and provides some evidence of validity. Further work is needed to test the PBQ in older, more diverse populations and to measure sensitivity to change.

12.
Namibia has been reported to be one of the countries with the highest unemployment rates. In this work, the reliability and validity of the self-assessment instrument used to measure competencies of graduates in Namibia were assessed using exploratory factor analysis (EFA) and second-order confirmatory factor analysis (CFA). The EFA results demonstrated that the twenty indicators can be categorized into five factors, namely, “management and resilience”, “professional and communication”, “teamwork and critical thinking”, “self-control”, and “achievement motive”. The CFA results showed that all of the factors and indicators are highly reliable with good construct validity. Students and graduates could employ this validated self-assessment instrument to assess or diagnose a pattern of strengths and weaknesses in their own competencies and provide themselves with a realistic and objective estimate of their employability, as well as help them increase effectiveness in their workplace.

13.
The Devereux Early Childhood Assessment (DECA) is a social-emotional assessment widely used by early childhood educational programs to inform early identification and intervention efforts. However, its construct validity is not well-established in independent samples of children from low-income backgrounds. We examined the construct validity of the teacher report of the DECA using a series of confirmatory factor analyses, exploratory factor analyses, and the Rasch partial credit model in a large sample of culturally and linguistically diverse Head Start children (N = 5,197). Findings provided some evidence for consistency in the factor structure of the three Protective Factors subscales (Initiative, Self-Control, and Attachment); however, the factor structure of the Behavioral Concerns subscale was not replicated in our sample and demonstrated poor fit to these data. Findings suggested that the 10 items of the published Behavioral Concerns subscale did not comprise a unidimensional construct, but rather, were better represented by two factors (externalizing and internalizing behavior). The use of the total Behavioral Concerns score as a screening tool to identify emotional and behavioral problems in diverse samples of preschool children from low-income backgrounds was not supported, especially for internalizing behavior. Implications for the consequential validity of the DECA for use as a screening tool in early childhood programs serving diverse populations of children and directions for future research are discussed.

14.
15.
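Thesis review is an important part of evaluating dissertation quality. For review results to truly reflect the quality and level of doctoral dissertations, the accuracy, reliability and validity of the review indicator system must first be ensured. Reliability and validity analysis is an important method for verifying these properties of an indicator system. Using the complete quantified results of five years of anonymous reviews of doctoral dissertations at Beijing Normal University, this article empirically studies the reliability and validity of the review indicator system. The results show that the indicator system now in wide use has good reliability and validity and can truly reflect the quality and level of doctoral dissertations.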

16.
The purpose of this paper is to provide a proof of concept of a collaborative peer-, self- and lecturer assessment process. The research presented here is part of an ongoing study on self- and peer assessments in higher education. The authentic assessment for sustainable learning (AASL) model is evaluated in terms of the correlations between sets of marks. The article provides an explanation of the assessment process, and analyses sets of marks as a means of justifying the validity of the process. The results suggest that students, even those with no prior experience in peer- or self-evaluation, in their first year of tertiary study, under the right conditions, are able to accurately judge their own work and make reasonably accurate judgements of the work of their peers. While previous studies have expounded the benefits of self- and peer assessments in tertiary study, undertaking a prescribed process, such as AASL, has a further implication in allowing others to replicate the process with reasonable assuredness of the validity of the process across various fields of study.

17.
Construct validity of peer assessment (PA) is important for PA application, yet difficult to achieve. The present study investigated the impact of an assessment rubric and friendship between the assessor and assessee on construct validity of PA. Two hundred nine bachelor students participated: half of them assessed a peer's concept map with a rubric whereas the other half did not use a rubric. The results revealed a substantial reliability and construct validity for PA. All students over-scored their peers’ performance, but the ratings of students using a rubric were more valid. Moreover, when using a rubric a high level of friendship between assessor and assessee resulted in more over-scoring. Use of a rubric resulted in higher quality concept maps for peer and expert ratings.

18.
Two studies examined the discriminant and incremental validity of self-concept and academic self-efficacy. Study 1, which meta-analysed 64 studies comprising 74 independent samples (N = 24,773), found a strong mean correlation of .43 between self-concept and academic self-efficacy. The domains of self-concept and self-efficacy, and the domain matching between them, moderate the strength of the correlation between self-concept and academic self-efficacy. Global self-concept was associated with weaker correlations than were academic and subject-specific self-concept. Academic self-efficacy had higher incremental validity than self-concept. Study 2, which examined data-sets from Programme for International Student Assessment 2000, 2003 and 2006, found that the mean correlation ranged from .31 to .54. Self-concept sometimes had higher incremental validity than academic self-efficacy. The higher incremental validity of self-concept may result from the wording and domain of self-concept measure as well as specificity matching between self-concept and academic achievement.

19.
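Methods for selecting and optimizing the calibration set of a computer-based intelligent assisted scoring system (cited by 2: self-citations 0, citations by others 2)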
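In research on computer-based intelligent scoring, selecting calibration samples is crucial to building a scoring model. Based on a comparative study of human and machine scoring across different calibration sets, the paper proposes a calibration-set selection method of "random selection by experts + intelligent selection of sample papers + supplementation by clustering within score bands". This method improves the scoring model's ability to model each score band, fits the normal distribution of test takers' scores in examinations such as the Gaokao (the national college entrance examination), and broadens the model's ability to learn jointly from expert scores and from markers' scores, so that the computer-based intelligent assisted scoring system can, through deep learning, understand and master the scoring criteria more comprehensively.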

20.
A comprehensive research base exists concerning the congruence between parents’ and teachers’ ratings of the behavior of typically developing young children. However, little research has been conducted regarding the degree to which parents’ and teachers’ behavioral ratings of young children with disabilities are congruent. Additionally, previous research has not always correctly proportioned the variance to that between and within classrooms. The purpose of this study was to examine congruence (using hierarchical linear modeling) at the classroom level, rather than the individual student-level, between parents’ and teachers’ ratings of young children's social skills and problem behaviors. We also examined the potential impact of selected family and child demographic variables, including disability, on this congruence. Consistent with other researchers, we found moderate levels of congruence for children's social skills (as framed by strengths-based statements) and low levels of congruence for problem behaviors (as described using deficit-based terminology). Parents’ and teachers’ congruence was higher when rating the social skills of young children with disabilities as compared to young children without disabilities.
