Similar literature
1.
Individual person fit analyses provide important information regarding the validity of test score inferences for an individual test taker. In this study, we use data from an undergraduate statistics test (N = 1135) to illustrate a two-step method that researchers and practitioners can use to examine individual person fit. First, person fit is examined numerically with several indices based on the Rasch model (i.e., Infit, Outfit, and Between-Subset statistics). Second, person misfit is presented graphically with person response functions, and these person response functions are interpreted using a heuristic. Individual person fit analysis holds promise for improving score interpretation in that it may detect potential threats to validity of score inferences for some test takers. Individual person fit analysis may also highlight particular subsets of items (on which a test taker performs unexpectedly) that can be used to further contextualize her or his test performance.
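
As a rough illustration of the numerical step described in this abstract, the sketch below computes the conventional Infit and Outfit mean-square statistics for one test taker under the dichotomous Rasch model. It is not taken from the paper: the function names `rasch_prob` and `person_fit`, the item difficulties, and the two response patterns are hypothetical, and the ability estimate is assumed to be known rather than estimated from data.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def person_fit(responses, theta, difficulties):
    """Return (infit, outfit) mean-square statistics for one test taker.

    responses    -- list of 0/1 item scores (hypothetical data)
    theta        -- person ability in logits (assumed known here)
    difficulties -- item difficulties in logits (assumed known here)
    """
    probs = [rasch_prob(theta, b) for b in difficulties]
    variances = [p * (1.0 - p) for p in probs]
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, probs)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    # Infit: squared residuals weighted by the model variance of each item.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# Hypothetical five-item test, ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [1, 1, 1, 0, 0]  # succeeds on easy items, fails hard ones
aberrant_pattern = [0, 0, 1, 1, 1]  # fails easy items, succeeds on hard ones

infit_ok, outfit_ok = person_fit(expected_pattern, 0.0, difficulties)
infit_bad, outfit_bad = person_fit(aberrant_pattern, 0.0, difficulties)
```

Both statistics have an expected value near 1 when the data fit the model. In this toy example the aberrant pattern gives values well above 1 and the expected pattern gives values below 1, which is the kind of contrast a person fit analysis looks for before turning to graphical displays.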

2.
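Piloting multiple test administrations per year for the English section of the college entrance examination is a remarkable step forward, but fluctuations in difficulty across administrations often make it very troublesome to use raw scores directly for admission decisions. This paper discusses three methods for stabilizing test difficulty: the standard practice of the international testing industry, an expert-judgment method that borrows ideas from standard setting, and a pilot-testing method that uses a small representative sample and applies validity evidence in reverse. We hope these methods give frontline testing practitioners more options.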

3.
Remote proctoring, or monitoring test takers through internet-based, video-recording software, has become critical for maintaining test security on high-stakes assessments. The main role of remote proctors is to make judgments about test takers' behaviors and decide whether these behaviors constitute rule violations. Variability in proctor decision making, or the degree to which humans/proctors make different decisions about the same test-taking behaviors, can be problematic for both test takers and test users (e.g., universities). In this paper, we measure variability in proctor decision making over time on a high-stakes English language proficiency test. Our results show that (1) proctors systematically differ in their decision making and (2) these differences are trait-like (i.e., ranging from lenient to strict), but (3) systematic variability in decisions can be reduced. Based on these findings, we recommend that test security providers conduct regular measurements of proctors’ judgments and take actions to reduce variability in proctor decision making.

4.
5.
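Applying criterion-referenced theory to study and interpret norm-referenced examinations allows a deeper and more detailed analysis of an examination. Using the Rasch measurement model to place examinees' ability levels and the scores from different test forms on a common score scale makes it possible to compare item levels and examinee levels across years. Decomposing subject ability into dimensions and dividing each dimension into graded levels makes it possible to analyze the ability level targeted by each item and the ability structure of the test paper, as well as the composition and proportion of examinees at each level.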

6.
7.
Score reports have one or more intended audiences: the people who use the reports to make decisions about test takers, including teachers, administrators, parents and test takers. Attention to audience when designing a score report supports assessment validity by increasing the likelihood that score users will interpret and use assessment results appropriately. Although most design guidelines focus on making score reports understandable to people who are not testing professionals, audiences should be defined by more than just their lack of statistical knowledge. This paper introduces an approach to identifying important audience characteristics for designing computer-based, interactive score reports. Through three examples, we demonstrate how an audience analysis suggests a design pattern, which guides the overall design of a report, as well as design details, such as data representations and scaffolding. We conclude with a research agenda for furthering the use of audience analysis in the design of interactive score reports.

8.
Researchers have documented the impact of rater effects, or raters’ tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers’ achievement estimates given their response patterns, has not been investigated. In rater-mediated assessments, person fit reflects the reasonableness of rater judgments of individual test-takers’ achievement over components of the assessment. This study illustrates an approach to visualizing and evaluating person fit in assessments that involve rater judgment using rater-mediated person response functions (rm-PRFs). The rm-PRF approach allows analysts to consider the impact of rater effects on person fit in order to identify individual test-takers for whom the assessment results may not have a straightforward interpretation. A simulation study is used to evaluate the impact of rater effects on person fit. Results indicate that rater effects can compromise the interpretation and use of performance assessment results for individual test-takers. Recommendations are presented that call researchers and practitioners to supplement routine psychometric analyses for performance assessments (e.g., rater reliability checks) with rm-PRFs to identify students whose ratings may have compromised interpretations as a result of rater effects, person misfit, or both.

9.
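Business English in higher vocational colleges differs markedly from Business English in regular universities. Moving the assessment system for higher vocational Business English away from its reliance on summative evaluation, by combining formative, summative, and diagnostic evaluation and by gradually diversifying evaluation forms, evaluators, content, standards, and methods, has real practical significance for further promoting teaching reform in higher vocational Business English.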

10.
This article draws on current research investigating the notion of design for an unknown future. It reflects on recent thinking about the role of creativity in design practice and discusses implications for the development and assessment of creativity in the design studio. It begins with a review of literature on the issues and challenges associated with the assessment of creativity in design education. It then discusses and distinguishes three significant assessment models in design and creative arts education and emphasises the importance of opening debate on notions of creativity within the discipline. Following this, the article examines recent developments in the way that creativity is being practised, driven, fostered and implemented in contemporary design practice, and argues that these recent developments must feature in current scholarship about the development and assessment of creativity in design education. The article recommends areas for future research that pay close attention to developments in the rapidly expanding field of design practice.

11.
One of the substantive changes in the 2014 Standards for Educational and Psychological Testing was the elevation of fairness in testing as a foundational element of practice in addition to validity and reliability. Previous research indicates that testing practices often do not align with professional standards and guidelines. Therefore, to raise awareness of fairness concepts and principles from the 2014 Standards, this study aligned those standards with fairness practices, as documented in test manuals and on websites of 18 intelligence and achievement tests from different test publishers. A content analysis indicated that just under half of the fairness standards are frequently or occasionally practiced and those occurrences differed somewhat across tests but did not differ between intelligence and achievement tests or across publishers. To inform and encourage improvements in the future practice of the fairness standards, an evaluative framework along with example practices and related methodological scholarship is discussed.

12.
In recent years, there have been calls in the literature for the dominant model of feedback to shift away from the transmission of comments from marker to student, towards a more dialogic focus on student engagement and the impact of feedback on student learning. In the present study, we sought to gain insight into the extent to which such a shift is evident in practice, and how practice is shaped by national and disciplinary cultures. A total of 688 higher education staff from the UK and Australia completed a survey, in which we collected data pertaining to key influences on the design of feedback, and the extent to which emphasis is placed on student action following feedback. Our respondents reported that formal learning and development opportunities have less influence on feedback practice than informal learning and development, and prior experience. Australian respondents placed greater emphasis on student action following feedback than their counterparts in the UK, and were also more likely than UK respondents to judge the effectiveness of feedback by seeking evidence of its impact on student learning. We contextualise these findings in terms of disciplinary and career stage differences in our data. By demonstrating international differences in the adoption of learning-focused feedback practices, the findings indicate directions for the advancement of feedback research and practice in contemporary higher education.

13.
Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues with on-demand testing is that some of the methods used for maintaining the comparability of standards over time in conventional testing are no longer available and the development of new methods is required.

Purpose: This paper proposes an item response theory (IRT) framework for implementing on-demand testing and maintaining the comparability of standards over time for general qualifications, including GCSEs and GCE A levels, in the UK and discusses procedures for its practical implementation.

Sources of evidence: Sources of evidence include literature from the fields of on-demand testing, the design of computer-based assessment, the development of IRT, and the application of IRT in educational measurement.

Main argument: On-demand testing presents many advantages over conventional testing. In view of the nature of general qualifications, including the use of multiple components and multiple question types, the advances made in item response modelling over the past 30 years, and the availability of complex IRT analysis software systems, coupled with increasing IRT expertise in awarding organisations, IRT models could be used to implement on-demand testing in high stakes examinations in the UK. The proposed framework represents a coherent and complete approach to maintaining standards in on-demand testing. The procedures for implementing the framework discussed in the paper could be adapted by people to suit their own needs and circumstances.

Conclusions: The use of IRT to implement on-demand testing could prove to be one of the viable approaches to maintaining standards over time or between test sessions for UK general qualifications.

14.
This article has three goals. The first goal is to clarify the role that the consequences of test score use play in validity judgments by reviewing the role that modern writers on validity have ascribed to consequences in supporting validity judgments. The second goal is to summarize current views on who is responsible for collecting evidence of test score use consequences by attempting to separate the responsibilities of the test developer and the test user. The last goal is to offer a framework that attempts to prescribe the conditions under which the responsibility for collecting evidence of consequences falls to the test developer or to the test user.

15.
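As an evaluation approach consistent with the "cognitive view of learning", authentic assessment exerts a washback effect on language learning in its own distinctive way. Starting from the mechanism by which the washback of authentic assessment arises, and building on a summary of the core factors that affect assessment effectiveness, a workable threefold standard for judging assessment effectiveness can be formulated.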

16.
In this digital ITEMS module, Dr. Michael Bunch provides an in-depth, step-by-step look at how standard setting is done. It does not focus on any specific procedure or methodology (e.g., modified Angoff, bookmark, and body of work) but on the practical tasks that must be completed for any standard setting activity. Dr. Bunch carries the participant through every stage of the standard setting process, from developing a plan, through preparations for standard setting, conducting standard setting, and all the follow-up activities that must occur after standard setting in order to obtain the approval of cut scores and translate those cut scores into score reports. The digital module includes a 120-page manual, various ancillary files (e.g., PowerPoint slides, Excel workbooks, sample documents, and forms), links to datasets from the book Standard Setting (Cizek & Bunch, 2007), links to final reports from four recent large-scale standard setting events, quiz questions with formative feedback, and a glossary.

17.
Reflecting on my experience serving on the University of California's Standardized Testing Task Force, and drawing on the lessons learned, I argue that the COVID-19 pandemic has merely served to accelerate the trend of US higher education institutions moving away from current standardized tests. New educational assessments will continue to be produced. There will be strong calls for assessments that are completely different from those currently available. Educational measurement as a field has much to offer and to learn.

18.
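A survey of some secondary school physical education teachers and students was conducted on the current supplementary physical education examination for physical education majors in the college entrance examination for regular higher education institutions in Guangdong Province, covering the setting of test events for the track and field specialty, the non-specialty, and the fitness tests, as well as the testing methods and event scoring standards. The survey revealed problems in the scoring standards and in the setting of events, and solutions to these problems are proposed.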

19.
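In different cultural contexts, people hold different ways of thinking, philosophical ideas, value standards, and customs. These directly affect how they express themselves in language and how they understand particular ideas, words, and even objects, so the question of which standard to adopt for translation in cross-cultural contexts has become a concern for many. From the perspective of choosing translation standards, the author therefore analyzes the similarities and differences between Chinese and Western translation and their root causes, and proposes feasible responses, in the hope of offering some help to current translation practice.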

20.
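This paper discusses the importance of contextual factors in English vocabulary testing. Starting from the criteria for vocabulary mastery, it compares the washback effects of discrete-point and integrative approaches and the tension between them in terms of reliability and validity. On this basis, it analyzes the role of context in vocabulary testing and introduces two methods that attempt to integrate context and vocabulary organically.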
