Found 20 similar documents (search time: 156 ms)
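Cognitive diagnosis is currently one of the most active research topics in the evaluation of academic achievement in basic education. Combining computerized adaptive testing with cognitive diagnostic techniques has produced computerized adaptive testing with a cognitive diagnostic function (cognitive diagnostic CAT). With cognitive diagnostic CAT, tests can be administered during classroom teaching with immediate feedback of results, yielding diagnostic information on students' knowledge mastery and providing a reference for subsequent remedial teaching. Some researchers hold that cognitive diagnostic CAT will gradually become the main form of academic achievement evaluation in basic education.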

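Computerized adaptive testing (CAT) has been widely applied in both theory and practice. Much current CAT research falls into two paradigms: CAT research based on real response data and CAT research based on simulated response data. The steps of a CAT simulation study include model selection, item bank simulation, choice of starting point, item selection strategy, and test termination strategy. The main trends in CAT simulation research are: item selection and termination strategies remain the focus of CAT research; simulation designs increasingly match actual testing conditions; CAT studies adopt multi-factor designs; and simulation results are evaluated comprehensively from multiple perspectives.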
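
The simulation steps listed in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the abstract: a 2PL model, a randomly generated 200-item bank, a starting estimate of 0, maximum-information item selection, EAP scoring on a grid, and a stopping rule of 20 items or a posterior SD of at most 0.3. The function names are hypothetical.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response (the model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def eap_estimate(responses, grid):
    """Posterior mean and SD of ability under a standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for (a, b), u in responses:
            p = p_correct(t, a, b)
            w *= p if u == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, bank, rng, max_items=20, se_target=0.3):
    """Run one simulated examinee through the adaptive test."""
    grid = [-4.0 + 0.1 * i for i in range(81)]
    theta_hat, se = 0.0, float("inf")  # starting point
    used, responses = set(), []
    # Termination: fixed maximum length or target precision reached.
    while len(responses) < max_items and se > se_target:
        # Item selection: unused item with maximum information at theta_hat.
        idx = max((i for i in range(len(bank)) if i not in used),
                  key=lambda i: item_information(theta_hat, *bank[i]))
        used.add(idx)
        a, b = bank[idx]
        # Simulate the response from the examinee's true ability.
        u = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        responses.append(((a, b), u))
        theta_hat, se = eap_estimate(responses, grid)
    return theta_hat, se, sorted(used)

rng = random.Random(0)
# Item bank simulation: (discrimination, difficulty) pairs, ranges are arbitrary.
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0)) for _ in range(200)]
theta_hat, se, administered = simulate_cat(1.0, bank, rng)
```

Each of the steps named in the abstract maps to one replaceable piece of the sketch, which is what makes multi-factor simulation designs straightforward: the selection rule, stopping rule, or bank generator can be swapped independently.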

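This article introduces five narrative assessment tools for young children: the MacArthur Story Stem Battery (MSSB), the Attachment Story Completion Task (ASCT), the Separation Anxiety Test (SAT), the Emotion Cognition Test (EKI), and the Emotion Understanding Test (EUPI). These tools are mainly used to examine the level of young children's psychosocial development, covering parent-child attachment and conflict, secure and insecure attachment, separation anxiety, and the understanding of basic or self-conscious emotions. Narrative assessment tools allow young children's psychological characteristics to be assessed directly, giving child psychology researchers, clinical psychologists, and early childhood educators a new means of assessment.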

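Computerized adaptive testing (CAT) simulation is one of the main methods of CAT research. The evaluation of CAT simulation results covers three main areas: analysis of examinee ability estimation and ability classification, analysis of item bank usage, and analysis of the response process during the CAT. Simulation results are analyzed in two main modes: overall analysis and fine-grained analysis. This study summarizes the evaluation indices for CAT simulation results from the perspectives of recovery performance of the simulation, measurement accuracy, item bank security, item bank utilization, classification efficiency and accuracy, and the degree to which constraints for multiple test objectives are satisfied. The evaluation perspectives and indices for CAT simulation results should be chosen according to the research goals and the requirements of the testing context.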
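
Two of the index families mentioned in this abstract, accuracy of ability recovery and item bank usage, can be sketched as follows. The specific indices chosen (bias, RMSE, per-item exposure rate, and the proportion of the bank used at least once) are common choices assumed here for illustration; the abstract does not name its exact formulas, and the function names and toy data are hypothetical.

```python
import math

def recovery_stats(true_thetas, estimates):
    """Bias and RMSE of ability estimates against the simulated true values."""
    n = len(true_thetas)
    errors = [e - t for t, e in zip(true_thetas, estimates)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    return bias, rmse

def exposure_stats(administered_per_examinee, bank_size):
    """Per-item exposure rates, the maximum rate, and the share of items ever used."""
    n = len(administered_per_examinee)
    counts = [0] * bank_size
    for items in administered_per_examinee:
        for i in items:
            counts[i] += 1
    rates = [c / n for c in counts]
    usage = sum(1 for c in counts if c > 0) / bank_size
    return rates, max(rates), usage

# Toy data: three simulated examinees and a five-item bank.
bias, rmse = recovery_stats([0.0, 1.0, -1.0], [0.2, 0.9, -1.3])
rates, max_rate, usage = exposure_stats([[0, 1], [0, 2], [0, 3]], bank_size=5)
```

In the toy data item 0 is given to every examinee and item 4 to none, so a high maximum exposure rate (a security concern) and incomplete bank usage show up together.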

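An achievement test is a test given after a period of teaching or training, within a fairly clear and relatively limited scope. To make such a test scientific, the key issue is to improve its validity. Validity refers to the effectiveness and correctness of a test, that is, how truthfully it reflects students' level of learning and teachers' teaching. Some modest views on validity are offered below for discussion with colleagues. The validity of an achievement test is affected by many factors, chiefly item writing, administration, scoring, and the psychological state of the test takers during testing. Of these, item writing and scoring are carried out directly by the teacher. In item writing, validity depends on the properties of the items, the choice of material, discrimination, difficulty, and the order of arrangement. Because teachers, with respect to subject content, …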

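Using the CSAI-2 (1994) and SCAT (1992) questionnaires, developed by Martens et al. and revised in China, competitive state anxiety and trait anxiety were measured in race walkers taking part in the preliminaries and finals of the 12th Shandong Provincial University Games. The aim is to help coaches and athletes fully understand how competitive state anxiety changes during race-walking competition and how it affects athletes' on-site technical and tactical performance, so that athletes' anxiety levels can be deliberately controlled in training and competition.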

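This article introduces and compares several commonly used ability estimation methods in computerized adaptive testing (CAT), including MLE, WLE, MAP, and EAP, covering their development and their respective principles and properties. It gives a brief summary and evaluation of how these methods evolved and how they behave, and closes with an outlook on future trends in ability estimation for CAT.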
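
To make the differences among these estimators concrete, here is a minimal sketch that computes MLE, MAP, and EAP on a grid for one short response pattern. The 2PL model, the normal prior, the grid search (in place of Newton-type iteration), and the example item parameters are all assumptions for illustration; WLE is left out because its bias-correction term would need more code than a short sketch allows.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def estimate_ability(items, responses, prior_sd=1.0):
    """Grid-based MLE, MAP, and EAP with a N(0, prior_sd^2) prior."""
    grid = [-4.0 + 0.01 * i for i in range(801)]
    loglik = [log_likelihood(t, items, responses) for t in grid]
    # Log-posterior up to a constant: log-likelihood plus log normal prior.
    logpost = [ll - 0.5 * (t / prior_sd) ** 2 for t, ll in zip(grid, loglik)]
    mle = grid[max(range(len(grid)), key=loglik.__getitem__)]   # likelihood mode
    map_ = grid[max(range(len(grid)), key=logpost.__getitem__)]  # posterior mode
    peak = max(logpost)
    weights = [math.exp(lp - peak) for lp in logpost]
    eap = sum(t * w for t, w in zip(grid, weights)) / sum(weights)  # posterior mean
    return mle, map_, eap

# Hypothetical (discrimination, difficulty) pairs and a mixed response pattern.
items = [(1.0, -1.0), (1.2, 0.0), (0.9, 0.5), (1.5, 1.0)]
responses = [1, 1, 1, 0]
mle, map_, eap = estimate_ability(items, responses)
```

With only four items the prior pulls the MAP and EAP estimates toward zero relative to the MLE. Note also that the MLE is not finite for all-correct or all-incorrect patterns; on this grid it would simply land on a boundary value.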

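Computer-based testing has become increasingly common, but different forms of computer-based test may yield biased results when measuring the same task, leading to unfairness in educational measurement and evaluation. Based on item response theory, this article examines whether computerized linear testing and computerized adaptive testing are equivalent in test efficiency, in the statistical characteristics of test results, and in their effects on examinees' individual psychological traits. An empirical study was conducted using the "Modern Educational Technology" course for pre-service teachers. The results show that examinees' scores on the two tests are comparable. Computerized adaptive testing has higher test efficiency and reliability, but the presence or absence of immediate feedback has a large effect on examinees' test anxiety. Computerized linear testing has more reasonable content validity, and the presence or absence of immediate feedback has a smaller effect on test anxiety. The study provides a systematic analysis of whether the choice of test form in teaching evaluation is fair and reasonable, and offers a theoretical reference for test administrators choosing a test form suited to the testing scenario.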

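1. Theoretical basis of the adaptive item bank system: A good testing system must rest on an explicit theory of educational measurement. Adaptive testing is based on item response theory (IRT), which has gained ground in online education in recent years. It emphasizes that a test should adapt automatically to the examinee's situation, balancing item content, number, difficulty, and knowledge coverage against the examinee's circumstances, and inferring the examinee's ability from his or her responses through the item characteristic function. Classical test theory (CTT), the prevailing approach, is the most common means of testing in teaching today; it suits cross-sectional norm-referenced tests and supports fixed-item testing. Based on these two theories, the system takes adaptive testing as …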

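Computer-assisted Putonghua proficiency testing is a milestone in the history of Putonghua proficiency testing in China and has attracted wide public attention since its rollout. The article gives a systematic account of three aspects: the current state of "machine testing", its existing problems, and measures for improving it.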

This study focused on the effects of administration mode (computer-adaptive test [CAT] versus self-adaptive test [SAT]), item-by-item answer feedback (present versus absent), and test anxiety on results obtained from computerized vocabulary tests. Examinees were assigned at random to four testing conditions (CAT with feedback, CAT without feedback, SAT with feedback, SAT without feedback). Examinees completed the Test Anxiety Inventory (Spielberger, 1980) before taking their assigned computerized tests. Results showed that the CATs were more reliable and took less time to complete than the SATs. Administration time for both the CATs and SATs was shorter when feedback was provided than when it was not, and this difference was most pronounced for examinees at medium to high levels of test anxiety. These results replicate prior findings regarding the precision and administrative efficiency of CATs and SATs but point to new possible benefits of including answer feedback on such tests.

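With the development of multimedia computing and network technology, computerized adaptive testing (CAT), which combines computer technology with item response theory (IRT), has attracted growing attention. This paper introduces the basic theory of IRT and, on that basis, studies an implementation model for a CAT system and the key techniques for implementing a CAT system with JSP.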

Previous research has found conflicting evidence regarding how early children can effectively use separate answer sheets with achievement tests. This study looked at the effects of separate answer sheets on the California Achievement Test (CAT) scores of third, fourth, and fifth graders. The Mathematics Computation and the Reading Comprehension subtests of the CAT were used. Seventy-one classrooms were randomly assigned to have students record their answers on either: (a) their test booklets, (b) separate answer sheets, or (c) separate answer sheets after being given training in the use of separate answer sheets. The results were consistent across both subtests and grades; no response mode treatment effect was found. Further, no evidence of a treatment by ability interaction was found, which was contrary to previously reported research. The results of this study suggest that students can, as early as grade three, effectively use separate answer sheets without prior training.

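The College English Test Band 4 and Band 6 (CET-4/6) have developed through continual reform, but the frequent leaks of test items and high-tech cheating in recent years have raised many doubts about the tests. Once highly regarded, the test in its traditional form now faces a growing number of problems. The December 2008 CET-4 marked the start of the reform toward computer-based CET-4/6. By reviewing the course of this reform and analyzing its strengths and problems, the article argues that college English teaching should make full use of online instruction, improve students' practical, integrated language ability, and break away from exam-oriented education.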

The overlap of words specifically taught in reading textbooks with the contents of standardized reading achievement tests may be a source of bias that is frequently overlooked in psychoeducational assessments. This study compares the standardized achievement test performance of 62 second graders receiving instruction in two different reading curricula (Open Court and Houghton-Mifflin) to determine whether either curriculum generates different quantitative estimates of reading achievement. Reading subtest scores derived from the Kaufman Test of Educational Achievement-Brief Form (K-TEA), the Wide Range Achievement Test-Revised (WRAT-R), and the Reading Recognition and Reading Comprehension subtests from the Peabody Individual Achievement Test (PIAT) were examined. Grade level equivalents and scaled scores from the California Achievement Test (CAT) were also examined. Three Curriculum × Test repeated measures ANOVAs were conducted using grade level scores (2 × 7), standard scores (2 × 4), and CAT scaled scores (2 × 5) as dependent measures. A significant Curriculum × Test interaction was identified, suggesting differences among tests in estimates of reading ability as a function of the reading program.

When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.

This study examined the effects of prematurity on 11-year-olds' performance on 2 specific aspects of cognition—memory and processing speed, using a new computer-administered battery, the Cognitive Abilities Test (CAT: Detterman). Preterms performed more poorly than their full-term controls on all memory tasks; this relative deficit was associated with the presence and severity of neonatal Respiratory Distress Syndrome (RDS). Preterms were also slower on selected aspects of processing speed but not on motor speed. Memory and processing speed, taken together, accounted for much of the 10-point difference in WISC-R IQ between groups.

The Kaufman Assessment Battery for Children (K-ABC) was administered to 44 4th-, 5th-, and 6th-grade students. Six months later, all students received the California Achievement Test (CAT). Significant positive correlations were obtained between K-ABC variables and CAT scores. CAT subtest scores and total score correlated higher with the K-ABC ACHV scale than with the K-ABC SEQ, SIM, or MPC scales on 8 of the 12 comparisons. The results support the predictive utility of the K-ABC, and also provide support for the differential validity of the K-ABC achievement vs. mental processing scales.

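The introduction of computer-assisted Putonghua proficiency testing is a milestone for Putonghua proficiency testing in China. The article reviews it from a psychological perspective (the halo effect and the stereotype effect), in the hope of giving readers an overall picture of computer-assisted Putonghua proficiency testing and of aiding future research.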
