Similar literature
 20 similar documents found
1.
A simulation study was performed to determine whether a group's average percent correct in a content domain could be accurately estimated for groups taking a single test form and not the entire domain of items. Six item response theory (IRT) based domain score estimation methods were evaluated, under conditions of few items per content area per form taken, small domains, and small group sizes. The methods used item responses to a single form taken to estimate examinee or group ability; domain scores were then computed using the ability estimates and domain item characteristics. The IRT-based domain score estimates typically showed greater accuracy and greater consistency across forms taken than observed performance on the form taken. For the smallest group size and fewest items taken, the accuracy of most IRT-based estimates was questionable; however, a procedure that operates on an estimated distribution of group ability showed promise under most conditions.

2.
Science teachers' content knowledge is an important influence on student learning, highlighting an ongoing need for programs, and assessments of those programs, designed to support teacher learning of science. Valid and reliable assessments of teacher science knowledge are needed for direct measurement of this crucial variable. This paper describes multiple sources of validity and reliability (Cronbach's alpha greater than 0.8) evidence for physical, life, and earth/space science assessments, which are part of the Diagnostic Teacher Assessments of Mathematics and Science (DTAMS) project. Validity was strengthened by systematic synthesis of relevant documents, extensive use of external reviewers, and field tests with 900 teachers during the assessment development process. Subsequent results from 4,400 teachers, analyzed with Rasch IRT modeling techniques, offer construct and concurrent validity evidence.
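
The reliability figure quoted in this abstract is Cronbach's alpha. As a rough illustration only, the sketch below computes alpha from a respondents-by-items score matrix using the standard formula; the function name `cronbach_alpha` and the toy data are assumptions made for this example and are not taken from the DTAMS project.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Assumes at least two items, at least two respondents, and non-constant totals.
    """
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data: two items that always agree, so alpha reaches its maximum.
toy_scores = [[1, 1], [0, 0], [1, 1], [0, 0]]
alpha = cronbach_alpha(toy_scores)
```

With perfectly consistent items, as in the toy data, alpha is 1; real assessment data would give a lower value.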

3.
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation methods explored in the current study include augmentation based on classical test theory and multidimensional item response theory (MIRT). The study shows that there is no estimation method that is optimal according to both criteria. Augmented subscores show the most improvement in reliability compared to observed subscores but are the least distinct.

4.
In classical test theory, a test is regarded as a sample of items from a domain defined by generating rules or by content, process, and format specifications. If the items are a random sample of the domain, then the percent-correct score on the test estimates the domain score, that is, the expected percent correct for all items in the domain. When the domain is represented by a large set of calibrated items, as in item banking applications, item response theory (IRT) provides an alternative estimator of the domain score by transformation of the IRT scale score on the test. This estimator has the advantage of not requiring the test items to be a random sample of the domain, and of having a simple standard error. We present here resampling results in real data demonstrating for uni- and multidimensional models that the IRT estimator is also a more accurate predictor of the domain score than is the classical percent-correct score. These results have implications for reporting outcomes of educational qualification testing and assessment.
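
The transformation from an IRT scale score to a domain score can be sketched as averaging the model's probability of a correct response over every calibrated item in the domain. The example below is a minimal illustration that assumes a unidimensional two-parameter logistic (2PL) model; the abstract does not say which model was used, and the function names and item parameters are hypothetical.

```python
import math

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def domain_score(theta, items):
    """Expected proportion correct over all domain items at ability theta.

    items is a list of (discrimination a, difficulty b) pairs for the
    calibrated item bank that represents the domain.
    """
    return sum(p_correct_2pl(theta, a, b) for a, b in items) / len(items)

# Hypothetical three-item bank with difficulties placed symmetrically around 0.
bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
score = domain_score(0.0, bank)
```

Because the score depends only on the ability estimate and the bank's item parameters, the items actually administered do not need to be a random sample of the domain, which is the advantage the abstract describes.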

5.
6.
7.
吴婷 (Wu Ting), 《海外英语》 (Overseas English), 2012, (22): 103-105
Listening testing is a universal social activity, especially in school life, as well as an indispensable part of language assessment. How test takers perform during the tests may affect their entry to many significant roles both in society and in schools. This paper is an attempt to explore how to design a reliable and valid listening test for particular purposes in an EFL context.

8.
The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle; volume; minimum error variance of the linear combination; minimum error variance of the composite score with optimized weight; and Kullback-Leibler information) were studied and compared with two methods for item exposure control (the Sympson-Hetter procedure and the fixed-rate procedure, which simply puts a limit on the item exposure rate) using simulated data. The maximum priority index method was used for the content constraints. Results showed that the Sympson-Hetter procedure yielded better precision than the fixed-rate procedure but had much lower item pool usage and took more time. The five item selection procedures performed similarly under Sympson-Hetter. For the fixed-rate procedure, there was a trade-off between the precision of the ability estimates and the item pool usage: the five procedures had different patterns. It was found that (1) Kullback-Leibler had better precision but lower item pool usage; (2) minimum angle and volume had balanced precision and item pool usage; and (3) the two methods minimizing the error variance had the best item pool usage and comparable overall score recovery but less precision for certain domains. The priority index for content constraints and item exposure was implemented successfully.

9.
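Mobile Learning: A Review of the Current State of Research Abroad   Total citations: 33, self-citations: 8, citations by others: 33
Mobile learning is a new learning mode that has emerged after digital learning, and another new research hotspot in educational technology. How to make full and effective use of wireless technology and mobile computing devices to support teaching and learning has become the central question of mobile learning research [1]. Addressing the several different definitions of mobile learning currently in use, this paper offers some views on how to understand mobile learning correctly. Drawing on a large body of mobile learning research from abroad, it also presents the current state of research and its results.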

10.
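From scoring and equating through score reporting, the stages depend on and affect one another, so the results are highly prone to error. To monitor this process and reduce the number of mistakes as far as possible, a set of quality control procedures is needed. Quality control here means a formal, systematic process used to ensure that expected quality standards are met during scoring, equating, and score reporting. The scoring-equating-reporting process can be divided into 11 steps; in many cases, quality checks can be performed on the final product.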

11.
There is significant potential for error in long production processes that consist of sequential stages, each of which is heavily dependent on the previous stage, such as the SER (Scoring, Equating, and Reporting) process. Quality control procedures are required in order to monitor this process and to reduce the number of mistakes to a minimum. In the context of this module, quality control is a formal systematic process designed to ensure that expected quality standards are achieved during scoring, equating, and reporting of test scores. The module divides the SER process into 11 steps. For each step, possible mistakes that might occur are listed, followed by examples and quality control procedures for avoiding, detecting, or dealing with these mistakes. Most of the listed quality control procedures are also relevant for Internet-delivered and scored testing. Lessons from other industries are also discussed. The motto of this module is: There is a reason for every mistake. If you can identify the mistake, you can identify the reason it happened and prevent it from recurring.

12.
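Educational data mining is an emerging research field that has attracted wide attention. Using bibliometric methods and content analysis, this article statistically analyzes the literature on educational data mining published in China and abroad, traces the field's development and current state of research, discusses key research topics, and looks ahead to future research trends, providing a reference for research and practice in educational data mining.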

13.
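Research on 《红星报》 (Red Star) is currently at an early stage. Its results focus mainly on "the paper's overall character and historical standing, its contribution to publicizing the Long March, and its achievements while Deng Xiaoping was editor-in-chief." Follow-up research should pursue a systematic, coordinated inquiry from both internal and external perspectives, so as to contribute theoretical insight to today's newspapers and publicity work.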

14.
《教育实用测度》2013,26(3):231-244
For any testing program intended for licensure, certification, competency, or proficiency, the estimation of content relevant test scores for pass/fail decision making is necessary. This study compares number-correct scoring to empirical option weighting in the context of such tests. The study was conducted under two test design conditions, three test length conditions, and four passing score levels. Two criteria were used to evaluate the effectiveness of empirical option weighting versus number-correct scoring. Empirical option weighting typically produced slightly more reliable domain score estimates and more consistent pass/fail decisions than number-correct scoring, particularly in the lower half of the test score distribution. For many types of testing programs where the passing scores are established in the lower half of the test score distribution, the empirical option weighting method used in this study seems both appropriate and effective in improving the dependability of test scores and the consistency of pass/fail decisions. Test users, however, must weigh the effort required to use option weighting against the small gains obtained with this method. Other problems are discussed that may limit the usefulness of option weighting.

15.
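"High scores, low ability" refers to the phenomenon of scoring high on tests in a domain while having poor ability to solve problems in that domain. The "high scores, low ability" thesis contains many logical absurdities. Such learning outcomes are related to teachers' lack of a learning-classification perspective, their reliance on a single teaching method, teaching that is not guided by curriculum standards, the absence of an ecological approach to classroom teaching, and disregard for the differing mechanisms by which knowledge and ability are acquired. Overcoming these problems is precisely the way to address "high scores, low ability" learning outcomes.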

16.
Introduces standard error and error propagation, and gives a method for handling significant figures.

17.
A comparison of animism in college males and females was made. The test instrument was the Crowell-Dole Information Scale, a self-report questionnaire of common objects. A total of 59.8 percent of all Ss indicated animistic tendencies. Chi-square analysis of the raw data indicated no significant difference in the incidence of animism for males and females. No significant difference was found between those students having one or more college biology courses and those with no formal training in biology.

18.
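A Brief Analysis of the Validity and Actionability of Administrative Reward Acts   Total citations: 3, self-citations: 0, citations by others: 3
Administrative rewards play an increasingly important role in modern administrative management. In light of legal principles and China's national conditions, administrative rewards should be legally binding. Statutory administrative rewards are actionable, while other administrative rewards are not.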

19.
20.
The central idea of differential item functioning (DIF) is to examine differences between two groups at the item level while controlling for overall proficiency. This approach is useful for examining hypotheses at a finer-grain level than is permitted by a total test score. The methodology proposed in this paper is also aimed at estimating differences at the item rather than the overall score level, with the innovation that the focus is on item-level differences across many groups simultaneously. This is a straightforward generalization of DIF as variance rather than one or several group differences; conceptually, this can be referred to as item difficulty variation (IDV). When instruction is of interest, and the "group" is the unit at which instruction is determined or delivered, then IDV signals value-added effects that can be influenced by either demographic or instructional variables.
