1.

Using SAS to Assess Differential Item Performance

James J. Diamond 《Educational Measurement》1989,8(4):27-28

2.

Item-Writing Strategies for Science Examinations That Assess Scientific Thinking

《中国考试》2016,(10)

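Cultivating students' scientific thinking is an important goal of science education, and approaching and solving problems through scientific thinking is a basic requirement for a scientifically literate person. In paper-and-pencil science assessments, items that assess students' scientific thinking can be written using strategies such as drawing on scientific papers, the history of science, and story lines.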

3.

A Comparison of the Kernel Equating Method with Traditional Equating Methods Using SAT® Data

Jinghua Liu Albert C. Low 《Journal of Educational Measurement》2008,45(4):309-323

This study applied kernel equating (KE) in two scenarios: equating to a very similar population and equating to a very different population, referred to as a distant population, using SAT® data. The KE results were compared to the results obtained from analogous traditional equating methods in both scenarios. The results indicate that KE results are comparable to the results of other methods. Further, the results show that when the two populations taking the two tests are similar on the anchor score distributions, different equating methods yield the same or very similar results, even though they have different assumptions.

4.

The Effect of the Variance of Anchor Test Difficulty Parameters on Test Equating

曹文娟 白俊梅 《考试研究》2013,(3):79-85,33

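This study uses R 2.15.2 to simulate how the variance of anchor test difficulty parameters affects test equating error, comparing different types of anchor test difficulty variance under three equating methods (chained equipercentile, Levine, and Tucker). The results show that when the difficulty variance of the anchor test is smaller than that of the total test, the random and systematic equating errors are essentially the same as when the two variances match (that is, when the anchor test is a parallel, shortened version of the total test, a "minitest"). Requiring the anchor test to have the same statistical specifications as the total test may therefore be too strict.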
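Of the three methods named in this abstract, the Tucker method is the simplest to write down. The following is a minimal sketch of Tucker linear equating as it is commonly presented in equating textbooks, for a design in which group 1 takes Form X, group 2 takes Form Y, and both take the same anchor test. It is not the authors' R simulation code. The function name `tucker_equate`, its parameters, and the score lists are hypothetical, and the default weight (proportional to group size) is one common choice rather than a requirement.

```python
from statistics import fmean, pvariance


def _pcov(xs, ys):
    """Population covariance of two equal-length score lists."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def tucker_equate(x, x_total, x_anchor, y_total, y_anchor, w1=None):
    """Place a Form X score x on the Form Y scale (Tucker linear method).

    x_total/x_anchor: total and anchor scores of group 1 (took Form X).
    y_total/y_anchor: total and anchor scores of group 2 (took Form Y).
    w1: synthetic-population weight for group 1 (assumed default: n1/(n1+n2)).
    """
    n1, n2 = len(x_total), len(y_total)
    if w1 is None:
        w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1

    var_v1, var_v2 = pvariance(x_anchor), pvariance(y_anchor)
    if var_v1 == 0 or var_v2 == 0:
        raise ValueError("anchor scores must vary within each group")

    # Regression slopes of total score on anchor score within each group.
    g1 = _pcov(x_total, x_anchor) / var_v1
    g2 = _pcov(y_total, y_anchor) / var_v2

    d_mu = fmean(x_anchor) - fmean(y_anchor)   # anchor mean difference
    d_var = var_v1 - var_v2                    # anchor variance difference

    # Synthetic-population means and variances for both forms.
    mu_x = fmean(x_total) - w2 * g1 * d_mu
    mu_y = fmean(y_total) + w1 * g2 * d_mu
    var_x = pvariance(x_total) - w2 * g1**2 * d_var + w1 * w2 * g1**2 * d_mu**2
    var_y = pvariance(y_total) + w1 * g2**2 * d_var + w1 * w2 * g2**2 * d_mu**2

    # Linear equating function: match standardized scores across forms.
    return (var_y / var_x) ** 0.5 * (x - mu_x) + mu_y


# Hypothetical data: Form Y is Form X shifted up by 5 points, same anchors.
form_x = [10, 12, 15, 18, 20]
anchor_1 = [3, 4, 5, 6, 7]
form_y = [score + 5 for score in form_x]
anchor_2 = list(anchor_1)
equated = tucker_equate(14, form_x, anchor_1, form_y, anchor_2)
print(round(equated, 6))  # close to 19, since the forms differ by a 5-point shift
```

When the two groups perform identically on the anchor, the anchor terms drop out and the function reduces to ordinary linear equating on means and standard deviations. The anchor matters only when the groups differ.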

5.

Quality Control in the Scoring, Equating, and Score Reporting Process

Avi Allalouf 《湖北招生考试》2008,(16)

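From scoring and equating through score reporting, the stages depend on and affect one another, so the results are highly prone to error. To monitor this process and keep the number of mistakes as low as possible, a set of quality control procedures is needed. Quality control here means a formal, systematic process for ensuring that expected quality standards are met during scoring, equating, and score reporting. The scoring-equating-reporting process can be divided into 11 steps, and in many cases quality checks can be carried out on the final product.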

6.

Comparison of Item Preequating and Random Groups Equating Using IRT and Equipercentile Methods

Michael J. Kolen Deborah J. Harris 《Journal of Educational Measurement》1990,27(1):27-39

An item-preequating design and a random groups design were used to equate forms of the American College Testing (ACT) Assessment Mathematics Test. Equipercentile and 3-parameter logistic model item-response theory (IRT) procedures were used for both designs. Both pretest methods produced inadequate equating results, and the IRT item preequating method resulted in more equating error than had no equating been conducted. Although neither of the item preequating methods performed well, the results from the equipercentile preequating method were more consistent with those from the random groups method than were the results from the IRT item pretest method. Item context and position effects were likely responsible, at least in part, for the inadequate results for item preequating. Such effects need to be either controlled or modeled, and the design further researched before the item preequating design can be recommended for operational use.

7.

Quality Control Procedures in the Scoring, Equating, and Reporting of Test Scores

Avi Allalouf 《Educational Measurement》2007,26(1):36-46

There is significant potential for error in long production processes that consist of sequential stages, each of which is heavily dependent on the previous stage, such as the SER (Scoring, Equating, and Reporting) process. Quality control procedures are required in order to monitor this process and to reduce the number of mistakes to a minimum. In the context of this module, quality control is a formal systematic process designed to ensure that expected quality standards are achieved during scoring, equating, and reporting of test scores. The module divides the SER process into 11 steps. For each step, possible mistakes that might occur are listed, followed by examples and quality control procedures for avoiding, detecting, or dealing with these mistakes. Most of the listed quality control procedures are also relevant for Internet-delivered and scored testing. Lessons from other industries are also discussed. The motto of this module is: There is a reason for every mistake. If you can identify the mistake, you can identify the reason it happened and prevent it from recurring.

8.

The Effect of the Floating Academic Score in the Comprehensive Evaluation of University Students

郑庭海 郭中华 《高教论坛》2014,(10):82-84

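Using a principal component evaluation method, this study statistically analyzes university students' comprehensive evaluation scores. Taking data produced by one college's comprehensive evaluation scheme as the sample, it examines how strongly the floating academic score affects the evaluation results and finds that the effect is significant. The findings provide a basis and reference for revising and improving the scheme.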

9.

The Effect of Item Response Changes on Scores on an Elementary Reading Achievement Test

《The Journal of educational research》2012,105(3):153-156

Abstract

The effect of changing item responses on scores of elementary school children on a standardized achievement test was studied. Previous research, primarily involving non-standardized instruments and adult samples, indicates that changed responses are more likely to be correct than not. Subjects were 165 third grade students using the Metropolitan Reading Tests. Students received no special instructions regarding changing responses. Changes were identified visually and were independently verified. While frequency of response changes was low, such changes generally improved scores. Sex differences in number and success of changes were non-significant. The relationship between frequency of response change and test score was minimal. Responses to difficult items were changed more frequently with less success than changes on easy items. High scorers made more successful changes than did low scorers. Within the limits of the methodology, results clearly indicated that response changes of elementary students on multiple-choice items tend to improve test scores.

10.

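An Example of Applying Item Response Theory to Equate a Test with Multiple Item Types (total citations: 2; self-citations: 2; citations by others: 2)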

HAN Ning 《中国考试》2008,(7)

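Using the statewide high school examination of one U.S. state as an example, this article describes the specific procedures for applying item response theory to equate a test that contains multiple item types, and also discusses some other technical aspects of the examination.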

11.

Small-Sample Equating Using a Single-Group Nearly Equivalent Test (SiGNET) Design

Gautam Puhan Timothy P. Moses Mary C. Grant Frederick McHale 《Journal of Educational Measurement》2009,46(3):344-362

A single-group (SG) equating with nearly equivalent test forms (SiGNET) design was developed by Grant to equate small-volume tests. Under this design, the scored items for the operational form are divided into testlets or mini tests. An additional testlet is created but not scored for the first form. If the scored testlets are testlets 1–6 and the unscored testlet is testlet 7, then the first form is composed of testlets 1–6 and the second form is composed of testlets 2–7. The seven testlets are administered as a single administered form, and when a sufficient number of examinees have taken the administered form, the second form (testlets 2–7) is equated to the first form (testlets 1–6) using an SG equating design. As evident, this design facilitates the use of an SG equating and allows for the accumulation of data, both of which may reduce equating error. This study compared equatings under the SiGNET and common-item equating designs and found lower equating error for the SiGNET design in very small sample size conditions (e.g., N = 10).
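The testlet bookkeeping described in this abstract can be sketched in a few lines. This only illustrates how the two overlapping forms are assembled from one administered set of testlets. It does not perform the equating itself, and the function name `signet_forms` is hypothetical.

```python
def signet_forms(testlet_ids):
    """Split one administered set of testlets into two overlapping forms.

    The first form drops the last (unscored) testlet; the second form
    drops the first testlet. Every examinee takes all testlets, so both
    forms are observed on the same single group.
    """
    if len(testlet_ids) < 2:
        raise ValueError("need at least two testlets")
    first_form = testlet_ids[:-1]
    second_form = testlet_ids[1:]
    shared = [t for t in first_form if t in second_form]
    return first_form, second_form, shared


# Seven testlets, as in the abstract's example.
administered = list(range(1, 8))
first_form, second_form, shared = signet_forms(administered)
print(first_form)   # testlets 1 to 6
print(second_form)  # testlets 2 to 7
print(shared)       # testlets 2 to 6 appear in both forms
```

Because five of the six testlets in each form are shared, the two forms are nearly equivalent by construction, which is what makes a single-group equating plausible with very few examinees.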

12.

Effects of Item Order and Context on Estimation of NAEP Reading Proficiency

Rebecca Zwick 《Educational Measurement》1991,10(3):10-16

What can be learned about the effects of item order and context on invariance of item parameter estimates? Are common-item equating methods appropriate when measuring trends in educational growth?

13.

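A Differential Item Functioning Analysis of the Effect of Second Language Learners' Academic Major on HSK Reading Scores (total citations: 1; self-citations: 0; citations by others: 1)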

黄春霞《考试研究》2011,(5):59-66

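This study examines whether HSK test takers' academic major affects their reading scores. The Mantel-Haenszel (MH) and SIBTEST methods were used to screen the reading items of the 2009 HSK (Elementary-Intermediate) for differential item functioning (DIF), with test takers majoring in the natural sciences as the focal group and those majoring in the humanities and social sciences as the reference group. The MH method found no items exhibiting DIF. SIBTEST flagged one item in the first round (DIF screening) and one bundle of items in the second round (differential bundle functioning, DBF, screening). This bundle favored test takers with a humanities and social sciences background. As for DIF detection methods, the study concludes that SIBTEST is more sensitive and that DBF testing is better suited to one or more sets of interrelated items, as in a reading comprehension test.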
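For readers unfamiliar with the MH method mentioned in this abstract, the sketch below computes the Mantel-Haenszel common odds ratio for one item and converts it to the ETS delta scale (MH D-DIF = -2.35 ln α). Test takers are assumed to be matched on total score, with one 2×2 table per score level. The function names and the counts are hypothetical and are not taken from the HSK study.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score levels.

    strata: iterable of (a, b, c, d) counts per score level, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    return numerator / denominator


def mh_d_dif(alpha_mh):
    """ETS delta-scale index; negative values favor the reference group."""
    return -2.35 * math.log(alpha_mh)


# Hypothetical counts for one item at three matched score levels.
strata = [(30, 20, 25, 25), (40, 10, 35, 15), (45, 5, 42, 8)]
alpha = mantel_haenszel_odds_ratio(strata)
delta = mh_d_dif(alpha)
print(round(alpha, 3), round(delta, 3))
```

An odds ratio of 1 (a delta of 0) means that matched reference and focal test takers have the same odds of answering correctly, which is the no-DIF case. In this made-up example the ratio is above 1, so the item favors the reference group.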

14.

Using Participation to Assess Students' Knowledge

《College Teaching》2013,61(4)

15.

The Development and Evaluation of Procedures to Assess Child Self-report Item Validity

Woolley ME Bowen GL Bowen NK 《Educational and psychological measurement》2006,66(4):687-700

Cognitive pretesting (CP) is an interview methodology for pretesting the validity of items during the development of self-report instruments. The present research evaluates a systematic approach to the analysis of CP data. Materials and procedures were developed to rate self-report item performance with CP interview text data. Five raters were trained in the application of that system. Estimates of inter-rater reliability found acceptable to substantial levels of inter-rater agreement. Results from the present study suggest that excellent inter-rater reliability can be achieved in the evaluation of CP data. Guidelines for systematically rating the qualitative data collected using CP methods are provided. Future research should focus on empirical demonstrations of how such rating procedures can lead to improvements in self-report instruments.

16.

Using Standard-Setting Data to Establish Cutoff Scores

Kurt F. Geisinger 《Educational Measurement》1991,10(2):17-22

Is it enough to run the standard-setting panel and to stop? Once a panel has met or a contrasting group experiment has been conducted, do we have a cutoff score? On what basis might a proposed cutoff score be adjusted? Who should make the adjustment decision?

17.

A Study of Common Bid Evaluation and Award Methods in Construction Project Tendering

刘红敏《中国教育技术装备》2010,(21):68-71

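This article reviews the history of tendering and bidding in China and proposes the principles that should guide the choice of bid evaluation and award methods: scientific soundness, fairness, and legality. It describes in detail the five main categories of methods currently used in construction project tendering: the comprehensive scoring method, the lowest evaluated price method, the average value method, the A+B value method, and the voting method. It examines these five categories in some depth, analyzes their advantages, disadvantages, and scope of application, and proposes targeted improvements.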

18.

Assessing Fit of Item Response Models Using the Information Matrix Test

Jochen Ranger Jörg‐Tobias Kuhn 《Journal of Educational Measurement》2012,49(3):247-268

The information matrix can equivalently be determined via the expectation of the Hessian matrix or the expectation of the outer product of the score vector. The identity of these two matrices, however, is only valid in case of a correctly specified model. Therefore, differences between the two versions of the observed information matrix indicate model misfit. The equality of both matrices can be tested with the so‐called information matrix test as a general test of misspecification. This test can be adapted to item response models in order to evaluate the fit of single items and the fit of the whole scale. The performance of different versions of the test is compared in a simulation study with existing tests of model fit, among them the test of Orlando and Thissen, the score test of local independence due to Glas and Suarez‐Falcon, and the limited information approach of Maydeu‐Olivares and Joe. In general, the different versions of the information matrix test adhere to the nominal Type I error rate and have high power for detecting misspecified item characteristic curves. Additionally, some versions of the test can be used in order to detect violations of the local independence assumption.
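The identity this abstract relies on can be written out explicitly. The notation below is an assumption made for illustration: f(x; θ) is the model density of one response pattern, θ is the parameter vector, and N is the number of respondents. The article's own notation may differ.

```latex
% Information matrix equality (holds only if the model is correctly specified)
\mathcal{I}(\theta)
  = -\,\mathbb{E}\!\left[
      \frac{\partial^{2} \log f(x;\theta)}{\partial\theta\,\partial\theta^{\top}}
    \right]
  = \mathbb{E}\!\left[
      \frac{\partial \log f(x;\theta)}{\partial\theta}\,
      \frac{\partial \log f(x;\theta)}{\partial\theta^{\top}}
    \right]

% Sample contrast on which the test is built: averaged Hessian plus averaged
% outer product of scores, evaluated at the estimate \hat{\theta}
\hat{d}
  = \frac{1}{N}\sum_{i=1}^{N}\left[
      \frac{\partial^{2} \log f(x_i;\hat{\theta})}{\partial\theta\,\partial\theta^{\top}}
      + \frac{\partial \log f(x_i;\hat{\theta})}{\partial\theta}\,
        \frac{\partial \log f(x_i;\hat{\theta})}{\partial\theta^{\top}}
    \right]
```

Under a correctly specified model the elements of the contrast should be close to zero, so values far from zero relative to their sampling variability point to misfit.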

19.

Variability in Reading Scores on a Given Level of Intelligence Test Scores

《The Journal of educational research》2012,105(6):440-446

ABSTRACT

Previous studies have shown that several key variables influence student achievement in geometry, but no research has been conducted to determine how these variables interact. A model of achievement in geometry was tested on a sample of 102 high school students. Structural equation modeling was used to test hypothesized relationships among variables linked to successful problem solving in geometry. These variables, including motivation, achievement emotions, pictorial representation, and categorization skills, were examined for their influence on geometry achievement. Results indicated that the model fit well. Achievement emotions, specifically boredom and enjoyment, had a significant influence on student motivation. Student motivation influenced students’ use of pictorial representations and achievement. Pictorial representation also directly influenced achievement. Categorization skills had a significant influence on pictorial representations and student achievement. The implications of these findings for geometry instruction and for future research are discussed.

20.

Regression to the Mean in Average Test Scores

《Educational Assessment》2013,18(4):377-399

A group's average test score is often used to evaluate different educational approaches, curricula, teachers, and schools. Studies of group test scores over time often try to measure "value-added" by holding constant certain student characteristics such as race, parents' education, or socioeconomic status; however, the important statistical phenomenon of regression to the mean is often ignored. There is a substantial literature on the importance of regression to the mean in a variety of contexts, including individual test scores. Here, we look at regression to the mean in group averages. If this regression is not taken into account, changes in a group's average test score over time may be misinterpreted as changes in the group's average ability rather than natural and expected fluctuations in scores about ability.  California Academic Performance Index scores are used to illustrate this argument.
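The argument in this abstract can be illustrated with a small calculation. The sketch assumes a simple model that is not taken from the article: each group's observed average is its stable ability plus independent noise, and `reliability` is the share of variance in observed averages that is due to ability. The function name and the numbers are hypothetical and are not Academic Performance Index data.

```python
def expected_followup_mean(observed_mean, overall_mean, reliability):
    """Expected next-period group average when ability has not changed.

    Under the assumed model, the best linear prediction pulls the observed
    average toward the overall mean by a factor of (1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return overall_mean + reliability * (observed_mean - overall_mean)


# Two hypothetical schools, 100 points above and below an overall mean of 700.
high_school = expected_followup_mean(800.0, 700.0, 0.8)
low_school = expected_followup_mean(600.0, 700.0, 0.8)
print(high_school, low_school)  # the high scorer is expected to fall, the low scorer to rise
```

With a reliability of 0.8, a school 100 points above the mean is expected to be only about 80 points above it next time, with no change in ability. A "decline" of that size would be easy to misread as a real loss.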