Similar Articles
20 similar articles found.
1.
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change, test level, test content, and item format. As a follow-up to the real data analysis, a simulation study was performed to assess the effect of item position change on equating. Results from this study indicate that item position change significantly affects change in RID. In addition, although the test construction procedures used in the investigated state seem to somewhat mitigate the impact of item position change, equating results might be impacted in testing programs where other test construction practices or equating methods are utilized.
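For reference, the Rasch model underlying RID expresses the probability of a correct response in terms of examinee ability and item difficulty; a shift in an item's estimated difficulty after a position change is exactly the invariance violation the study models. In standard notation (not specific to this article):

\[ P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)} \]

where \(\theta_j\) is the ability of examinee \(j\) and \(b_i\) is the difficulty of item \(i\).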

2.
A new method for analyzing differential item functioning is proposed to investigate the relative strengths and weaknesses of multiple groups of examinees. Accordingly, the notion of a conditional measure of difference between two groups (Reference and Focal) is generalized to a conditional variance. The objective of this article is to present and illustrate a strategy for aggregating results across sets of similar items that exhibit item difficulty variation. Logically, this aggregation strategy is related to the idea of DIF amplification, but estimation is ultimately carried out in the framework of a confirmatory multidimensional Rasch model. Grade 4 data from the 2000 National Assessment of Educational Progress are used to illustrate the technique.

3.
This article demonstrates the utility of restricted item response models for examining item difficulty ordering and slope uniformity for an item set that reflects varying cognitive processes. Twelve sets of paired algebra word problems were developed to systematically reflect various types of cognitive processes required for successful performance, resulting in a total of 24 items reflecting distance–rate–time (DRT), interest, and area problems. Hypotheses concerning difficulty ordering and slope uniformity for the items were tested by constraining item difficulty and discrimination parameters in hierarchical item response models. The first set of model comparisons tested the equality of the discrimination and difficulty parameters for each set of paired items. The second set of model comparisons examined slope uniformity within the complex DRT problems. The third set of model comparisons examined whether the familiarity of the story context affected item difficulty for two types of complex DRT problems. The last set of model comparisons tested the hypothesized difficulty ordering of the items.

4.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) of Black versus matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 GRE-disclosed forms (988 verbal items) and 11 SAT-disclosed forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both the GRE and the SAT. The most important finding indicates that for hard items, Black examinees perform differentially better than matched-ability White examinees for each of the four item types and for both the GRE and SAT tests. The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

5.
The arrangement of response options in multiple-choice (MC) items, especially the location of the most attractive distractor, is considered critical in constructing high-quality MC items. In the current study, a sample of 496 undergraduate students taking an educational assessment course was given three test forms consisting of the same items but the positions of the most attractive distractor varied across the forms. Using a multiple-indicators–multiple-causes (MIMIC) approach, the effects of the most attractive distractor's positions on item difficulty were investigated. The results indicated that the relative placement of the most attractive distractor and the distance between the most attractive distractor and the keyed option affected students' response behaviors. Moreover, low-achieving students were more susceptible to response-position changes than high-achieving students.
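In outline, a MIMIC model for this design lets an observed covariate (here, the test form coding the most attractive distractor's position) affect both the latent ability and individual item responses; a nonzero direct effect on an item indicates that the position manipulation changed that item's difficulty. A generic two-equation statement (standard MIMIC notation, not the authors' exact specification):

\[ y_i^{*} = \lambda_i \eta + \beta_i x + \varepsilon_i, \qquad \eta = \gamma x + \zeta \]

where \(x\) codes the form, \(\lambda_i\) is the loading of item \(i\), and \(\beta_i \neq 0\) flags a direct position effect on item \(i\).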

6.
The Standards for Educational and Psychological Testing indicate that multiple sources of validity evidence should be used to support the interpretation of test scores. In the past decade, examinee response processes, as a source of validity evidence, have received increased attention. However, there have been relatively few methodological studies of the accuracy and consistency of examinee response processes as measured by verbal reports in the context of educational measurement. The objective of the current study was to investigate the accuracy and consistency of examinee response processes, as measured by verbal reports, as a function of varying interviewer and item variables in a think-aloud interview within an educational measurement context. Results indicate that the accuracy of responses may be undermined when students perceive the interviewer to be an expert in the domain. Further, the consistency of response processes may be undermined when items that are too easy or too difficult are used to elicit reports. The implications of these results for conducting think-aloud studies are explored.

7.
Statistics used to detect differential item functioning can also reflect differential strengths and weaknesses in the performance characteristics of population subgroups. In turn, item features associated with the differential performance patterns are likely to reflect some facet of the item task, and hence its difficulty, that might previously have been overlooked. In this study, several item features were identified and coded for a large number of reading comprehension items from two admissions testing programs. Item features included subject matter content, various properties of item structure, cognitive demand indicators, and semantic content (propositional analysis). Differential item functioning was evaluated for males and females and for White and Black examinees. Results showed a number of significant relationships between item features and indicators of differential item functioning, many of which were consistent across testing programs. Implications of the results for related areas of research are discussed.

8.
Sentence-punctuation (judou) items for classical Chinese play an important role in secondary school Chinese language examinations, and performance on them reflects students' classical Chinese reading ability. In this study, three classical Chinese sentence-punctuation items were administered to 5,629 students in grades 3 through 8, and item difficulty was estimated with item response theory using an anchor-item design. The study found, first, that difficulty varied considerably across sentence-punctuation items, with difficulty levels shaped by factors such as part of speech and differences between ancient and modern word meanings. Second, a comparison of the two scoring standards currently in use showed that, for high-ability students, right/wrong scoring yields better measurement precision than correct-count scoring, but within a single text the ability estimates from correct-count scoring are more closely related to classical Chinese sentence-punctuation ability.

9.
The effects of training tests on subsequent achievement were studied using two test-item characteristics: item difficulty and item complexity. Ninety Ss were randomly assigned to treatment conditions having easy or difficult items and calling for rote or complex skills. Each S was administered two training tests during the quarter containing only items defined by his treatment condition. The dependent measure was a sixty-item final examination with fifteen items reflecting each of the four treatment-condition item types. The results showed greater achievement for those trained with difficult items and with rote items. In addition, two interactions of treatment conditions with type of test item were found. The results are discussed as supporting a hierarchical model rather than a "similarity" transfer model of learning.

10.
Item difficulty values computed with the Rasch model are independent of the examinee sample, which makes them one of an item's most important quantitative indices. Rasch item difficulty can be computed quite conveniently in an EXCEL spreadsheet; this paper presents the calculation steps in detail and discusses how to use the resulting item difficulty values to estimate examinees' ability levels.
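The spreadsheet calculation the paper describes can be sketched in a few lines of code. A common closed-form approach is the log-odds (PROX-style) estimate, which converts each item's proportion correct into a logit and centers the logits; the sketch below illustrates that approach and is not the paper's exact worksheet procedure.

```python
import numpy as np

def rasch_item_difficulty(responses):
    """Log-odds (PROX-style) Rasch difficulty estimates from a 0/1 matrix.

    responses: 2-D array-like, rows = examinees, columns = items.
    Returns item difficulties in logits, centered at zero.
    """
    responses = np.asarray(responses, dtype=float)
    p = responses.mean(axis=0)            # proportion correct per item
    p = np.clip(p, 1e-3, 1 - 1e-3)        # guard against 0% / 100% items
    logits = np.log((1 - p) / p)          # harder items get larger logits
    return logits - logits.mean()         # center: mean difficulty = 0

# Example: 5 examinees answering 3 items
data = [[1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 1, 0],
        [1, 0, 0]]
print(rasch_item_difficulty(data))        # the third item comes out hardest
```

Examinee ability can be approximated the same way from each person's raw score r out of m items, log(r / (m - r)), which parallels the paper's discussion of estimating ability levels from the difficulty values.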

11.
《教育实用测度》 [Applied Measurement in Education], 2013, 26(1): 89-97
Research on the use of multiple-choice tests has presented conflicting evidence about the use of statistical item difficulty as a means of ordering items. An alternate method advocated by many texts is the use of cognitive difficulty. This study examined the effect of using both statistical and cognitive item difficulty in determining item order. Results indicated that those students who received items in an increasing cognitive order, no matter what the order of statistical difficulty, scored higher on hard items. Those students who received the forms with opposing cognitive and statistical difficulty orders scored the highest on medium-level items. The study concludes with a call for more research on the effects of cognitive difficulty and suggests that future studies examine subscores as well as total test results.

12.
Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional behavior disorders) and students without disabilities. Multinomial logistic regression was employed to compare response characteristic curves (RCCs) of individual test items. Although no evidence for serious test bias was found for the state assessment examined in this study, the results indicated that students in different disability categories showed different patterns of DIF, DDF, and DOF, and that the use of RCCs helps clarify the implications of DIF and DDF.
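As a sketch of the modeling step, analyses of this kind typically fit a multinomial logistic regression of the selected response category (key, distractor, or omission) on total score, group membership, and their interaction; significant group terms signal DIF, DDF, or DOF. The illustration below uses simulated data and statsmodels, and is not the study's actual code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "score": rng.normal(0, 1, n),        # matching variable (total score)
    "group": rng.integers(0, 2, n),      # 0 = comparison, 1 = focal group
})
df["score_x_group"] = df["score"] * df["group"]
# Response category: 0 = keyed answer, 1 = distractor, 2 = omission
df["choice"] = rng.integers(0, 3, n)

X = sm.add_constant(df[["score", "group", "score_x_group"]])
fit = sm.MNLogit(df["choice"], X).fit(disp=False)
# group -> uniform DIF/DDF/DOF; score_x_group -> non-uniform effects
print(fit.summary())
```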

13.
Methods are presented for comparing grades obtained in a situation where students can choose between different subjects. The comparison of grades can be expected to be complicated by the interaction between students' pattern and level of proficiency on the one hand and their choice of subjects on the other. Three methods based on item response theory (IRT) for estimating proficiency measures that are comparable over students and subjects are discussed: a method based on a model with a unidimensional representation of proficiency, a method based on a model with a multidimensional representation of proficiency, and a method based on a multidimensional representation of proficiency in which the stochastic nature of the choice of examination subjects is explicitly modeled. The methods are compared using data from the Central Examinations in Secondary Education in the Netherlands. The results show that the unidimensional IRT model produces unrealistic results, which do not appear when using the two multidimensional IRT models. Further, it is shown that both multidimensional models produce acceptable model fit; however, the model that explicitly takes the choice process into account produces the best model fit.
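The unidimensional versus multidimensional contrast here amounts to replacing a scalar ability with an ability vector. In a standard compensatory multidimensional IRT formulation (generic notation, not necessarily the authors' exact parameterization):

\[ P(X_{ij} = 1 \mid \boldsymbol{\theta}_j) = \frac{\exp(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j - b_i)}{1 + \exp(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j - b_i)} \]

where \(\boldsymbol{\theta}_j\) is examinee \(j\)'s proficiency vector and \(\mathbf{a}_i\) contains item \(i\)'s discriminations on each dimension.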

14.
On Controlling Item Difficulty in the Higher Education Self-Taught Examination
Controlling item difficulty is one of the main tasks and core problems in developing the higher education self-taught examination. This paper discusses the significance, requirements, and basic measures of item difficulty control in the self-taught examination, as well as the basic methods of difficulty adjustment. In particular, it points out that the score proportions prescribed for the difficulty levels in the self-taught examination syllabi are not entirely reasonable and proposes corresponding adjustments. It also analyzes how different quality-control measures in test development affect difficulty control and, finally, uses concrete examples to illustrate the basic techniques of difficulty adjustment.

15.
This study tabulated the passage lengths of the English reading comprehension items on the Shandong Province college entrance examination (gaokao) from 2008 to 2014 and correlated passage length with item difficulty to explore their relationship. No significant correlation was found between passage length and item difficulty. The study concludes that passage length has no significant effect on the difficulty of gaokao English reading comprehension items, and that item writers can instead control difficulty through the design of the items themselves. This finding offers a useful reference for passage selection and difficulty control in gaokao English reading comprehension.

16.
Exploring Methods for Predicting Item Difficulty in Gaokao Test Development
Three prediction methods (subjective estimates by the item-writing teachers, multiple linear regression analysis, and BP neural network modeling) were used to predict item difficulty during gaokao test development, and the predictive performance of the three methods was compared. All three methods achieved fairly high prediction accuracy; among them, the BP neural network model predicted item difficulty with relatively higher accuracy and smaller error.
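A back-propagation network for this task can be sketched with scikit-learn's MLPRegressor next to the multiple linear regression baseline the abstract mentions. The item features and data here are hypothetical placeholders for the predictors the study actually used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
# Hypothetical item features, e.g. text length, cognitive level, solution steps
X = rng.normal(size=(200, 3))
# Simulated difficulty (proportion correct) driven by the first two features
y = 0.5 + 0.10 * X[:, 0] - 0.15 * X[:, 1] + rng.normal(0, 0.05, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "BP neural network": MLPRegressor(hidden_layer_sizes=(8,),
                                      max_iter=5000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, mean_absolute_error(y_te, model.predict(X_te)))
```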

17.
Based on a previously validated cognitive processing model of reading comprehension, this study experimentally examines potential generative components of text-based multiple-choice reading comprehension test questions. Previous research (Embretson & Wetzel, 1987; Gorin & Embretson, 2005; Sheehan & Ginther, 2001) shows text encoding and decision processes account for significant proportions of variance in item difficulties. In the current study, Linear Logistic Latent Trait Model (LLTM; Fischer, 1973) parameter estimates of experimentally manipulated items are examined to further verify the impact of encoding and decision processes on item difficulty. Results show that manipulating some passage features, such as increased use of negative wording, significantly increased item difficulty in some cases, whereas other manipulations, such as altering the order of information presentation in a passage, did not significantly affect item difficulty but did affect reaction time. These results suggest that reliably changing difficulty and response time through algorithmic manipulation of certain task features is feasible. However, non-significant results for several manipulations highlight potential challenges to item generation in establishing direct links between theoretically relevant item features and individual item processing. Further examination of these relationships will be informative to item writers as well as test developers interested in the feasibility of item generation as an assessment tool.
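The LLTM named in this abstract decomposes Rasch item difficulty into a weighted sum of item design features, which is what allows the experimental manipulations (for example, negative wording) to be tied to predicted difficulty shifts. In standard notation:

\[ P(X_{ij} = 1 \mid \theta_j) = \frac{\exp\left(\theta_j - \sum_k q_{ik}\,\eta_k\right)}{1 + \exp\left(\theta_j - \sum_k q_{ik}\,\eta_k\right)} \]

where \(q_{ik}\) indicates whether design feature \(k\) is present in item \(i\) and \(\eta_k\) is that feature's contribution to difficulty.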

18.
The central idea of differential item functioning (DIF) is to examine differences between two groups at the item level while controlling for overall proficiency. This approach is useful for examining hypotheses at a finer-grain level than a total test score permits. The methodology proposed in this paper likewise estimates differences at the item rather than the overall score level, with the innovation that item-level differences are examined for many groups simultaneously. This is a straightforward generalization of DIF to a variance rather than one or several group differences; conceptually, this can be referred to as item difficulty variation (IDV). When instruction is of interest, and "groups" is a unit at which instruction is determined or delivered, IDV signals value-added effects that can be influenced by either demographic or instructional variables.
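One way to formalize IDV is as a random group effect on each item's difficulty (an illustrative formulation, not necessarily the paper's exact model):

\[ b_{ig} = b_i + u_{ig}, \qquad u_{ig} \sim N(0, \tau_i^2) \]

where \(b_{ig}\) is item \(i\)'s difficulty in group \(g\); a nonzero variance \(\tau_i^2\) indicates that the item functions differently across groups, generalizing pairwise DIF to many groups at once.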

19.
《教育实用测度》 [Applied Measurement in Education], 2013, 26(4): 303-320
Calculator effects were examined using methods taken from research on differential item functioning. Use of a calculator was controlled on two experimental forms of a test assembled from operational items used on a standardized university mathematics placement test. Results indicated that calculator effects were not present in analyses of total test scores and appeared in only two of the three subscores composed of homogeneous item types. Analyses of item-level functioning indicated, however, that a number of items, including several not included in the two significant subscore combinations, also contained calculator effects. For those items identified, use of the calculator appeared to have changed the actual objective being tested. The findings were generally consistent with previous research: items that were easier when a calculator was used required either simple computations or use of a function key on the calculator; items that were more difficult required knowledge of a procedure, either with or without additional computation. Analysis at the item level facilitated a clearer understanding of the impact of calculator use on measurement of the underlying objective.

20.
The Test of Practical Chinese (C.TEST) measures how well foreigners whose native language is not Chinese can actually use Chinese in social life and everyday work in international settings. Because C.TEST items are publicly released and the item bank is small, it is difficult to obtain item difficulty parameters through the field-test approach used by typical standardized tests, in which items are pretested on a sample of the target population. Artificial neural network techniques, a product of modern artificial intelligence research, have nonetheless proven highly effective for prediction. This study took the reading comprehension items of C.TEST (levels A through D) as its material and used an artificial neural network to predict their difficulty; the difficulty values predicted by the network correlated significantly with the difficulty values observed in actual administrations. This result indicates that using artificial neural network models to predict parameters such as item difficulty for language tests is feasible.
