Similar Articles
20 similar articles found.
1.
Drawing on language testing theory, this study analyzes the design of distractors that affect the validity of grammar/vocabulary multiple-choice items, based on trials of selected multiple-choice items from the 《同步辅导/同步训练》 companion to 《大学英语自学教程》 (Volume 1).

2.
The arrangement of response options in multiple-choice (MC) items, especially the location of the most attractive distractor, is considered critical in constructing high-quality MC items. In the current study, a sample of 496 undergraduate students taking an educational assessment course was given three test forms consisting of the same items, but with the position of the most attractive distractor varying across forms. Using a multiple-indicators, multiple-causes (MIMIC) approach, the effects of the most attractive distractor's position on item difficulty were investigated. The results indicated that the relative placement of the most attractive distractor and the distance between the most attractive distractor and the keyed option affected students' response behaviors. Moreover, low-achieving students were more susceptible to response-position changes than high-achieving students.
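As a rough sketch (our notation, not necessarily the authors'), a MIMIC formulation for this design regresses each item's latent response propensity on proficiency and on a covariate coding the test form, so that a direct effect of the form on an item captures a position-induced difficulty shift:

```latex
% MIMIC sketch (illustrative notation): y_i^* is the latent response propensity
% for item i, \eta is proficiency, and x codes the distractor-position form.
\[
\begin{aligned}
y_i^* &= \lambda_i \eta + \beta_i x + \varepsilon_i \\
\eta  &= \gamma x + \zeta
\end{aligned}
\]
% A nonzero \gamma reflects a proficiency difference across forms (impact);
% a nonzero direct effect \beta_i indicates that item i's difficulty shifts
% with the position of the most attractive distractor.
```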

3.
We show that using the point-biserial as a discrimination index for distractors by differentiating between examinees who chose the distractor and examinees who did not choose the distractor is theoretically wrong and may lead to an incorrect rejection of items. We propose an alternative usage and present empirical evidence for its suitability.
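For reference, a minimal sketch of the conventional distractor point-biserial that this abstract argues against; the data and the 0/1 indicator below are hypothetical, and the authors' proposed alternative is not detailed in the abstract:

```python
# Sketch of the conventional distractor point-biserial: correlate a 0/1
# "chose this distractor" indicator with total score. The paper argues this
# chose-vs-did-not-choose contrast is theoretically flawed.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
n_examinees, n_items = 500, 30
total_score = rng.binomial(n_items, 0.6, size=n_examinees)   # hypothetical totals
p_choose = 0.35 - 0.006 * total_score                        # weaker examinees pick it more
chose_distractor = rng.binomial(1, p_choose)                 # 1 = picked this distractor

r, p = pointbiserialr(chose_distractor, total_score)
print(f"distractor point-biserial r = {r:.3f} (p = {p:.3g})")
# A strongly negative r is conventionally read as a "good" distractor, which is
# exactly the usage this article questions.
```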

4.
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors. Surprisingly, we found that items in which only the one best distractor was presented together with the solution provided the strongest criterion-related evidence of the validity of test scores and thus allowed for the most valid conclusions on the general knowledge level of test takers. Items that included the best distractor produced more reliable test scores irrespective of option number. Increasing the number of options increased item difficulty, but did not increase internal consistency when testing time was controlled for.

5.
This article discusses and demonstrates combining scores from multiple-choice (MC) and constructed-response (CR) items to create a common scale using item response theory methodology. Two specific issues addressed are (a) whether MC and CR items can be calibrated together and (b) whether simultaneous calibration of the two item types leads to loss of information. Procedures are discussed and empirical results are provided using a set of tests in the areas of reading, language, mathematics, and science in three grades.
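As an illustrative sketch only (the specific models are our assumption; the abstract does not name them), simultaneous calibration places MC and CR items in one likelihood over a common ability, for example a 3PL for MC responses and a generalized partial credit model for CR responses:

```latex
% Joint calibration sketch: one ability \theta_j links both item types.
\[
P(U_{ij}=1 \mid \theta_j) = c_i + \frac{1-c_i}{1+\exp[-a_i(\theta_j-b_i)]}
\quad \text{(MC, 3PL)}
\]
\[
P(V_{ij}=k \mid \theta_j) =
\frac{\exp \sum_{m=1}^{k} a_i(\theta_j - b_{im})}
     {\sum_{l=0}^{K_i} \exp \sum_{m=1}^{l} a_i(\theta_j - b_{im})}
\quad \text{(CR, GPCM)}
\]
% Calibrating both item types in one likelihood, L = \prod_j \prod_i P(\cdot\mid\theta_j),
% puts MC and CR parameters on a common scale.
```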

6.
In multiple-choice items, differential item functioning (DIF) in the correct response may or may not be caused by differentially functioning distractors. Identifying distractors as causes of DIF can provide valuable information for potential item revision or the design of new test items. In this paper, we examine a two-step approach based on application of a nested logit model for this purpose. The approach separates testing of differential distractor functioning (DDF) from DIF, thus allowing for clearer evaluations of where distractors may be responsible for DIF. The approach is contrasted against competing methods and evaluated in simulation and real data analyses.
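A rough sketch of a nested logit MC item model of the kind referenced here (our notation; details may differ from the paper): correctness follows a 2PL, and distractor choice, conditional on an incorrect response, follows a multinomial logit:

```latex
% Nested logit sketch for item i with distractors k = 1, ..., K.
\[
P(y_i = 1 \mid \theta) = \frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}
\]
\[
P(d_{ik} \mid y_i = 0, \theta) =
\frac{\exp(\zeta_{ik} + \lambda_{ik}\theta)}{\sum_{h=1}^{K} \exp(\zeta_{ih} + \lambda_{ih}\theta)}
\]
% Group differences in the distractor parameters (\zeta_{ik}, \lambda_{ik}) signal
% DDF; group differences in (a_i, b_i) signal DIF. Modeling the two levels
% separately is what lets the two-step approach disentangle DDF from DIF.
```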

7.
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias, root mean-square error (RMSE), and the percentage of time the 95% confidence interval covered the true parameter. The simulation results suggest that item parameters were not recovered well when IPD was ignored, especially when a larger number of items exhibited IPD. In addition, coverage was inaccurate across all IPD conditions when IPD was ignored. The results also suggest that the accuracy of person scores (measured by bias) is potentially problematic when a larger number of IPD items is ignored. However, the overall accuracy (measured by RMSE) and coverage were unexpectedly acceptable in the presence of IPD as defined in this study.
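The recovery criteria named here (bias, RMSE, and 95% confidence-interval coverage) are computed per parameter across replications; a minimal sketch with simulated placeholder estimates:

```python
# Sketch of Monte Carlo recovery metrics: bias, RMSE, and 95% CI coverage of an
# item parameter across replications. All values below are simulated placeholders.
import numpy as np

rng = np.random.default_rng(1)
true_b = 0.5                                  # hypothetical true difficulty
est_b = true_b + rng.normal(0, 0.15, 1000)    # estimates across 1000 replications
se_b = np.full(1000, 0.15)                    # assumed standard errors

bias = np.mean(est_b - true_b)
rmse = np.sqrt(np.mean((est_b - true_b) ** 2))
coverage = np.mean(np.abs(est_b - true_b) <= 1.96 * se_b)
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}, 95% CI coverage = {coverage:.3f}")
# Ignored IPD typically shows up as nonzero bias, inflated RMSE, and coverage
# falling below the nominal .95.
```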

8.
What is a complex multiple-choice test item? What is the evidence that such items should be avoided?

9.
《教育实用测度》(Applied Measurement in Education), 2013, 26(1): 89-97
Research on the use of multiple-choice tests has presented conflicting evidence about the use of statistical item difficulty as a means of ordering items. An alternate method advocated by many texts is the use of cognitive difficulty. This study examined the effect of using both statistical and cognitive item difficulty in determining item order. Results indicated that those students who received items in an increasing cognitive order, no matter what the order of statistical difficulty, scored higher on hard items. Those students who received the forms with opposing cognitive and statistical difficulty orders scored the highest on medium-level items. The study concludes with a call for more research on the effects of cognitive difficulty and suggests that future studies examine subscores as well as total test results.

10.
We consider the relationship between the multiple-choice and free-response sections on the Computer Science and Chemistry tests of the College Board's Advanced Placement program. Restricted factor analysis shows that the free-response sections measure, for the most part, the same underlying proficiency as the multiple-choice sections. However, there is also a significant, if relatively small, amount of local dependence among the free-response items that produces a small degree of multidimensionality for each test.

11.
Multiple-choice items are a mainstay of achievement testing. Adequately covering the content domain, and thereby certifying achievement proficiency with meaningful, precise scores, requires many high-quality items. More 3-option items than 4- or 5-option items can be administered in the same testing time, improving content coverage without detrimental effects on the psychometric quality of test scores. Researchers have endorsed 3-option items for over 80 years with empirical evidence; those results are synthesized here in an effort to unify this endorsement and encourage its adoption.

12.
How has Item Response Theory helped solve problems in the development and use of computer-adaptive tests? Do we need to balance item content with computer-adaptive tests? Could we use IRT to evaluate unusual responses to computer-delivered tests?

13.
Latent class models of the decision-making processes related to multiple-choice test items are extremely important and useful in mental test theory. However, building realistic models or studying the robustness of existing models is very difficult. One problem is that there are a limited number of empirical studies that address this issue. The purpose of this paper is to describe and illustrate how latent class models, in conjunction with the answer-until-correct format, can be used to examine the strategies used by examinees for a specific type of task. In particular, suppose an examinee responds to a multiple-choice test item designed to measure spatial ability, and the examinee gets the item wrong. This paper empirically investigates various latent class models of the strategies that might be used to arrive at an incorrect response. The simplest model is a random guessing model, but the results reported here strongly suggest that this model is unsatisfactory. Models for the second attempt at an item, under an answer-until-correct scoring procedure, are proposed and found to give a good fit to data in most situations. Some results on strategies used to arrive at the first choice are also discussed.
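The random-guessing baseline is easy to state: after a wrong first attempt on a k-option item, blind guessing succeeds on the second attempt with probability 1/(k-1). A minimal check against hypothetical counts (k and all counts are assumptions, not the article's data):

```python
# Sketch of a test of the random-guessing model under answer-until-correct scoring:
# compare observed second-attempt success against the guessing rate 1/(k-1).
from scipy.stats import binomtest

k = 4                    # options per item (assumption)
n_second = 400           # examinees making a second attempt (hypothetical)
n_correct2 = 190         # of those, correct on the second attempt (hypothetical)

p_guess = 1.0 / (k - 1)  # = 1/3 under random guessing among remaining options
result = binomtest(n_correct2, n_second, p_guess)
print(f"observed = {n_correct2 / n_second:.3f}, guessing predicts {p_guess:.3f}, "
      f"p = {result.pvalue:.2g}")
# A clear excess over 1/(k-1) contradicts pure guessing, in line with the
# abstract's finding that the simplest model is unsatisfactory.
```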

14.
This paper examines the foundations of multiple-choice items from three aspects: psycholinguistics and pedagogy, structural linguistics, and audiolingualism. This examination is expected to yield insight into the nature and characteristics of such items, paving the way for a sound evaluation of the multiple-choice technique in testing.

15.
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable to cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are multidimensional and binary. This study proposes a very general DIF assessment method in the CDM framework that is applicable to various CDMs, more than two groups of examinees, and multiple grouping variables that are categorical, continuous, observed, or latent. The parameters can be estimated with Markov chain Monte Carlo algorithms implemented in the freeware WinBUGS. Simulation results demonstrated good parameter recovery and advantages in DIF assessment for the new method over the Wald method.
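As one concrete CDM example (our choice; the article addresses CDMs generally), the DINA model illustrates why the latent attributes are multidimensional and binary:

```latex
% DINA sketch: \alpha_j is examinee j's binary attribute vector, q_{ik} the
% Q-matrix entry for item i and attribute k, s_i and g_i the slip and guessing
% parameters.
\[
\eta_{ij} = \prod_{k} \alpha_{jk}^{\,q_{ik}}, \qquad
P(X_{ij}=1 \mid \boldsymbol{\alpha}_j) = (1-s_i)^{\eta_{ij}}\, g_i^{\,1-\eta_{ij}}
\]
% In this framing, DIF assessment asks whether item parameters such as
% (s_i, g_i) differ across groups after conditioning on the attribute
% profile \boldsymbol{\alpha}_j.
```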

16.
17.
《教育实用测度》(Applied Measurement in Education), 2013, 26(2): 125-141
Item parameter instability can threaten the validity of inferences about changes in student achievement when using Item Response Theory- (IRT) based test scores obtained on different occasions. This article illustrates a model-testing approach for evaluating the stability of IRT item parameter estimates in a pretest-posttest design. Stability of item parameter estimates was assessed for a random sample of pretest and posttest responses to a 19-item math test. Using MULTILOG (Thissen, 1986), IRT models were estimated in which item parameter estimates were constrained to be equal across samples (reflecting stability) and item parameter estimates were free to vary across samples (reflecting instability). These competing models were then compared statistically in order to test the invariance assumption. The results indicated a moderately high degree of stability in the item parameter estimates for a group of children assessed on two different occasions.
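The model comparison described here is a likelihood-ratio test; the sketch below shows the computation with hypothetical log-likelihoods and parameter counts (MULTILOG's actual output is not reproduced):

```python
# Sketch of the constrained-vs-free model comparison: a likelihood-ratio test.
# The log-likelihoods and df below are hypothetical placeholders.
from scipy.stats import chi2

loglik_constrained = -4321.7  # parameters fixed equal across occasions (hypothetical)
loglik_free = -4310.2         # parameters free to vary (hypothetical)
df = 38                       # difference in number of estimated parameters (hypothetical)

g2 = -2.0 * (loglik_constrained - loglik_free)
p = chi2.sf(g2, df)
print(f"G^2 = {g2:.1f} on {df} df, p = {p:.3f}")
# A nonsignificant G^2 supports invariance, i.e., stability of the item
# parameter estimates across the two occasions.
```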

18.
This article demonstrates the utility of restricted item response models for examining item difficulty ordering and slope uniformity for an item set that reflects varying cognitive processes. Twelve sets of paired algebra word problems were developed to systematically reflect various types of cognitive processes required for successful performance. This resulted in a total of 24 items. They reflected distance-rate-time (DRT), interest, and area problems. Hypotheses concerning difficulty ordering and slope uniformity for the items were tested by constraining item difficulty and discrimination parameters in hierarchical item response models. The first set of model comparisons tested the equality of the discrimination and difficulty parameters for each set of paired items. The second set of model comparisons examined slope uniformity within the complex DRT problems. The third set of model comparisons examined whether the familiarity of the story context affected item difficulty for two types of complex DRT problems. The last set of model comparisons tested the hypothesized difficulty ordering of the items.

19.
Bayesian methods incorporate model parameter information prior to data collection. Eliciting information from content experts is an option, but has seen little implementation in Bayesian item response theory (IRT) modeling. This study aims to use ethical reasoning content experts to elicit prior information and incorporate this information into Markov chain Monte Carlo (MCMC) estimation. A six-step elicitation approach is followed, with relevant details at each stage, for two IRT item parameters: difficulty and guessing. Results indicate that using content experts is the preferred approach, rather than noninformative priors, for both parameter types. The use of a noninformative prior for small samples produced dramatically different results when compared to results from content expert-elicited priors. The WAMBS (When to worry and how to Avoid the Misuse of Bayesian Statistics) checklist is used to aid in comparisons.
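One plausible way to turn elicited judgments into priors is moment matching; the sketch below is our illustration, not the article's six-step protocol, and all elicited values are hypothetical:

```python
# Sketch of converting elicited judgments into priors for the two parameter
# types named above: a Beta prior for guessing, a Normal prior for difficulty.
import numpy as np

def beta_from_mean_sd(mean, sd):
    """Moment-matched Beta(a, b) from an elicited mean and standard deviation."""
    common = mean * (1 - mean) / sd**2 - 1
    return mean * common, (1 - mean) * common

# Hypothetical elicited values for a 4-option item's guessing parameter:
a, b = beta_from_mean_sd(mean=0.25, sd=0.08)
print(f"guessing prior: Beta({a:.2f}, {b:.2f})")

# Hypothetical elicited difficulty: best guess 0.3, 95% range about +/- 1,
# giving sd = 1 / 1.96 for a Normal prior.
mu, sigma = 0.3, 1 / 1.96
print(f"difficulty prior: Normal({mu}, {sigma:.2f})")
```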

20.
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.
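For reference, the RMSEA mentioned above has the standard textbook form (T is the model test statistic, df its degrees of freedom, N the sample size); the formula is not taken from the article itself:

```latex
\[
\mathrm{RMSEA} = \sqrt{\max\!\left( \frac{T - df}{df\,(N-1)},\; 0 \right)}
\]
% Because RMSEA divides the misfit by df, the extra parameters that let IRT
% models fit polytomous data better no longer yield an apparent advantage over
% linear FA, which is consistent with the similar RMSEA-based fits reported here.
```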
