期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Equivalence of Free-Response and Multiple-Choice Items

Randy Elliot Bennett Donald A. Rock Minhwei Wang 《Journal of Educational Measurement》1991,28(1):77-92

This study examined the relationship of multiple-choice and free-response items contained on the College Board's Advanced Placement Computer Science (APCS) examination. Confirmatory factor analysis was used to test the fit of a two-factor model where each item format marked its own factor. Results showed a single-factor solution to provide the most parsimonious fit in each of two random-half samples. This finding might be accounted for by several mechanisms, including overlap in the specific processes assessed by the multiple-choice and free-response items and the limited opportunity for skill differentiation afforded by the year-long APCS course. 相似文献

2.

True-False, Alternate-Choice, and Multiple-Choice Items

Steven M. Downing 《Educational Measurement》1992,11(3):27-30

What does research tell us about true-false tests? Is there a place for this format in standardized tests? 相似文献

3.

An Investigation of Explanation Multiple-Choice Items in Science Assessment

Ou Lydia Liu Hee-Sun Lee Marcia C. Linn 《Educational Assessment》2013,18(3):164-184

Both multiple-choice and constructed-response items have known advantages and disadvantages in measuring scientific inquiry. In this article we explore the function of explanation multiple-choice (EMC) items and examine how EMC items differ from traditional multiple-choice and constructed-response items in measuring scientific reasoning. A group of 794 middle school students was randomly assigned to answer either constructed-response or EMC items following regular multiple-choice items. By applying a Rasch partial-credit analysis, we found that there is a consistent alignment between the EMC and multiple-choice items. Also, the EMC items are easier than the constructed-response items but are harder than most of the multiple-choice items. We discuss the potential value of the EMC items as a learning and diagnostic tool. 相似文献

4.

Book Review of Developing and Validating Multiple-Choice Test Items

《Structural equation modeling》2013,20(2):313-315

相似文献

5.

A Comparison of Multiple-Choice and Constructed Figural Response Items

Michael E. Martinez 《Journal of Educational Measurement》1991,28(2):131-145

In contrast to multiple-choice test questions, figural response items call for constructed responses and rely upon figural material, such as illustrations and graphs, as the response medium. Figural response questions in various science domains were created and administered to a sample of 4th-, 8th-, and 12th-grade students. Item and test statistics from parallel sets of figural response and multiple-choice questions were compared. Figural response items were generally more difficult, especially for questions that were difficult (p < .5) in their constructed-response forms. Figural response questions were also slightly more discriminating and reliable than their multiple-choice counterparts, but they had higher omit rates. This article addresses the relevance of guessing to figural response items and the diagnostic value of the item type. Plans for future research on figural response items are discussed. 相似文献

6.

A Note on Decisionmaking Processes for Multiple-Choice Test Items

Rand R. Wilcox Karen Thompson Wilcox Jacob Chung 《Journal of Educational Measurement》1988,25(3):247-250

相似文献

7.

Answer Changing on Multiple-Choice Test Items Among Eighth-Grade Readers 总被引：1，自引：1，他引：0

Clifton A. Casteel 《Journal of Experimental Education》2013,81(4):300-309

This study was done to examine the effect of answer changing on multiple-choice test performance among good and poor readers in the eighth grade. Although the gains of poor readers were higher than those of good readers, all subjects profited significantly from changing their answers on items. For all subjects, when a single response was changed, there was a two-to-one chance that the new response would raise rather than lower the final score. Gains from answer changing on test items were slightly higher for poor readers as a group than were those for good readers. However, the result was determined not to be significant. More important, this hypothesis is strengthened by the fact that all subjects profited from answer changing. Therefore, the results were interpreted as lending support to the notion that answer-changing response among young examinees should be encouraged if there is a reasonable doubt about their “first impression.” 相似文献

8.

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking

Jee-Seon Kim 《Journal of Educational Measurement》2006,43(3):193-213

Simulation and real data studies are used to investigate the value of modeling multiple-choice distractors on item response theory linking. Using the characteristic curve linking procedure for Bock's (1972) nominal response model presented by Kim and Hanson (2002) , all-category linking (i.e., a linking based on all category characteristic curves of the linking items) is compared against correct-only (CO) linking (i.e., linking based on the correct category characteristic curves only) using a common-item nonequivalent groups design. The CO linking is shown to represent an approximation to what occurs when using a traditional correct/incorrect item response model for linking. Results suggest that the number of linking items needed to achieve an equivalent level of linking precision declines substantially when incorporating the distractor categories. 相似文献

9.

Gender Differences in Performance on Multiple-Choice and Constructed Response Mathematics Items

《教育实用测度》2013,26(1):29-51

相似文献

10.

Are Tests Comprising Both Multiple-Choice and Free-Response Items Necessarily Less Unidimensional Than Multiple-Choice Tests?An Analysis of Two Tests

David Thissen Howard Wainer Xiang-Bo Wang 《Journal of Educational Measurement》1994,31(2):113-123

We consider the relationship between the multiple-choice and free-response sections on the Computer Science and Chemistry tests of the College Board's Advanced Placement program. Restricted factor analysis shows that the free-response sections measure the same underlying proficiency as the multiple-choice sections for the most part. However, there is also a significant, if relatively small, amount of local dependence among the free-response items that produces a small degree of multidimensionauty for each test 相似文献

11.

Validating Measurement of Knowledge Integration in Science Using Multiple-Choice and Explanation Items

Hee-Sun Lee Ou Lydia Liu Marcia C. Linn 《教育实用测度》2013,26(2):115-136

This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items plays in measuring students' knowledge integration ability. For construct validity, we analyze item properties such as alignment, discrimination, and target range on the knowledge integration scale using a Rasch Partial Credit Model analysis. For instructional validity, we test the sensitivity of multiple-choice and explanation items to knowledge integration instruction using a cohort comparison design. Results show that (1) one third of correct multiple-choice responses are aligned with higher levels of knowledge integration while three quarters of incorrect multiple-choice responses are aligned with lower levels of knowledge integration, (2) explanation items discriminate between high and low knowledge integration ability students much more effectively than multiple-choice items, (3) explanation items measure a wider range of knowledge integration levels than multiple-choice items, and (4) explanation items are more sensitive to knowledge integration instruction than multiple-choice items. 相似文献

12.

Cognitive Processing requirements of Constructed Figural Response and Multiple-Choice Items in Architecture Assessment

《Educational Assessment》2013,18(1):83-98

Contrasts between constructed-response items and multiple-choice counterparts have yielded but a few weak generalizations. Such contrasts typically have been based on the statistical properties of groups of items, an approach that masks differences in properties at the item level and may lead to inaccurate conclusions. In this article, we examine item-level differences between a certain type of constructed-response item (called figural response) and comparable multiple-choice items in the domain of architecture. Our data show that in comparing two item formats, item-level differences in difficulty correspond to differences in cognitive processing requirements and that relations between processing requirements and psychometric properties are systematic. These findings illuminate one aspect of construct validity that is frequently neglected in comparing item types, namely the cognitive demand of test items. 相似文献

13.

Guess Where: The Position of Correct Answers in Multiple-Choice Test Items as a Psychometric Variable

Yigal Attali Maya Bar-Hillel 《Journal of Educational Measurement》2003,40(2):109-128

In this article, the authors show that test makers and test takers have a strong and systematic tendency for hiding correct answers—or, respectively, for seeking them—in middle positions. In single, isolated questions, both prefer middle positions to extreme ones in a ratio of up to 3 or 4 to 1. Because test makers routinely, deliberately, and excessively balance the answer key of operational tests, middle bias almost, though not quite, disappears in those keys. Examinees taking real tests also produce answer sequences that are more balanced than their single question tendencies but less balanced than the correct key. In a typical four-choice test, about 55% of erroneous answers are in the two central positions. The authors show that this bias is large enough to have real psychometric consequences, as questions with middle correct answers are easier and less discriminating than questions with extreme correct answers, a fact of which some implications are explored. 相似文献

14.

Construct Equivalence of Multiple-Choice and Constructed-Response Items: A Random Effects Synthesis of Correlations

Michael C. Rodriguez 《Journal of Educational Measurement》2003,40(2):163-184

A thorough search of the literature was conducted to locate empirical studies investigating the trait or construct equivalence of multiple-choice (MC) and constructed-response (CR) items. Of the 67 studies identified, 29 studies included 56 correlations between items in both formats. These 56 correlations were corrected for attenuation and synthesized to establish evidence for a common estimate of correlation (true-score correlations). The 56 disattenuated correlations were highly heterogeneous. A search for moderators to explain this variation uncovered the role of the design characteristics of test items used in the studies. When items are constructed in both formats using the same stem (stem equivalent), the mean correlation between the two formats approaches unity and is significantly higher than when using non-stem-equivalent items (particularly when using essay-type items). Construct equivalence, in part, appears to be a function of the item design method or the item writer's intent. 相似文献

15.

Models of Decisionmaking Processes for Multiple-Choice Test Items: An Analysis of Spatial Ability

Rand R. Wilcox Karen Thompson Wilcox 《Journal of Educational Measurement》1988,25(2):125-136

Latent class models of decisionmaking processes related to multiple-choice test items are extremely important and useful in mental test theory. However, building realistic models or studying the robustness of existing models is very difficult. One problem is that there are a limited number of empirical studies that address this issue. The purpose of this paper is to describe and illustrate how latent class models, in conjunction with the answer-until-correct format, can be used to examine the strategies used by examinees for a specific type of task. In particular, suppose an examinee responds to a multiple-choice test item designed to measure spatial ability, and the examinee gets the item wrong. This paper empirically investigates various latent class models of the strategies that might be used to arrive at an incorrect response. The simplest model is a random guessing model, but the results reported here strongly suggest that this model is unsatisfactory. Models for the second attempt of an item, under an answer-until-correct scoring procedure, are proposed and found to give a good fit to data in most situations. Some results on strategies used to arrive at the first choice are also discussed 相似文献

16.

Modified Multiple-Choice Items for Alternate Assessments: Reliability,Difficulty, and Differential Boost

Ryan J. Kettler Michael C. Rodriguez Daniel M. Bolt Stephen N. Elliott Peter A. Beddow Alexander Kurz 《教育实用测度》2013,26(3):210-234

Federal policy on alternate assessment based on modified academic achievement standards (AA-MAS) inspired this research. Specifically, an experimental study was conducted to determine whether tests composed of modified items would have the same level of reliability as tests composed of original items, and whether these modified items helped reduce the performance gap between AA-MAS eligible and ineligible students. Three groups of eighth-grade students (N?=?755) defined by eligibility and disability status took original and modified versions of reading and mathematics tests. In a third condition, the students were provided limited reading support along with the modified items. Changes in reliability across groups and conditions for both the reading and mathematics tests were determined to be minimal. Mean item difficulties within the Rasch model were shown to decrease more for students who would be eligible for the AA-MAS than for non-eligible groups, revealing evidence of differential boost. Exploratory analyses indicated that shortening the question stem may be a highly effective modification, and that adding graphics to reading items may be a poor modification. 相似文献

17.

Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research

Michael C. Rodriguez 《Educational Measurement》2005,24(2):3-13

Multiple-choice items are a mainstay of achievement testing. The need to adequately cover the content domain to certify achievement proficiency by producing meaningful precise scores requires many high-quality items. More 3-option items can be administered than 4- or 5-option items per testing time while improving content coverage, without detrimental effects on psychometric quality of test scores. Researchers have endorsed 3-option items for over 80 years with empirical evidence—the results of which have been synthesized in an effort to unify this endorsement and encourage its adoption. 相似文献

18.

Type K and Other Complex Multiple-Choice Items: An Analysis of Research and Item Properties

Mark A. Albanese 《Educational Measurement》1993,12(1):28-33

What is a complex multiple-choice test item? What is the evidence that such items should be avoided? 相似文献

19.

The Point-Biserial as a Discrimination Index for Distractors in Multiple-Choice Items: Deficiencies in Usage and an Alternative

Yigal Attali Tamar Fraenkel 《Journal of Educational Measurement》2000,37(1):77-86

We show that using the point-biserial as a discrimination index for distractors by differentiating between examinees who chose the distractor and examinees who did not choose the distractor is theoretically wrong and may lead to an incorrect rejection of items. We propose an alternative usage and present empirical evidence for its suitability. 相似文献

20.

An Empirical Comparison of DDF Detection Methods for Understanding the Causes of DIF in Multiple-Choice Items

Youngsuk Suh Anna E. Talley 《教育实用测度》2015,28(1):48-67

This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods. Data from a college-level mathematics placement test were analyzed to understand the causes of differential functioning. Results indicated some agreement among the four detection methods. To facilitate practical interpretation of the DDF results, several possible effect size measures were also obtained and compared. 相似文献