Similar Articles (20 results)
1.
This research examined component processes that contribute to performance on one of the new, standards-based reading tests that have become a staple in many states. Participants were 60 Grade 4 students randomly sampled from 7 classrooms in a rural school district. The particular test we studied employed a mixture of traditional (multiple-choice) and performance assessment approaches (constructed-response items that required written responses). Our findings indicated that multiple-choice and constructed-response items enlisted different cognitive skills. Writing ability emerged as an important source of individual differences in explaining overall reading ability, but its influence was limited to performance on constructed-response items. After controlling for word identification and listening, writing ability accounted for no variance in multiple-choice reading scores. By contrast, writing ability accounted for unique variance in constructed-response reading scores, even after controlling for word identification and listening skill, and it explained more of that variance than did either word identification or listening skill. In addition, performance on the multiple-choice reading measure, together with writing ability, accounted for nearly all of the reliable variance in performance on the constructed-response reading measure.
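To make the variance-partitioning logic concrete, here is a minimal sketch of an incremental-R² (hierarchical regression) analysis of the kind this abstract describes. All data, variable names, and the helper function are hypothetical illustrations, not the study's materials.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X (intercept included)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Hypothetical scores for 60 students (not the study's data)
rng = np.random.default_rng(0)
n = 60
word_id = rng.normal(size=n)    # word identification
listening = rng.normal(size=n)  # listening comprehension
writing = rng.normal(size=n)    # writing ability
cr_score = 0.3 * word_id + 0.3 * listening + 0.5 * writing + rng.normal(scale=0.5, size=n)

# Step 1: control variables only; Step 2: add writing ability
base = r_squared(np.column_stack([word_id, listening]), cr_score)
full = r_squared(np.column_stack([word_id, listening, writing]), cr_score)
print(f"Unique variance explained by writing: {full - base:.3f}")
```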

2.
Both multiple-choice and constructed-response items have known advantages and disadvantages in measuring scientific inquiry. In this article we explore the function of explanation multiple-choice (EMC) items and examine how EMC items differ from traditional multiple-choice and constructed-response items in measuring scientific reasoning. A group of 794 middle school students was randomly assigned to answer either constructed-response or EMC items following regular multiple-choice items. By applying a Rasch partial-credit analysis, we found that there is a consistent alignment between the EMC and multiple-choice items. Also, the EMC items are easier than the constructed-response items but are harder than most of the multiple-choice items. We discuss the potential value of the EMC items as a learning and diagnostic tool.
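For reference, the Rasch partial-credit model referred to above gives the probability of a score in category k of an item with ordered categories 0, ..., m as (standard formulation; the notation is ours, not the article's):

```latex
P(X = k \mid \theta) =
  \frac{\exp \sum_{j=0}^{k} (\theta - \delta_j)}
       {\sum_{h=0}^{m} \exp \sum_{j=0}^{h} (\theta - \delta_j)},
\qquad \text{with the convention } \sum_{j=0}^{0} (\theta - \delta_j) \equiv 0,
```

where θ is the student's proficiency and the δ_j are the item's step difficulties. Partial credit is what allows dichotomous multiple-choice items and polytomously scored EMC or constructed-response items to be placed on one scale.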

3.
The first generation of computer-based tests depends largely on multiple-choice items and constructed-response questions that can be scored through literal matches with a key. This study evaluated scoring accuracy and item functioning for an open-ended response type where correct answers, posed as mathematical expressions, can take many different surface forms. Items were administered to 1,864 participants in field trials of a new admissions test for quantitatively oriented graduate programs. Results showed automatic scoring to approximate the accuracy of multiple-choice scanning, with all processing errors stemming from examinees improperly entering responses. In addition, the items functioned similarly in difficulty, item-total relations, and male-female performance differences to other response types being considered for the measure.
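The core difficulty this abstract describes is that a correct mathematical expression can take many surface forms. One way to illustrate the idea is symbolic equivalence checking; the sketch below uses the sympy library and is our illustration under that assumption, not the test's actual scoring engine.

```python
from sympy import simplify, sympify

def is_equivalent(response: str, key: str) -> bool:
    """Score a response correct if it is algebraically equal to the key."""
    try:
        return simplify(sympify(response) - sympify(key)) == 0
    except (SyntaxError, TypeError, ValueError):
        return False  # a malformed entry is scored incorrect

# Different surface forms of the same answer all score correct
print(is_equivalent("2*(x + 1)", "2*x + 2"))         # True
print(is_equivalent("(x**2 - 1)/(x - 1)", "x + 1"))  # True
print(is_equivalent("2*x + 1", "2*x + 2"))           # False
```

Note how the failure mode matches the finding above: the scoring rule itself is exact, so errors come from responses entered in a form the parser cannot read.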

4.
This was a study of differential item functioning (DIF) for grades 4, 7, and 10 reading and mathematics items from state criterion-referenced tests. The tests were composed of multiple-choice and constructed-response items. Gender DIF was investigated using POLYSIBTEST and a Rasch procedure. The Rasch procedure flagged more items for DIF than did the simultaneous item bias procedure—particularly multiple-choice items. For both reading and mathematics tests, multiple-choice items generally favored males while constructed-response items generally favored females. Content analyses showed that flagged reading items typically measured text interpretations or implied meanings; males tended to benefit from items that asked them to identify reasonable interpretations and analyses of informational text. Most items that favored females asked students to make their own interpretations and analyses, of both literary and informational text, supported by text-based evidence. Content analysis of mathematics items showed that items favoring males measured geometry, probability, and algebra. Mathematics items favoring females measured statistical interpretations, multistep problem solving, and mathematical reasoning.

5.
This article presents a study of ethnic Differential Item Functioning (DIF) for 4th-, 7th-, and 10th-grade reading items on a state criterion-referenced achievement test. The tests, administered from 1997 to 2001, were composed of multiple-choice and constructed-response items. Item performance by focal groups (i.e., students of Asian/Pacific Islander, Black/African American, Native American, and Latino/Hispanic origins) was compared with the performance of White students using simultaneous item bias and Rasch procedures. Flagged multiple-choice items generally favored White students, whereas flagged constructed-response items generally favored students of Asian/Pacific Islander, Black/African American, and Latino/Hispanic origins. Content analysis of flagged reading items showed that positively and negatively flagged items typically measured inference, interpretation, or analysis of text in multiple-choice and constructed-response formats. Items that were not flagged for DIF generally measured very easy reading skills (e.g., literal comprehension) and reading skills that require higher level thinking (e.g., developing interpretations across texts and analyzing graphic elements).

6.
Many efforts have been made to determine and explain differential gender performance on large-scale mathematics assessments. A well-agreed-on conclusion is that gender differences are contextualized and vary across math domains. This study investigated the pattern of gender differences by item domain (e.g., Space and Shape, Quantity) and item type (e.g., traditional multiple-choice items, complex multiple-choice items, and open constructed-response items). (The paper distinguishes traditional multiple-choice items from complex multiple-choice items; "multiple-choice" used alone refers to the traditional format.) The U.S. portion of the Programme for International Student Assessment (PISA) 2000 and 2003 mathematics assessments was analyzed. A multidimensional Rasch model was used to provide student ability estimates for each comparison. Results revealed a slight but consistent male advantage. Students showed the largest gender difference (d = 0.19) in favor of males on complex multiple-choice items, an unconventional item type. Males and females also showed sizable differences on Space and Shape items, a domain well documented for showing robust male superiority. Contrary to many previous findings of male superiority on multiple-choice items, no measurable difference was identified on multiple-choice items in either the PISA 2000 or the 2003 math assessment. Possible reasons for the differential gender performance across math domains and item types are discussed, along with directions for future research.
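The reported effect size (d = 0.19) is a standardized mean difference. For reference, Cohen's d with a pooled standard deviation is computed as (standard formula, our notation):

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}
```

So d = 0.19 means the male-female gap on complex multiple-choice items was about a fifth of a pooled standard deviation.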

7.
This article discusses and demonstrates combining scores from multiple-choice (MC) and constructed-response (CR) items to create a common scale using item response theory methodology. Two specific issues addressed are (a) whether MC and CR items can be calibrated together and (b) whether simultaneous calibration of the two item types leads to loss of information. Procedures are discussed and empirical results are provided using a set of tests in the areas of reading, language, mathematics, and science in three grades.

8.
Applied Measurement in Education, 2013, 26(3): 257-275
The purpose of this study was to investigate the technical properties of stem-equivalent mathematics items differing only with respect to response format. Using socioeconomic factors to define the strata, a proportional stratified random sample of 1,366 Connecticut sixth-grade students was administered one of three forms. Classical item analysis, dimensionality assessment, item response theory goodness-of-fit, and an item bias analysis were conducted. Analysis of variance and confirmatory factor analysis were used to examine the functioning of the items presented in the three different formats. It was found that, after equating forms, the constructed-response formats were somewhat more difficult than the multiple-choice format. However, there was no significant difference across formats with respect to item discrimination. A differential item functioning (DIF) analysis was conducted using both the Mantel-Haenszel procedure and the comparison of the item characteristic curves. The DIF analysis indicated that the presence of bias was not greatly affected by item format; that is, items biased in one format tended to be biased in a similar manner when presented in a different format, and unbiased items tended to remain so regardless of format.
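For reference, the Mantel-Haenszel procedure named above compares the odds of answering correctly for the two groups within matched ability strata. A minimal sketch of the common odds ratio and the ETS delta metric (standard formulas; the stratum counts below are hypothetical):

```python
import math

def mantel_haenszel(tables):
    """tables: (A, B, C, D) per ability stratum, where A/B are reference-group
    correct/incorrect counts and C/D are focal-group correct/incorrect counts."""
    num = sum(A * D / (A + B + C + D) for A, B, C, D in tables)
    den = sum(B * C / (A + B + C + D) for A, B, C, D in tables)
    alpha = num / den                # MH common odds ratio
    delta = -2.35 * math.log(alpha)  # ETS delta; |delta| >= 1.5 is the usual "large DIF" bound
    return alpha, delta

# Hypothetical counts for three total-score strata
strata = [(40, 10, 35, 15), (30, 20, 25, 25), (15, 35, 12, 38)]
print(mantel_haenszel(strata))
```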

9.
In contrast to multiple-choice test questions, figural response items call for constructed responses and rely upon figural material, such as illustrations and graphs, as the response medium. Figural response questions in various science domains were created and administered to a sample of 4th-, 8th-, and 12th-grade students. Item and test statistics from parallel sets of figural response and multiple-choice questions were compared. Figural response items were generally more difficult, especially for questions that were difficult (p < .5) in their constructed-response forms. Figural response questions were also slightly more discriminating and reliable than their multiple-choice counterparts, but they had higher omit rates. This article addresses the relevance of guessing to figural response items and the diagnostic value of the item type. Plans for future research on figural response items are discussed.
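On the relevance of guessing: with k response options, blind guessing succeeds with probability 1/k on a multiple-choice item, and the classical correction-for-guessing score is (standard formula, not taken from the article):

```latex
S_{\text{corrected}} = R - \frac{W}{k - 1}
```

where R is the number of right answers and W the number of wrong ones. Figural response items largely remove this floor, since producing a correct drawing or graph at random is far less likely than picking a correct option, which is plausibly one reason their difficulty and omit rates run higher.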

10.
A thorough search of the literature was conducted to locate empirical studies investigating the trait or construct equivalence of multiple-choice (MC) and constructed-response (CR) items. Of the 67 studies identified, 29 studies included 56 correlations between items in both formats. These 56 correlations were corrected for attenuation and synthesized to establish evidence for a common estimate of correlation (true-score correlations). The 56 disattenuated correlations were highly heterogeneous. A search for moderators to explain this variation uncovered the role of the design characteristics of test items used in the studies. When items are constructed in both formats using the same stem (stem equivalent), the mean correlation between the two formats approaches unity and is significantly higher than when using non-stem-equivalent items (particularly when using essay-type items). Construct equivalence, in part, appears to be a function of the item design method or the item writer's intent.
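The correction for attenuation used in this synthesis estimates the correlation between true scores from the observed correlation and the two reliabilities (the standard Spearman formula):

```latex
r_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'}\; r_{YY'}}}
```

For example (hypothetical numbers), an observed cross-format correlation of .64 with reliabilities of .81 and .79 disattenuates to .64 / sqrt(.81 × .79) ≈ .80.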

11.
The main issue addressed in this article is that there is much to learn about students' knowledge and thinking in science from large-scale international quantitative studies beyond overall score measures. Response patterns on individual or groups of items can give valuable diagnostic insight into students' conceptual understanding, but there is also a danger of drawing conclusions that are too simple or not valid. We discuss how responses to multiple-choice items could be interpreted, and we also show how responses on constructed-response items can be systematised and analysed. Finally, we study, empirically, interactions between item characteristics and student responses. It is demonstrated that even small changes in the item wording and/or the item format may have a substantial influence on the response pattern. Therefore, we argue that interpretations of results from these kinds of studies should be based on a thorough analysis of the actual items used. We further argue that diagnostic information should be an integrated part of the international research aims of such large-scale studies. Examples of items and student responses presented are taken from the Third International Mathematics and Science Study (TIMSS).
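One form the systematic analysis of response patterns can take is a simple distractor analysis: cross-tabulating option choices against overall performance shows which wrong answers attract otherwise strong students. The sketch below is our illustration with hypothetical data, not TIMSS variables.

```python
import pandas as pd

# Hypothetical records: option chosen on one item, plus total test score
df = pd.DataFrame({
    "option": ["A", "B", "B", "C", "B", "A", "D", "B", "C", "B"],
    "total":  [12, 31, 28, 17, 35, 10, 14, 30, 19, 26],
})
df["band"] = pd.qcut(df["total"], 2, labels=["low", "high"])

# Share of each option within the low- and high-scoring bands
print(pd.crosstab(df["band"], df["option"], normalize="index").round(2))
```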

12.
In this paper we report the results of an experiment designed to test the hypothesis that when faced with a question involving the inverse direction of a reversible mathematical process, students solve a multiple-choice version by verifying the answers presented to them by the direct method, not by undertaking the actual inverse calculation. Participants responded to an online test containing equivalent multiple-choice and constructed-response items in two reversible algebraic techniques: factor/expand and solve/verify. The findings supported this hypothesis: Overall scores were higher in the multiple-choice condition compared to the constructed-response condition, but this advantage was significantly greater for items concerning the inverse direction of reversible processes compared to those involving direct processes.
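The hypothesized asymmetry is easy to make concrete: verifying a presented option uses the direct (easy) operation, while a constructed response forces the inverse (hard) one. A sketch with a hypothetical factoring item, using sympy for the algebra (our illustration, not the experiment's materials):

```python
from sympy import expand, factor, symbols

x = symbols("x")
target = x**2 + 5*x + 6  # item: "Factor this expression"

# Constructed response: the examinee must perform the inverse operation
print(factor(target))  # (x + 2)*(x + 3)

# Multiple choice: each option can be checked by the direct operation
options = [(x + 1)*(x + 6), (x + 2)*(x + 3), (x - 2)*(x - 3)]
for opt in options:
    print(opt, "->", expand(opt) == target)  # expand and compare
```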

13.
Although test scores from similar tests in multiple-choice and constructed-response formats are highly correlated, equivalence in rankings may mask differences in substantive strategy use. The author used an experimental design and participant think-alouds to explore cognitive processes in mathematical problem solving among undergraduate examinees (N = 64). The study examined the effect of format on mathematics performance and strategy use for male and female examinees given stem-equivalent items. A statistically significant main effect of format on performance was found, with constructed-response items proving more difficult. The multiple-choice format was associated with more varied strategies, backward strategies, and guessing. Format was found to moderate the effect of problem conceptualization on performance. Results suggest that while the multiple-choice format may be adequate for ranking students on performance, the constructed-response format should be preferred for many contemporary educational purposes that seek to provide nuanced information about student cognition.

14.
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of anchor CR item rescoring (known as trend scoring) in the context of classical equating methods. Four linking designs were examined: an anchor with only MC items; a mixed-format anchor test containing both MC and CR items; a mixed-format anchor test incorporating common CR item rescoring; and an equivalent groups (EG) design with CR item rescoring, thereby avoiding the need for an anchor test. Designs using either MC items alone or a mixed anchor without CR item rescoring resulted in much larger bias than the other two designs. The EG design with trend scoring resulted in the smallest bias, leading to the smallest root mean squared error value.
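For reference, the simplest classical method behind an equivalent-groups design is linear equating, which matches the first two moments of the two forms' score distributions (standard formula; the scores below are simulated, not the exam's data):

```python
import numpy as np

def linear_equate(x_scores, y_scores):
    """Return a function mapping Form X scores onto the Form Y scale
    by matching means and standard deviations (equivalent-groups design)."""
    mx, sx = np.mean(x_scores), np.std(x_scores)
    my, sy = np.mean(y_scores), np.std(y_scores)
    return lambda x: my + (sy / sx) * (x - mx)

rng = np.random.default_rng(1)
form_x = rng.normal(50, 10, size=2000)  # hypothetical Form X scores
form_y = rng.normal(53, 12, size=2000)  # hypothetical Form Y scores
to_y = linear_equate(form_x, form_y)
print(round(to_y(60.0), 2))  # a Form X score of 60 expressed on the Y scale
```

Trend scoring enters upstream of this step: rescoring the prior administration's CR papers alongside the new ones keeps rater drift from contaminating the score distributions being matched.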

15.
In one study, parameters were estimated for constructed-response (CR) items in 8 tests from 4 operational testing programs using the 1-parameter and 2-parameter partial credit (1PPC and 2PPC) models. Where multiple-choice (MC) items were present, these models were combined with the 1-parameter and 3-parameter logistic (1PL and 3PL) models, respectively. We found that item fit was better when the 2PPC model was used alone or with the 3PL model. Also, the slopes of the CR and MC items were found to differ substantially. In a second study, item parameter estimates produced using the 1PL-1PPC and 3PL-2PPC model combinations were evaluated for fit to simulated data generated using true parameters known to fit one model combination or the other. The results suggested that the more flexible 3PL-2PPC model combination would produce better item fit than the 1PL-1PPC combination.
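For reference (standard formulations, our notation): the 3PL model for a dichotomous MC item, and the 2PPC model, which has the same form as the generalized partial credit model, for a CR item scored 0, ..., m:

```latex
P(X = 1 \mid \theta) = c + (1 - c)\,\frac{1}{1 + e^{-Da(\theta - b)}},
\qquad
P(X = k \mid \theta) =
  \frac{\exp \sum_{j=0}^{k} a(\theta - b_j)}
       {\sum_{h=0}^{m} \exp \sum_{j=0}^{h} a(\theta - b_j)},
```

with D ≈ 1.7 and the j = 0 term of each sum taken as zero. The key flexibility is the per-item slope a in the 2PPC: constraining all slopes equal, as the 1PL-1PPC combination does, is exactly what the fit results above count against.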

16.
Published discussions of the year-to-year linking of tests comprised of polytomous items appear to suggest that the linking logic traditionally used for multiple-choice items is also appropriate for polytomous items. It is argued and illustrated that a modification of the traditional linking is necessary when tests consist of constructed-response items judged by raters and there is a possibility of year-to-year variation in the rating discrimination and severity.
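The traditional linking being questioned places each new year's ability estimates on the base-year scale with a linear transformation; under the common mean/sigma method the constants come from the common items' difficulty estimates in the two calibrations (standard formulas, our notation):

```latex
\theta^{*} = A\theta + B,
\qquad
A = \frac{\sigma(b^{\text{base}})}{\sigma(b^{\text{new}})},
\qquad
B = \mu(b^{\text{base}}) - A\,\mu(b^{\text{new}})
```

This logic assumes the items behave the same way each year, an assumption raters can break: if rating severity or discrimination drifts, the difficulty estimates for rater-scored items shift for reasons unrelated to the examinees.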

17.
Applied Measurement in Education, 2013, 26(1): 89-97
Research on the use of multiple-choice tests has presented conflicting evidence about the use of statistical item difficulty as a means of ordering items. An alternate method advocated by many texts is the use of cognitive difficulty. This study examined the effect of using both statistical and cognitive item difficulty in determining item order. Results indicated that those students who received items in an increasing cognitive order, no matter what the order of statistical difficulty, scored higher on hard items. Those students who received the forms with opposing cognitive and statistical difficulty orders scored the highest on medium-level items. The study concludes with a call for more research on the effects of cognitive difficulty and suggests that future studies examine subscores as well as total test results.

18.
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item properties such as alignment, discrimination, and target range on the knowledge integration scale using a Rasch Partial Credit Model analysis. For instructional validity, we test the sensitivity of multiple-choice and explanation items to knowledge integration instruction using a cohort comparison design. Results show that (1) one third of correct multiple-choice responses are aligned with higher levels of knowledge integration, while three quarters of incorrect multiple-choice responses are aligned with lower levels of knowledge integration; (2) explanation items discriminate between high and low knowledge integration ability students much more effectively than multiple-choice items; (3) explanation items measure a wider range of knowledge integration levels than multiple-choice items; and (4) explanation items are more sensitive to knowledge integration instruction than multiple-choice items.

19.
20.
Applied Measurement in Education, 2013, 26(2): 151-169
The use of automated scanning of test sheets, beginning in the 1930s, led to widespread use of the multiple-choice format in standardized testing. New forms of automated scoring now hold out the possibility of making a wide range of constructed-response item formats feasible for use on a large-scale basis. We describe new developments in five domains: mathematical reasoning, algebra problem solving, computer science, architecture, and natural language. For each one, we describe the task as presented to the examinee, the methods used to score the response, and the psychometric properties of the item responses. We then highlight general challenges and issues spanning these technologies. We conclude by offering our views on the ways in which such technologies are likely to shape the future of testing.
