1.

Use of Adjustment by Minimum Discriminant Information in Linking Constructed‐Response Test Scores in the Absence of Common Items

Yi‐Hsuan Lee Shelby J. Haberman Neil J. Dorans 《Journal of Educational Measurement》2019,56(2):452-472

In many educational tests, both multiple‐choice (MC) and constructed‐response (CR) sections are used to measure different constructs. In many common cases, security concerns lead to the use of form‐specific CR items that cannot be used for equating test scores, along with MC sections that can be linked to previous test forms via common items. In such cases, adjustment by minimum discriminant information may be used to link CR section scores and composite scores based on both MC and CR sections. This approach is an innovative extension that addresses the long‐standing issue of linking CR test scores across test forms in the absence of common items in educational measurement. It is applied to a series of administrations from an international language assessment with MC sections for receptive skills and CR sections for productive skills. To assess the linking results, harmonic regression is applied to examine the effects of the proposed linking method on score stability, among several analyses for evaluation.

2.

Multiple-choice exams: an obstacle for higher-level thinking in introductory science classes (Citations: 1 total; 0 self-citations; 1 by others)

KF Stanger-Hall 《CBE life sciences education》2012,11(3):294-306

Learning science requires higher-level (critical) thinking skills that need to be practiced in science classes. This study tested the effect of exam format on critical-thinking skills. Multiple-choice (MC) testing is common in introductory science courses, and students in these classes tend to associate memorization with MC questions and may not see the need to modify their study strategies for critical thinking, because the MC exam format has not changed. To test the effect of exam format, I used two sections of an introductory biology class. One section was assessed with exams in the traditional MC format; the other section was assessed with both MC and constructed-response (CR) questions. The mixed exam format was correlated with significantly more cognitively active study behaviors and a significantly better performance on the cumulative final exam (after accounting for grade point average and gender). There was also less gender bias in the CR answers. This suggests that the MC-only exam format indeed hinders critical thinking in introductory science classes. Introducing CR questions encouraged students to learn more and to be better critical thinkers and reduced gender bias. However, student resistance increased as students adjusted their perceptions of their own critical-thinking abilities.

3.

Comparisons among Designs for Equating Mixed-Format Tests in Large-Scale Assessments

Sooyeon Kim Michael E. Walker Frederick McHale 《Journal of Educational Measurement》2010,47(1):36-53

In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of anchor CR item rescoring (known as trend scoring) in the context of classical equating methods. Four linking designs were examined: an anchor with only MC items; a mixed-format anchor test containing both MC and CR items; a mixed-format anchor test incorporating common CR item rescoring; and an equivalent groups (EG) design with CR item rescoring, thereby avoiding the need for an anchor test. Designs using either MC items alone or a mixed anchor without CR item rescoring resulted in much larger bias than the other two designs. The EG design with trend scoring resulted in the smallest bias, leading to the smallest root mean squared error value.

4.

Determining the Anchor Composition for a Mixed-Format Test: Evaluation of Subpopulation Invariance of Linking Functions

Sooyeon Kim Michael Walker 《Applied Measurement in Education》2013,26(2):178-195

This study examined the appropriateness of the anchor composition in a mixed-format test, which includes both multiple-choice (MC) and constructed-response (CR) items, using subpopulation invariance indices. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using two types of anchor sets: (a) MC only and (b) a mix of MC and CR. In each anchor condition, the linking functions were also derived separately for males and females, and those subpopulation functions were compared to the total group function. In the MC-only condition, the difference between the subpopulation functions and the total group function was not trivial in a score region that included cut scores, leading to inconsistent pass/fail decisions for low-performing examinees in particular. Overall, the mixed anchor was a better choice than the MC-only anchor to achieve subpopulation invariance between males and females. The research reinforces subpopulation invariance indices as a means of determining the adequacy of the anchor.

5.

Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

Lei Wan George A. Henly 《Applied Measurement in Education》2013,26(1):58-78

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats, the figural response (FR) and constructed response (CR) formats, used in a K–12 computerized science test. The item response theory (IRT) information function and confirmatory factor analysis (CFA) were employed to address the research questions. It was found that the FR items were similar to the multiple-choice (MC) items in providing information and efficiency, whereas the CR items provided noticeably more information than the MC items but tended to provide less information per minute. The CFA suggested that the innovative formats and the MC format measure similar constructs. Innovations in computerized item formats are reviewed, and the merits as well as challenges of implementing the innovative formats are discussed.

6.

Multidimensional Linking for Tests with Mixed Item Types

Lihua Yao Keith Boughton 《Journal of Educational Measurement》2009,46(2):177-197

Numerous assessments contain a mixture of multiple choice (MC) and constructed response (CR) item types and many have been found to measure more than one trait. Thus, there is a need for multidimensional dichotomous and polytomous item response theory (IRT) modeling solutions, including multidimensional linking software. For example, multidimensional item response theory (MIRT) may have a promising future in subscale score proficiency estimation, leading toward a more diagnostic orientation, which requires the linking of these subscale scores across different forms and populations. Several multidimensional linking studies can be found in the literature; however, none have used a combination of MC and CR item types. Thus, this research explores multidimensional linking accuracy for tests composed of both MC and CR items using a matching test characteristic/response function approach. The two-dimensional simulation study presented here used real data-derived parameters from a large-scale statewide assessment with two subscale scores for diagnostic profiling purposes, under varying conditions of anchor set lengths (6, 8, 16, 32, 60), across 10 population distributions, with a mixture of simple versus complex structured items, using a sample size of 3,000. It was found that for a well-chosen anchor set, the parameters recovered well after equating across all populations, even for anchor sets composed of as few as six items.

7.

Relationships between learning patterns and attitudes towards two assessment formats

Menucha Birenbaum Rose A. Feldman 《Educational research; a review for teachers and all concerned with progress in education》2013,55(1):90-98

The study examined the relationships between learning patterns and attitudes towards two assessment formats: open‐ended (OE) and multiple‐choice (MC), among students in higher education. Sixteen Semantic Differential scales measuring emotional reactions, intellectual reactions and appraisal of each assessment format, along with measures of learning processes, academic self‐concept and test anxiety, were administered to 58 students. Results indicated two patterns of relationships between the learning‐related variables and the assessment attitudes: high scores on the self‐concept measure and on the three measures of learning processes were related to positive attitudes towards the OE format but negative ones towards the MC format; low scores on the test anxiety measures were related to positive attitudes towards the OE format. In addition, significant gender differences emerged with respect to the MC format, with males having more favourable attitudes than females. Results were discussed in light of an adaptive assessment approach.

8.

Investigating the Effectiveness of Equating Designs for Constructed-Response Tests in Large-Scale Assessments

Sooyeon Kim Michael E. Walker Frederick McHale 《Journal of Educational Measurement》2010,47(2):186-201

Using data from a large-scale exam, in this study we compared various designs for equating constructed-response (CR) tests to determine which design was most effective in producing equivalent scores across the two tests to be equated. In the context of classical equating methods, four linking designs were examined: (a) an anchor set containing common CR items, (b) an anchor set incorporating common CR items rescored, (c) an external multiple-choice (MC) anchor test, and (d) an equivalent groups design incorporating rescored CR items (no anchor test). The use of CR items without rescoring resulted in much larger bias than the other designs. The use of an external MC anchor resulted in the next largest bias. The use of a rescored CR anchor and the equivalent groups design led to similar levels of equating error.

9.

Multiple‐Choice Tests and Student Understanding: What Is the Connection?

Mark G. Simkin William L. Kuechler 《Decision Sciences Journal of Innovative Education》2005,3(1):73-98

Instructors can use both “multiple‐choice” (MC) and “constructed response” (CR) questions (such as short answer, essay, or problem‐solving questions) to evaluate student understanding of course materials and principles. This article begins by discussing the advantages and concerns of using these alternate test formats and reviews the studies conducted to test the hypothesis (or perhaps better described as the hope) that MC tests, by themselves, perform an adequate job of evaluating student understanding of course materials. Despite research from educational psychology demonstrating the potential for MC tests to measure the same levels of student mastery as CR tests, recent studies in specific educational domains find imperfect relationships between these two performance measures. We suggest that a significant confound in prior experiments has been the treatment of MC questions as homogeneous entities when in fact MC questions may test widely varying levels of student understanding. The primary contribution of the article is a modified research model for CR/MC research based on knowledge‐level analyses of MC test banks and CR question sets from basic computer language programming. The analyses are based on an operationalization of Bloom's Taxonomy of Learning Goals for the domain, which is used to develop a skills‐focused taxonomy of MC questions. However, we propose that these analyses readily generalize to similar teaching domains of interest to decision sciences educators such as modeling and simulation programming.

10.

Identification-Based Multiple-Choice Assessments in Anatomy can be as Reliable and Challenging as Their Free-Response Equivalents

Jan Douglas-Morris Helen Ritchie Catherine Willis Darren Reed 《Anatomical sciences education》2021,14(3):287-295

Multiple-choice (MC) anatomy “spot-tests” (identification-based assessments on tagged cadaveric specimens) offer a practical alternative to traditional free-response (FR) spot-tests. Conversion of the two spot-tests in an upper limb musculoskeletal anatomy unit of study from FR to a novel MC format, where one of five tagged structures on a specimen was the answer to each question, provided a unique opportunity to assess the comparative validity and reliability of FR- and MC-formatted spot-tests and the impact on student performance following the change of test format to MC. Three successive year cohorts of health science students (n = 1,442) were each assessed by spot-tests formatted as FR (first cohort) or MC (following two cohorts). Comparative question difficulty was assessed independently by three examiners. There were more higher-order cognitive skill questions and more of the course objectives tested in the MC-formatted tests. Spot-test reliability was maintained with Cronbach’s alpha reliability coefficients ≥ 0.80 and 80% of the MC items of high quality (having point-biserial correlation coefficients > 0.25). These results also demonstrated guessing was not an issue. The mean final score for the MC-formatted cohorts increased by 4.9%, but did not change for the final theory examination that was common to all three cohorts. Subgroup analysis revealed that the greatest change in spot-test marks was for the lower-performing students. In conclusion, our results indicate spot-tests formatted as MC are suitable alternatives to FR tests. The increase in mean scores for the MC-formatted spot-tests was attributed to the lower demand of the MC format.
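
The two item-quality statistics named in this abstract, Cronbach's alpha and the point-biserial correlation, can be illustrated with a minimal sketch. It assumes dichotomously scored items (1 = correct, 0 = incorrect) stored as one row per examinee, and it correlates each item with the uncorrected total score. The abstract does not say which variant the authors computed, and the function names and toy data below are hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = examinees, columns = items)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def point_biserial(scores, item):
    """Pearson correlation between one 0/1 item and the (uncorrected) total score."""
    x = [row[item] for row in scores]
    y = [sum(row) for row in scores]
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pvariance(x) * pvariance(y)) ** 0.5


# Hypothetical toy data: 4 examinees x 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(toy)   # compare against a threshold such as 0.80
r_pb = point_biserial(toy, 0)  # compare against a threshold such as 0.25
```

With real data, an item would be flagged for review when its point-biserial falls below the chosen cutoff; the 0.80 and 0.25 thresholds are the ones quoted in the abstract.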

11.

Calibration and Scoring of Tests With Multiple-Choice and Constructed-Response Item Types

Kadriye Ercikan Richard D. Schwarz Marc W. Julian George R. Burket Melba M. Weber Valerie Link 《Journal of Educational Measurement》1998,35(2):137-154

This article discusses and demonstrates combining scores from multiple-choice (MC) and constructed-response (CR) items to create a common scale using item response theory methodology. Two specific issues addressed are (a) whether MC and CR items can be calibrated together and (b) whether simultaneous calibration of the two item types leads to loss of information. Procedures are discussed and empirical results are provided using a set of tests in the areas of reading, language, mathematics, and science in three grades.

12.

Achievement Measures of School Effectiveness: Comparison of Model Stability Across Years

《Applied Measurement in Education》2013,26(4):353-365

The purpose of this study was to determine the feasibility of combining different test types (criterion-referenced and norm-referenced) in a composite school achievement score to be used in a model for school effectiveness classification. The cross-year stability and within-model consistency of the composite were compared to models using subcomposite, overall scores for both the criterion-referenced and norm-referenced tests, subject-area scores (across grades), grade-level scores, and component scores for each grade. Stability of the different models across 2 years was determined by using the agreement ratio, kappa coefficient, and correlation of residuals (N = 361). The same statistical procedures were used to compute consistency across subsamples (N = 264). Results indicated that transforming and combining student-level scores of different test types, grade levels, and subject areas allows for a broader basis for judging schools and provides a school effectiveness model that is both consistent across subsamples and stable across years.

13.

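A Brief Analysis of Test Method Effects in Reading Comprehension Tests

Feng Yue (冯悦) 《Journal of Guangdong Polytechnic Normal University (广东技术师范学院学报)》2007,(6):89-93

This paper investigates whether different test methods, namely multiple choice and information transfer, produce a test method effect in reading comprehension tests. In addition to analyzing students' test scores, the study also analyzes item difficulty values, which were computed using Item Response Theory. The results show that different test methods do affect item difficulty and examinee performance; in terms of item difficulty, information transfer is harder than multiple choice.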

14.

A MODEL FOR INCREASING THE MEANING OF STANDARDIZED TEST SCORES

RICHARD C. COX BARBARA G. STERRETT 《Journal of Educational Measurement》1970,7(4):227-228

15.

The Institute for Science Education (Institut für Didaktik der Naturwissenschaften) in Austria

Horst Werner 《International Journal of Science Education》2013,35(4):461-463

Cognitive Preference (CP) studies in science education have been met with several criticisms. One of these relates to response format, i.e., the ipsative nature of the scores, another to linguistic factors, in particular the possible causing of preferences for the Q(uestioning)‐mode by matters unrelated to test modes as operationally defined. The present study used a horizontally split‐half method of testing, converting the first half of the 40‐item CP into an all‐Q format, and leaving the other half in its traditional form (recall, principles, questioning and application). Intercorrelations between both parts were thus non‐ipsative. Comparing subjects’ CP‐scores, mode‐intensities and within‐format correlations, little or no differences were found. Moreover, within‐ and between‐format correlations turned out to be essentially identical. Factor‐ and SSA‐analyses confirmed the identical structures of both formats. It was concluded that ipsativity does not distort CP‐data unduly, nor does the particular linguistic format of Q‐statements cause substantial deviations from CP‐patterns.

16.

Reflection-impulsivity as a predictor of children's academic achievement

D E Barrett 《Child development》1977,48(4):1443-1447

To examine the implications of differences in reflection-impulsivity for later academic achievement, 70 children were administered the Matching Familiar Figures test (MFF) in grade 4 and the Comprehensive Tests of Basic Skills (CTBS) in grades 4, 5, and 6. Children identified as reflective based on grade 4 MFF performance scored significantly higher on the CTBS achievement battery at all grade levels than those classified as impulsive. However, the 2 groups did not differ on the grade 5 or grade 6 achievement measures when scores were adjusted for initial differences in grade 4 CTBS. Similarly, while each of the continuous variables MFF error score and MFF response latency was significantly predictive of grade 5 and grade 6 achievement test scores, neither of the MFF variables significantly improved the prediction of academic performance when current level of achievement was statistically accounted for. Sex differences in the relations between the MFF variables and the achievement measures were identified; MFF error score was more strongly related to later academic achievement for boys than for girls, while MFF response latency was a better predictor of academic achievement for girls than for boys.

17.

Effects of a Classroom Management Intervention on Student Achievement in Inner‐City Elementary Schools

H. Jerome Freiberg T. A. Stein Shwu‐yong Huang 《Educational Research and Evaluation》2013,19(1):36-66

As part of a study of the life‐cycle of inner‐city schools, the achievement of elementary school students (on MAT6 and TEAMS tests) who had teachers trained in a classroom management program in one school was compared with that of students in a comparison school during a four‐year period. Students at Madison Elementary School showed statistically greater achievement gains on both the nationally normed achievement test (MAT6) and the state criterion‐referenced achievement battery than students at the comparison school in each of three years. The overall effect size due to program treatment on the MAT6 test scores was large, ranging from .43 (1986–87) and .83 (1987–88) during intervention to .73 (1988–89) after intervention. Similar results were found in the TEAMS test associated with the program intervention with overall effect size of 1.02 (1987–88) and .78 (1988–89) in mathematics, .68 and .77 in reading, and .59 and .77 in writing for the respective years. On measures of learning environment, in a post hoc analysis (1990–91), students at Madison perceived their environment to be significantly more positive than comparison students. Teacher and principal interviews during and after the intervention periods provided contextual guidance for the findings.

18.

DIFFERENTIAL WEIGHTING BY JUDGED DEGREE OF CORRECTNESS

DURGADAS PATNAIK ROSS E. TRAUB 《Journal of Educational Measurement》1973,10(4):281-286

Two conventional scores and a weighted score on a group test of general intelligence were compared for reliability and predictive validity. One conventional score consisted of the number of correct answers an examinee gave in responding to 69 multiple-choice questions; the other was the formula score obtained by subtracting from the number of correct answers a fraction of the number of wrong answers. A weighted score was obtained by assigning weights to all the response alternatives of all the questions and adding the weights associated with the responses, both correct and incorrect, made by the examinee. The weights were derived from degree-of-correctness judgments of the set of response alternatives to each question. Reliability was estimated using a split-half procedure; predictive validity was estimated from the correlation between test scores and mean school achievement. Both conventional scores were found to be significantly less reliable but significantly more valid than the weighted scores. (The formula scores were neither significantly less reliable nor significantly more valid than number-correct scores.)
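
The three scoring rules described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure. The abstract says only that "a fraction" of the wrong answers is subtracted, so the conventional correction-for-guessing fraction 1/(k - 1) for k-option items is assumed here. Omitted responses are represented as `None`, and all names, weights, and data are hypothetical.

```python
def number_correct(responses, key):
    """Count responses that match the answer key."""
    return sum(r == k for r, k in zip(responses, key))


def formula_score(responses, key, n_options):
    """Rights minus wrongs / (n_options - 1); omits (None) are not penalized.

    The 1 / (n_options - 1) fraction is an assumption, not taken from the abstract.
    """
    right = number_correct(responses, key)
    wrong = sum(r is not None and r != k for r, k in zip(responses, key))
    return right - wrong / (n_options - 1)


def weighted_score(responses, weights):
    """Sum the judged degree-of-correctness weight of each chosen alternative."""
    return sum(w[r] for r, w in zip(responses, weights) if r is not None)


# Hypothetical three-item test with four options per item.
key = ["A", "C", "B"]
responses = ["A", "B", "B"]
weights = [
    {"A": 1.0, "B": 0.3, "C": 0.1, "D": 0.0},
    {"A": 0.0, "B": 0.4, "C": 1.0, "D": 0.2},
    {"A": 0.5, "B": 1.0, "C": 0.0, "D": 0.1},
]

nc = number_correct(responses, key)
fs = formula_score(responses, key, n_options=4)
ws = weighted_score(responses, weights)
```

The weighted score gives partial credit for a "nearly right" wrong answer (item 2 above), which is the feature whose reliability and validity the study compares against the two conventional scores.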

19.

WHICH EXAMINEES ARE MOST FAVOURED BY THE USE OF MULTIPLE CHOICE TESTS?

GLENN L. ROWLEY 《Journal of Educational Measurement》1974,11(1):15-23

Scores were obtained from 198 ninth grade students on achievement motivation, test anxiety, testwiseness, and risktaking. Tests in mathematics and vocabulary were constructed in free response and multiple choice form, and administered to the subjects in that order, with an interval of 5 weeks between administrations. Partial correlations were computed between scores on the multiple choice tests and achievement motivation, test anxiety, testwiseness, and risktaking, with free response scores partialled out. The partial correlations were corrected for the unreliability in the free response scores, and tested for significance. All partials involving achievement motivation and test anxiety were nonsignificant, as were all partials based on mathematics scores. The partial correlations of vocabulary scores with testwiseness and risktaking were significant without exception. It was concluded that the use of multiple choice tests can favour certain examinees: those who are highly testwise and willing to take risks in the test situation. It was noted that the extent to which these examinees were favoured was dependent on the nature of the test, and that a verbal test seemed more susceptible than a numerical test.
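
The partial-correlation analysis in this abstract can be sketched in a few lines. The abstract does not give the correction formula, so this sketch assumes the standard approach of disattenuating only the correlations that involve the partialled-out variable (here, the free response score) before applying the first-order partial correlation formula. The function names and numeric values are hypothetical.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr_corrected(r_xy, r_xz, r_yz, rel_z):
    """Partial correlation after correcting for unreliability in z only.

    rel_z is the reliability of z; each correlation with z is divided by
    sqrt(rel_z) before z is partialled out (an assumed, standard correction).
    """
    return partial_corr(r_xy, r_xz / sqrt(rel_z), r_yz / sqrt(rel_z))


# Hypothetical values: x = multiple choice score, y = testwiseness,
# z = free response score with reliability 0.8.
uncorrected = partial_corr(0.5, 0.6, 0.7)
corrected = partial_corr_corrected(0.5, 0.6, 0.7, rel_z=0.8)
```

In this toy case the correction lowers the partial correlation, because a less reliable control variable removes too little shared variance when used uncorrected.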

20.

Validating the Conceptions of Assessment-III Scale in Canadian Preservice Teachers

Lia M. Daniels Cheryl Poth Chiara Papile Marnie Hutchison 《Educational Assessment》2013,18(2):139-158

Building on Snow's (1989) idea of 2 pathways to achievement outcomes, a performance and a commitment pathway, this study examined how cognitive and motivational factors associated with each of these pathways, respectively, contributed to the prediction of achievement outcomes in science. The sample consisted of 491 10th- and 11th-grade high school students. Results of hierarchical regression analyses showed that (a) students' cognitive abilities were the strongest predictors of their performance in science as measured by standardized test scores; (b) motivational processes enhanced the predictive validity for science test scores and grades beyond the variance accounted for by ability; and (c) motivational processes were the strongest predictors of students' commitment to science in the form of situational engagement and anticipated choices of science-related college majors and careers. These results are consistent with Snow's (1989) conjecture that both performance and commitment pathway-related factors are necessary for understanding the full range of person-level inputs to achievement outcomes.