Similar literature
 20 similar documents found
1.
《教育实用测度》2013,26(3):309-333
A taxonomy of 31 multiple-choice item-writing guidelines was validated through a logical process that included two sources of evidence: the consensus achieved from reviewing what was found in 27 textbooks on educational testing and the results of 27 research studies and reviews published since 1990. This taxonomy is mainly intended for classroom assessment. Because textbooks have the potential to educate teachers and future teachers, textbook writers are encouraged to consider these findings in future editions of their textbooks. This taxonomy may also be useful for developing test items for large-scale assessments. Finally, research on multiple-choice item writing is discussed from both substantive and methodological viewpoints.

2.
《教育实用测度》2013,26(1):51-78
A taxonomy of 43 multiple-choice item-writing rules was developed on the basis of an analysis of 46 references in the educational measurement literature. In this comparative review, the results of 96 theoretical and empirical studies were analyzed to determine if support existed for each rule. In some instances, rules were revised, and for nearly one half of the rules, no research was found.

3.
Many efforts have been made to determine and explain differential gender performance on large-scale mathematics assessments. A well-agreed-on conclusion is that gender differences are contextualized and vary across math domains. This study investigated the pattern of gender differences by item domain (e.g., Space and Shape, Quantity) and item type (e.g., multiple-choice items, open constructed-response items). The U.S. portion of the Programme for International Student Assessment (PISA) 2000 and 2003 mathematics assessment was analyzed. A multidimensional Rasch model was used to provide student ability estimates for each comparison. Results revealed a slight but consistent male advantage. Students showed the largest gender difference (d = 0.19) in favor of males on complex multiple-choice items, an unconventional item type. Males and females also showed sizable differences on Space and Shape items, a domain well documented for showing robust male superiority. Contrary to many previous findings reporting male superiority on multiple-choice items, no measurable difference was identified on multiple-choice items in either the PISA 2000 or the 2003 math assessment. Possible reasons for the differential gender performance across math domains and item types are suggested, and directions for future research are discussed.
Note: In this paper, two kinds of multiple-choice items are discussed: traditional multiple-choice items and complex multiple-choice items. A sample complex multiple-choice item is shown in Table 6. The terms "multiple-choice" and "traditional multiple-choice" are used interchangeably to refer to the traditional multiple-choice items throughout the paper, while the term "complex multiple-choice" is used to refer to the complex multiple-choice items. Raman K. Grover is now an Independent Psychometrician.

4.
《教育实用测度》2013,26(3):167-180
In the figural response item format, proficiency is expressed by manipulating elements of a picture or diagram. Figural response items in architecture were contrasted with multiple-choice counterparts in their ability to predict architectural problem-solving proficiency. Problem-solving proficiency was measured by performance on two architecture design problems, one of which involved a drawing component, whereas the other required only a written verbal response. Both figural response and multiple-choice scores predicted verbal design problem solving, but only the figural response scores predicted graphical problem solving. The presumed mechanism for this finding is that figural response items more closely resemble actual architectural tasks than do multiple-choice items. Some evidence for this explanation is furnished by architects' self-reports, in which architects rated figural response items as "more like what an architect does" than multiple-choice items.

5.
6.
This paper describes an item response model for multiple-choice items and illustrates its application in item analysis. The model provides parametric and graphical summaries of the performance of each alternative associated with a multiple-choice item; the summaries describe each alternative's relationship to the proficiency being measured. The interpretation of the parameters of the multiple-choice model and the use of the model in item analysis are illustrated using data obtained from a pilot test of mathematics achievement items. The use of such item analysis for the detection of flawed items, for item design and development, and for test construction is discussed.

7.
《教育实用测度》2013,26(3):233-241
Tests of educational achievement typically present items in the multiple-choice format. Some achievement test items may be so "saturated with aptitude" (Willingham, 1980) as to be insensitive to skills acquired through education. Multiple-choice tests are ill-suited for assessing productive thinking and problem-solving skills, skills that often constitute important objectives of education. Viewed as incentives for learning, multiple-choice tests may impede student progress toward these objectives. There is a need for accelerated research to develop alternatives to multiple-choice achievement tests, with content selected to match the specified educational objectives.

8.
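To bring test results close or equal to the true score, and in light of both the new requirements of language testing development (personalization, authenticity, and process orientation) and the shortcomings of current multiple-choice tests, a new type of multiple-choice test should feature dynamic item design, reasonable score calculation, and computer-assisted testing.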

9.
Both multiple-choice and constructed-response items have known advantages and disadvantages in measuring scientific inquiry. In this article we explore the function of explanation multiple-choice (EMC) items and examine how EMC items differ from traditional multiple-choice and constructed-response items in measuring scientific reasoning. A group of 794 middle school students was randomly assigned to answer either constructed-response or EMC items following regular multiple-choice items. By applying a Rasch partial-credit analysis, we found that there is a consistent alignment between the EMC and multiple-choice items. Also, the EMC items are easier than the constructed-response items but are harder than most of the multiple-choice items. We discuss the potential value of the EMC items as a learning and diagnostic tool.

10.
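Research on instructional objectives is a core topic in basic education. The taxonomy of cognitive objectives established by Bloom and colleagues in the mid-twentieth century suffers from a behaviorist tendency and from problems with its own hierarchical structure. Taking mathematics as an example, this report applies factor analysis to level-based measurement results for same-grade students across the whole Qingpu district of Shanghai, extracts the underlying principal factors, exposes the shortcomings of the original taxonomy, and constructs a four-level classification framework. On that basis it compares, from multiple angles, data on students' cognitive levels in the district at two points 17 years apart, identifies current conditions such as stagnation at the "analysis level", and provides evidence for deepening teaching reform.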

11.
王皓璇  王之光 《培训与研究》2007,24(5):133-134,F0003
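The "Twenty-Four Solar Terms" (二十四节气) are an important part of ancient Chinese scientific culture, and many traditional festivals are tied to them. Drawing on authoritative dictionaries, this paper discusses the translation strategies and pragmatic effects of rendering "节气" (solar term) and "清明(节)" (Qingming [Festival]) in cross-cultural communication.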

12.
This study involved the development and application of a two-tier diagnostic test measuring students' understanding of flowering plant growth and development. The instrument development procedure had three general steps: defining the content boundaries of the test, collecting information on students' misconceptions, and instrument development. Misconception data were collected from interviews and multiple-choice questions with open response answers. The data were used to develop 13 two-tier multiple-choice items. The conceptual knowledge examined was flowering plant life cycles, reproduction, precondition of germination, plant nutrition, and mechanism for growth and development. The diagnostic instrument was administered to 477 high school students. The test-retest correlation coefficient was 0.75. Difficulty indices ranged from 0.24 to 0.82, and discrimination indices ranged from 0.32 to 0.65. Results of the Flowering Plant Growth and Development Diagnostic Test suggested that students did not acquire a satisfactory understanding of plant growth and development concepts. Analysis of the items identified nineteen misconceptions that could inform biology instruction and resources.
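For readers unfamiliar with the difficulty and discrimination indices reported above, the following is a minimal sketch of one common classical definition: difficulty as the proportion of students answering an item correctly, and discrimination as the difference in that proportion between the highest- and lowest-scoring groups. The abstract does not say how its indices were computed, so the function name `item_indices`, the 27% group fraction, and the toy data are all assumptions for illustration.

```python
from typing import List, Tuple


def item_indices(
    responses: List[List[int]], group_fraction: float = 0.27
) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    responses[s][j] is 1 if student s answered item j correctly, else 0.
    group_fraction is the share of students placed in the upper and lower
    groups (0.27 is a conventional choice, assumed here).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for j in range(n_items):
        # Difficulty index: proportion of all students who got item j right.
        difficulty = sum(r[j] for r in responses) / n_students
        # Discrimination index: upper-group minus lower-group proportion.
        p_upper = sum(r[j] for r in upper) / k
        p_lower = sum(r[j] for r in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


# Hypothetical scored responses: 4 students, 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_indices(scores, group_fraction=0.25))
```

Under this definition a higher difficulty index means an easier item, so the reported range of 0.24 to 0.82 would span fairly hard to fairly easy items.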

13.
Problem-solving strategy is frequently cited as mediating the effects of response format (multiple-choice, constructed response) on item difficulty, yet there are few direct investigations of examinee solution procedures. Fifty-five high school students solved parallel constructed response and multiple-choice items that differed only in the presence of response options. Student performance was videotaped to assess solution strategies. Strategies were categorized as "traditional" (those associated with constructed response problem solving, e.g., writing and solving algebraic equations) or "nontraditional" (those associated with multiple-choice problem solving, e.g., estimating a potential solution). Surprisingly, participants sometimes adopted nontraditional strategies to solve constructed response items. Furthermore, differences in difficulty between response formats did not correspond to differences in strategy choice: some items showed a format effect on strategy but no effect on difficulty; other items showed the reverse. We interpret these results in light of the relative comprehension challenges posed by the two groups of items.

14.
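Bloom's taxonomy pioneered research on the taxonomy of educational objectives and shaped education in the twentieth century. In contemporary education systems, "teaching by classification" has become a consensus among instructional designers. The 2001 revision of Bloom's taxonomy followed the basic approach of the 1956 edition while rethinking each of its elements and emphasizing meaningful learning; it is scientifically sound and practical, and marks a major development of Bloom's classification. Marzano's taxonomy, which emerged in the same period as the 2001 revision, takes a more distinctive perspective. Starting from a model of human behavior and using the degree of conscious control as its criterion, it brings the four systems of human learning activity (self, metacognition, cognition, and knowledge) into a single unified system, building a taxonomy of educational objectives that is clearly layered yet integrated. The new taxonomy seeks to break through the "framework" limitation of Bloom's classification and is devoted to building a "theory". Marzano's taxonomy reflects the mainstream view of knowledge in the information society and a learner-centered philosophy of education. It pursues a unified psychological foundation, raises the status and role of metacognition in the learning process, and places the "self" at the highest level of the taxonomy. This is pioneering work: it breaks through the limits of the "element model" of earlier taxonomies of educational objectives, enriches taxonomy research, and goes beyond the original "Bloom framework".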

15.
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item properties such as alignment, discrimination, and target range on the knowledge integration scale using a Rasch Partial Credit Model analysis. For instructional validity, we test the sensitivity of multiple-choice and explanation items to knowledge integration instruction using a cohort comparison design. Results show that (1) one third of correct multiple-choice responses are aligned with higher levels of knowledge integration while three quarters of incorrect multiple-choice responses are aligned with lower levels of knowledge integration, (2) explanation items discriminate between high and low knowledge integration ability students much more effectively than multiple-choice items, (3) explanation items measure a wider range of knowledge integration levels than multiple-choice items, and (4) explanation items are more sensitive to knowledge integration instruction than multiple-choice items.

16.
We consider the relationship between the multiple-choice and free-response sections on the Computer Science and Chemistry tests of the College Board's Advanced Placement program. Restricted factor analysis shows that the free-response sections measure the same underlying proficiency as the multiple-choice sections for the most part. However, there is also a significant, if relatively small, amount of local dependence among the free-response items that produces a small degree of multidimensionality for each test.

17.
External written examinations are commonly used for determining student academic achievement. The influence of question type and cognitive process on examination performance in senior-secondary physical education is unclear. A secondary data analysis of Victorian Certificate of Education (VCE) Physical Education examination data (2011: n = 9,323; 2012: n = 8,781) was conducted. Question type (multiple choice and short answer) and overall examination performance were compared and the predictive value of question type, cognitive process (based on Bloom’s revised taxonomy), and overall examination scores determined. In 2011 and 2012, students performed significantly better on multiple-choice questions; however, short-answer performance better predicted overall exam performance. A significant difference between marks achieved by cognitive level and grade (Ungraded [UG] – A+) was found. Low-achieving students (UG – D) performed well below the examination mean across all questions. Developing higher order thinking skills for all students may lead to improved overall examination performance in VCE Physical Education.

18.
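Using the Gitest III software, the author carried out a fairly detailed item analysis of all objective items in the January 2009 listening proficiency test for English majors at Hunan City University, examined the reliability and validity of the test items, and, based on the results, raised issues to watch for and offered suggestions on designing objective items for listening tests.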

19.
We determined the recommendations for preparing and scoring constructed-response (CR) test items in 25 sources (textbooks and chapters) on educational and psychological measurement. The project was similar to Haladyna's (2004) analysis for multiple-choice items (Haladyna, T. M. 2004. Developing and validating multiple-choice test items, 3rd ed. Mahwah, NJ: Erlbaum). We identified 12 recommendations for preparing CR items given by multiple sources, with 4 of these given by at least half of the sources; and 13 recommendations for scoring CR items given by multiple sources, with 5 given by at least half of the sources. Many recommendations received minority support or were unique to individual sources. Research is needed both on the effect of the recommendations on measurement properties and on the extent to which the recommendations are adopted in practice.

20.
The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices—such as explanation, argumentation, and communication—in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students’ written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than the computer-scored explanation measures (r = 0.63). 
Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures, (2) are capable of capturing students’ normative scientific and naive ideas as accurately as human-scored explanations, and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
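As background on the "single logit scale" mentioned in this abstract, the standard dichotomous Rasch model can be written as follows. This is the textbook form, given as a sketch only: the abstract does not state which Rasch variant the study fitted, and the symbols $\theta_p$ (person measure) and $b_i$ (item measure) are notation assumed here.

```latex
% Probability that person p answers item i correctly
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}

% Equivalent log-odds (logit) form: person and item measures share one scale
\ln \frac{P(X_{pi} = 1)}{1 - P(X_{pi} = 1)} = \theta_p - b_i
```

Because the log-odds depend only on the difference $\theta_p - b_i$, person and item measures are expressed in the same logit units, which is what allows them to be compared on one scale.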
