Similar Documents
20 similar documents found (search time: 31 ms)
1.
In this study, a multiple-choice test entitled the Science Process Assessment was developed to measure the science process skills of students in grade four. Based on the Recommended Science Competency Continuum for Grades K to 6 for Pennsylvania Schools, this instrument measured the skills of (1) observing, (2) classifying, (3) inferring, (4) predicting, (5) measuring, (6) communicating, (7) using space/time relations, (8) defining operationally, (9) formulating hypotheses, (10) experimenting, (11) recognizing variables, (12) interpreting data, and (13) formulating models. To prepare the instrument, classroom teachers and science educators were invited to participate in two science education workshops designed to develop an item bank of test questions applicable to measuring process skill learning. Participants formed “writing teams” and generated 65 test items representing the 13 process skills. After a comprehensive group critique of each item, 61 items were identified for inclusion into the Science Process Assessment item bank. To establish content validity, the item bank was submitted to a select panel of science educators for the purpose of judging item acceptability. This analysis yielded 55 acceptable test items and produced the Science Process Assessment, Pilot 1. Pilot 1 was administered to 184 fourth-grade students. Students were given a copy of the test booklet; teachers read each test aloud to the students. Upon completion of this first administration, data from the item analysis yielded a reliability coefficient of 0.73. Subsequently, 40 test items were identified for the Science Process Assessment, Pilot 2. Using the test-retest method, the Science Process Assessment, Pilot 2 (Test 1 and Test 2) was administered to 113 fourth-grade students. Reliability coefficients of 0.80 and 0.82, respectively, were ascertained. The correlation between Test 1 and Test 2 was 0.77. 
The results of this study indicate that (1) the Science Process Assessment, Pilot 2, is a valid and reliable instrument applicable to measuring the science process skills of students in grade four, (2) using educational workshops as a means of developing item banks of test questions is viable and productive in the test development process, and (3) involving classroom teachers and science educators in the test development process is educationally efficient and effective.
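The test-retest reliability coefficients and the Test 1/Test 2 correlation reported here are Pearson correlations between two score vectors. A minimal sketch of that computation; the scores below are illustrative, not the study's data:

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical Test 1 / Test 2 totals for five examinees (not study data)
test1 = [28, 31, 25, 35, 30]
test2 = [27, 33, 24, 36, 29]
r = pearson_r(test1, test2)  # a value near 1 indicates stable rank ordering
```

A coefficient around 0.77, as in the study, indicates that examinees kept roughly the same rank order across the two administrations.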

2.
This study involved the development and application of a two-tier diagnostic test measuring students' understanding of flowering plant growth and development. The instrument development procedure had three general steps: defining the content boundaries of the test, collecting information on students' misconceptions, and instrument development. Misconception data were collected from interviews and multiple-choice questions with open-response answers. The data were used to develop 13 two-tier multiple-choice items. The conceptual knowledge examined covered flowering plant life cycles, reproduction, preconditions of germination, plant nutrition, and mechanisms of growth and development. The diagnostic instrument was administered to 477 high school students. The test-retest correlation coefficient was 0.75. Difficulty indices ranged from 0.24 to 0.82, and discrimination indices ranged from 0.32 to 0.65. Results of the Flowering Plant Growth and Development Diagnostic Test suggested that students did not acquire a satisfactory understanding of plant growth and development concepts. Nineteen misconceptions that could inform biology instruction and resources were identified through analysis of the items.
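Difficulty and discrimination indices of the kind reported above come from classical test theory. A hedged sketch: the upper-lower group method shown here is one standard approach, not necessarily the exact procedure used in the study:

```python
def difficulty(item_responses):
    """Difficulty index p: proportion of examinees answering the item correctly."""
    return sum(item_responses) / len(item_responses)

def discrimination(item_responses, total_scores, frac=0.27):
    """Upper-lower discrimination index: p in the top-scoring group minus
    p in the bottom-scoring group (27% groups by common convention)."""
    n = max(1, round(frac * len(item_responses)))
    order = sorted(range(len(item_responses)), key=lambda i: total_scores[i])
    low = sum(item_responses[i] for i in order[:n]) / n
    high = sum(item_responses[i] for i in order[-n:]) / n
    return high - low
```

An item with difficulty near 0.5 and discrimination above roughly 0.3, like most items in this instrument, separates stronger from weaker examinees well.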

3.
Open-ended counterparts to a set of items from the quantitative section of the Graduate Record Examination (GRE-Q) were developed. Examinees responded to these items by gridding a numerical answer on a machine-readable answer sheet or by typing on a computer. The test section with the special answer sheets was administered at the end of a regular GRE administration. Test forms were spiraled so that random groups received either the grid-in questions or the same questions in a multiple-choice format. In a separate data collection effort, 364 paid volunteers who had recently taken the GRE used a computer keyboard to enter answers to the same set of questions. Despite substantial format differences noted for individual items, total scores for the multiple-choice and open-ended tests demonstrated remarkably similar correlational patterns. There were no significant interactions of test format with either gender or ethnicity.

4.
This article presents a study of ethnic Differential Item Functioning (DIF) for 4th-, 7th-, and 10th-grade reading items on a state criterion-referenced achievement test. The tests, administered from 1997 to 2001, were composed of multiple-choice and constructed-response items. Item performance by focal groups (i.e., students of Asian/Pacific Islander, Black/African American, Native American, and Latino/Hispanic origins) was compared with the performance of White students using simultaneous item bias and Rasch procedures. Flagged multiple-choice items generally favored White students, whereas flagged constructed-response items generally favored students of Asian/Pacific Islander, Black/African American, and Latino/Hispanic origins. Content analysis of flagged reading items showed that positively and negatively flagged items typically measured inference, interpretation, or analysis of text in multiple-choice and constructed-response formats. Items that were not flagged for DIF generally measured very easy reading skills (e.g., literal comprehension) and reading skills that require higher level thinking (e.g., developing interpretations across texts and analyzing graphic elements).
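The study used simultaneous item bias and Rasch procedures for DIF detection. Another widely used DIF statistic, shown here purely as an illustration of the idea of comparing matched groups, is the Mantel-Haenszel common odds ratio computed across total-score strata:

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio for one item.
    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong)
    for examinees matched on total score; a value near 1 suggests no DIF,
    above 1 suggests the item favors the reference group."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den
```

Matching on total score before comparing groups is what separates DIF from a simple difference in pass rates, which could just reflect a difference in overall ability.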

5.
This paper describes the development and validation of an item bank designed for students to assess their own achievements across an undergraduate-degree programme in seven generic competences (i.e., problem-solving skills, critical-thinking skills, creative-thinking skills, ethical decision-making skills, effective communication skills, social interaction skills and global perspective). The Rasch modelling approach was adopted for instrument development and validation. A total of 425 items were developed. The content validity of these items was examined via six focus group interviews with target students, and the construct validity was verified against data collected from a large student sample (N = 1151). A matrix design was adopted to assemble the items in 26 test forms, which were distributed at random in each administration session. The results demonstrated that the item bank had high reliability and good construct validity. Cross-sectional comparisons of Years 1–4 students revealed patterns of changes over the years. Correlation analyses shed light on the relationships between the constructs. Implications are drawn to inform future efforts to develop the instrument, and suggestions are made regarding ways to use the instrument to enhance the teaching and learning of generic skills.

6.
This research examined component processes that contribute to performance on one of the new, standards-based reading tests that have become a staple in many states. Participants were 60 Grade 4 students randomly sampled from 7 classrooms in a rural school district. The particular test we studied employed a mixture of traditional (multiple-choice) and performance assessment approaches (constructed-response items that required written responses). Our findings indicated that multiple-choice and constructed-response items enlisted different cognitive skills. Writing ability emerged as an important source of individual differences in explaining overall reading ability, but its influence was limited to performance on constructed-response items. After controlling for word identification and listening, writing ability accounted for no variance in multiple-choice reading scores. By contrast, writing ability accounted for unique variance in reading ability, even after controlling for word identification and listening skill, and explained more variance in constructed-response reading scores than did either word identification or listening skill. In addition, performance on the multiple-choice reading measure along with writing ability accounted for nearly all of the reliable variance in performance on the constructed-response reading measure.

7.
This study focused on the development of a two-tier multiple-choice diagnostic instrument, which was designed and then progressively modified, and implemented to assess students' understanding of solution chemistry concepts. The results of the study are derived from the responses of 756 Grade 11 students (age 16–17) from 14 different high schools who participated in the study. The final version of the instrument included a total of 13 items that addressed the six aspects of solution chemistry, and students' understandings in the test were challenged in multiple contexts with multiple modes and levels of representation. Cronbach alpha reliability coefficients for the content tier and both tiers of the test were found to be 0.697 and 0.748, respectively. Results indicated that a substantial number of students held an inadequate understanding of solution chemistry concepts. In addition, 21 alternative conceptions observed in more than 10% of the students were reported, along with discussion on possible sources of such conceptions.
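The reliability coefficients quoted for the content tier and for both tiers are Cronbach's alpha values. A minimal sketch of the alpha computation from a persons-by-items score matrix; the matrices in the assertions are illustrative, not the study's data:

```python
def cronbach_alpha(matrix):
    """Cronbach's alpha from a list of rows (one row of item scores per person).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    n_items = len(matrix[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[j] for row in matrix]) for j in range(n_items)]
    total_var = variance([sum(row) for row in matrix])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

Values around 0.7, as reported here, are conventionally taken as adequate internal consistency for a short diagnostic instrument.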

8.
The European Framework for Higher Education has led universities to adapt their teaching schemes. Degrees must train students in competences that include both specific and cross-curricular skills. Nevertheless, there are important limitations to tracking skill improvement across consecutive academic years. The final-year dissertation (FYD) offers the opportunity to assess these aspects, which are closely linked to professional requirements. The experience reported here offers an alternative methodology for the FYD intended to reinforce cross-curricular skills and replace classic final evaluation schemes. A new protocol for the FYD was defined and tested in the Degree in Human Nutrition and Dietetics, with the participation of students and lecturers from different disciplines. The new methodology included collaborative activities that required students' active involvement and participation. Cross-curricular skills not considered before, namely analysis and critical attitude as well as team working, were included and evaluated continuously. The data obtained revealed an improvement in cross-curricular skills. Student-student cooperation contributed significantly to enhancing FYD quality. The new methodology was valued favourably by students. The main keys to the successful implementation of this protocol were the following: encouragement of teachers and students, coordination, information and communication technologies, and clear guidelines.

9.
《教育实用测度》2013,26(3):233-241
Tests of educational achievement typically present items in the multiple-choice format. Some achievement test items may be so "saturated with aptitude" (Willingham, 1980) as to be insensitive to skills acquired through education. Multiple-choice tests are ill-suited for assessing productive thinking and problem-solving skills, skills that often constitute important objectives of education. Viewed as incentives for learning, multiple-choice tests may impede student progress toward these objectives. There is need for accelerated research to develop alternatives to multiple-choice achievement tests, with content selected to match the specified educational objectives.

10.
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item properties such as alignment, discrimination, and target range on the knowledge integration scale using a Rasch Partial Credit Model analysis. For instructional validity, we test the sensitivity of multiple-choice and explanation items to knowledge integration instruction using a cohort comparison design. Results show that (1) one third of correct multiple-choice responses are aligned with higher levels of knowledge integration while three quarters of incorrect multiple-choice responses are aligned with lower levels of knowledge integration, (2) explanation items discriminate between high and low knowledge integration ability students much more effectively than multiple-choice items, (3) explanation items measure a wider range of knowledge integration levels than multiple-choice items, and (4) explanation items are more sensitive to knowledge integration instruction than multiple-choice items.
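The Rasch Partial Credit Model used in this analysis assigns each response category (e.g., each knowledge integration level) a probability determined by the person's ability and the item's step difficulties. A hedged sketch of the standard category probability function, not code from the study:

```python
import math

def pcm_category_probs(theta, steps):
    """Partial Credit Model: probabilities of score categories 0..m for a
    person of ability `theta` on an item with step difficulties `steps`.
    Numerator exponents are cumulative sums of (theta - step_k)."""
    cum = [0.0]
    for step in steps:
        cum.append(cum[-1] + (theta - step))
    exps = [math.exp(c) for c in cum]
    total = sum(exps)
    return [e / total for e in exps]
```

When ability equals a lone step difficulty, the two adjacent categories are equally likely, which is the usual interpretation of a Rasch step threshold.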

11.
Both multiple-choice and constructed-response items have known advantages and disadvantages in measuring scientific inquiry. In this article we explore the function of explanation multiple-choice (EMC) items and examine how EMC items differ from traditional multiple-choice and constructed-response items in measuring scientific reasoning. A group of 794 middle school students was randomly assigned to answer either constructed-response or EMC items following regular multiple-choice items. By applying a Rasch partial-credit analysis, we found that there is a consistent alignment between the EMC and multiple-choice items. Also, the EMC items are easier than the constructed-response items but are harder than most of the multiple-choice items. We discuss the potential value of the EMC items as a learning and diagnostic tool.

12.
In this paper we report the results of an experiment designed to test the hypothesis that when faced with a question involving the inverse direction of a reversible mathematical process, students solve a multiple-choice version by verifying the answers presented to them by the direct method, not by undertaking the actual inverse calculation. Participants responded to an online test containing equivalent multiple-choice and constructed-response items in two reversible algebraic techniques: factor/expand and solve/verify. The findings supported this hypothesis: Overall scores were higher in the multiple-choice condition compared to the constructed-response condition, but this advantage was significantly greater for items concerning the inverse direction of reversible processes compared to those involving direct processes.

13.
Teachers often recommend that their students generate test questions and answers as a means of preparing for an exam, yet there is a paucity of research on the effects of this instructional strategy. Two recent studies showed positive effects of generating test questions relative to restudy, but these studies did not control for time on task. Moreover, the scarce research available has been limited to the effects of generating open-ended questions. Therefore, the aim of this study was to investigate whether generating multiple-choice test questions would foster retention (as measured by a multiple-choice test) relative to restudy when time on task was held constant across conditions. Using a 2 × 2 design, university students (N = 143) studied a text with the intention of either generating test items or performing well on a test, and then either generated multiple-choice items or restudied the text. Retention was measured by means of a multiple-choice test, both immediately after learning and after a one-week delay. Results showed no effects of study intention. Generating multiple-choice items resulted in lower test performance than restudying the text for the same amount of time.

14.
The psychometric literature provides little empirical evaluation of examinee test data to assess essential psychometric properties of innovative items. In this study, examinee responses to conventional (e.g., multiple-choice) and innovative item formats in a computer-based testing program were analyzed for IRT information with the three-parameter and graded response models. The innovative item types considered in this study provided more information across all levels of ability than multiple-choice items. In addition, accurate timing data captured via computer administration were analyzed to consider the relative efficiency of the multiple-choice and innovative item types. As with previous research, multiple-choice items provide more information per unit time. Implications for balancing policy, psychometric, and pragmatic factors in selecting item formats are also discussed.
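Item information under the three-parameter logistic (3PL) model mentioned above has a closed form. A hedged sketch using the standard 3PL formulas, not code from the study:

```python
import math

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta.
    a: discrimination, b: difficulty, c: pseudo-guessing."""
    p = c + (1 - c) / (1 + math.exp(-a * (theta - b)))
    q = 1 - p
    return a ** 2 * (q / p) * ((p - c) / (1 - c)) ** 2
```

Dividing an item's information by its average response time yields the information-per-unit-time comparison the study draws between multiple-choice and innovative formats.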

15.
This study assesses some effects of the Computer-Assisted Self-Evaluation (CASE) system of frequent multiple-choice testing with immediate computer feedback; it is part of a larger project aiming to combine the principal strengths of individualized instruction with lecture teaching (Fisher, 1979). Learning and retention are examined in two equivalent groups of undergraduates enrolled in an upper division science course. One group (N = 34) received 24 CASE quizzes with immediate feedback and the other (N = 30) received two CASE-generated midterms with delayed feedback. Quiz students significantly outperformed Midterm students on the posttest; the Quiz section scored nine percentage points higher on rote items and fourteen points higher on meaningful items. Quiz students also had more positive attitudes toward and were more involved in the course. On a retention test given two years later, the Quiz Group scored eight percentage points higher than the Midterm Group on meaningful items. This study suggests that, contrary to popular opinion, multiple-choice questions promote meaningful learning at least as well as, and possibly better than, rote learning. The CASE system appears to be about as effective as other forms of frequent testing and immediate feedback in enhancing learning, and it provides a simple, cost-effective means of individualized testing in large lecture classes.

16.
This paper describes findings from a study to explore Singapore A-level (Grades 11 and 12, 16–19 years old) students' understanding of ionisation energy, an abstract and complex topic that is featured in school chemistry courses. Previous research had reported that students in the United Kingdom commonly use alternative notions based on the perceived stability of full shells and the ‘sharing out’ of nuclear force, but that such ideas tend to be applied inconsistently. This paper describes results from the administration of a two-tier multiple-choice instrument, the ionisation energy diagnostic instrument, to find (1) whether A-level students in Singapore have similar ways of thinking about the factors influencing ionisation energy as reported for their A-level counterparts in the UK; and (2) how Singapore A-level students explain the trend of ionisation energy across different elements in Period 3. The results indicate that students in Singapore use the same alternative ideas as those in the UK, and also a related alternative notion. The study also demonstrates considerable inconsistency in the way students responded to related items. The potential significance of the findings to student understanding of complex topics across the sciences is considered.

17.
In contrast to multiple-choice test questions, figural response items call for constructed responses and rely upon figural material, such as illustrations and graphs, as the response medium. Figural response questions in various science domains were created and administered to a sample of 4th-, 8th-, and 12th-grade students. Item and test statistics from parallel sets of figural response and multiple-choice questions were compared. Figural response items were generally more difficult, especially for questions that were difficult (p < .5) in their constructed-response forms. Figural response questions were also slightly more discriminating and reliable than their multiple-choice counterparts, but they had higher omit rates. This article addresses the relevance of guessing to figural response items and the diagnostic value of the item type. Plans for future research on figural response items are discussed.

18.
Psychometric models based on structural equation modeling framework are commonly used in many multiple-choice test settings to assess measurement invariance of test items across examinee subpopulations. The premise of the current article is that they may also be useful in the context of performance assessment tests to test measurement invariance of raters. The modeling approach and how it can be used for performance tests with less than optimal rater designs are illustrated using a data set from a performance test designed to measure medical students’ patient management skills. The results suggest that group-specific rater statistics can help spot differences in rater performance that might be due to rater bias, identify specific weaknesses and strengths of individual raters, and enhance decisions related to future task development, rater training, and test scoring processes.

19.
This study was conducted with 330 Form 4 (grade 10) students (aged 15–16 years) who were involved in a course of instruction on electrolysis concepts. The main purposes of this study were (1) to assess high school chemistry students' understanding of 19 major principles of electrolysis using a recently developed 2-tier multiple-choice diagnostic instrument, the Electrolysis Diagnostic Instrument (EDI), and (2) to assess students' confidence levels in displaying their knowledge and understanding of these electrolysis concepts. Analysis of students' responses to the EDI showed that they displayed very limited understanding of the electrolytic processes involving molten compounds and aqueous solutions of compounds, with a mean score of 6.82 (out of a possible maximum of 17). Students were found to possess content knowledge about several electrolysis processes but did not provide suitable explanations for the changes that had occurred, with less than 45% of students displaying scientifically acceptable understandings about electrolysis. In addition, students displayed limited confidence about making the correct selections for the items; yet, in 16 of the 17 items, the percentage of students who were confident that they had selected the correct answer to an item was higher than the actual percentage of students who correctly answered the corresponding item. The findings suggest several implications for classroom instruction on the electrolysis topic that need to be addressed in order to facilitate better understanding by students of electrolysis concepts.
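The confidence finding (16 of 17 items had more students confident than correct) amounts to a per-item comparison of two percentages. A trivial sketch with hypothetical numbers, not the EDI data:

```python
def overconfident_items(confident_pct, correct_pct):
    """Count items where the percentage of students confident in their
    answer exceeds the percentage who actually answered correctly."""
    return sum(1 for cf, cr in zip(confident_pct, correct_pct) if cf > cr)

# Hypothetical per-item percentages (not from the study)
confident = [70, 55, 80, 40]
correct = [50, 60, 65, 30]
n_over = overconfident_items(confident, correct)  # 3 of the 4 items
```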

20.
Research on predictors of achievement in science is often targeted at more traditional content-based assessments and single student characteristics. At the same time, the development of skills in the field of scientific inquiry constitutes a focal point of interest for science education. Against this background, the purpose of this study was to investigate to what extent multiple student characteristics contribute to skills of scientific inquiry. Based on a theoretical framework describing nine epistemological acts, we constructed and administered a multiple-choice test that assesses these skills at lower and upper secondary school level (n = 780). The test items contained problem-solving situations that occur during chemical investigations in school and had to be solved by choosing an appropriate inquiry procedure. We collected further data on 12 cognitive, motivational, and sociodemographic variables such as conceptual knowledge, enjoyment of chemistry, or language spoken at home. Plausible values were drawn to quantify students’ inquiry skills. The results show that students’ characteristics predict their inquiry skills to a large extent (55%), with 9 of the 12 variables contributing significantly at the multivariate level. The influence of sociodemographic traits such as gender or social background becomes non-significant after controlling for cognitive and motivational variables. Furthermore, the performance advantage of students from the upper secondary school level can be explained by controlling for cognitive covariates. We discuss our findings with regard to curricular aspects and raise the question of whether inquiry skills can be considered an autonomous trait in science education research.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号