Similar literature
1.
Interpreting and creating graphs plays a critical role in scientific practice. The K-12 Next Generation Science Standards call for students to use graphs for scientific modeling, reasoning, and communication. To measure progress on this dimension, we need valid and reliable measures of graph understanding in science. In this research, we designed items to measure graph comprehension, critique, and construction and developed scoring rubrics based on the knowledge integration (KI) framework. We administered the items to over 460 middle school students. We found that the items formed a coherent scale and had good reliability using both item response theory and classical test theory. The KI scoring rubric showed that most students had difficulty linking graph features to science concepts, especially when asked to critique or construct graphs. In addition, students with limited access to computers, as well as those who speak a language other than English at home, had less integrated understanding than others. These findings point to the need to increase the integration of graphing into science instruction. The results suggest directions for further research leading to comprehensive assessments of graph understanding.

2.
In response to the demand for sound science assessments, this article presents the development of a latent construct called knowledge integration as an effective measure of science inquiry. Knowledge integration assessments ask students to link, distinguish, evaluate, and organize their ideas about complex scientific topics. The article focuses on assessment topics commonly taught in 6th- through 12th-grade classes. Items from both published standardized tests and previous knowledge integration research were examined in 6 subject-area tests. Results from Rasch partial credit analyses revealed that the tests exhibited satisfactory psychometric properties with respect to internal consistency, item fit, weighted likelihood estimates, discrimination, and differential item functioning. Compared with items coded using dichotomous scoring rubrics, those coded with the knowledge integration rubrics yielded significantly higher discrimination indexes. The knowledge integration assessment tasks, analyzed using knowledge integration scoring rubrics, demonstrate strong promise as effective measures of complex science reasoning in varied science domains.

3.
This study established a Chinese scale for measuring high school students’ ocean literacy. The work included testing the scale’s reliability, validity, and differential item functioning (DIF), with the aim of compensating for the lack of DIF testing in existing scales. Construct validity and reliability were tested by analyzing the scale’s items using the Rasch model, and a gender DIF test was conducted to ensure that results are fair when distinct groups are compared. The results indicated that the scale established in this study is unidimensional and possesses favorable internal consistency and construct validity. The gender DIF test results indicated that several items were difficult for either female or male students to answer correctly; however, the experts and scholars discussed these items individually and suggested retaining them. The final Chinese version of the ocean literacy scale developed here comprises 48 items that can reflect high school students’ understanding of ocean literacy, which helps students understand the topics of marine science encountered in real life.

4.
Although researchers call for inquiry learning in science, science assessments rarely capture the impact of inquiry instruction. This paper reports on the development and validation of assessments designed to measure middle-school students’ progress in gaining integrated understanding of energy while studying an inquiry-oriented curriculum. The assessment development was guided by the knowledge integration framework. Over 2 years of implementation, more than 4,000 students from 4 schools participated in the study, including a cross-sectional and a longitudinal cohort. Results from item response modeling analyses revealed that: (a) the assessments demonstrated satisfactory psychometric properties in terms of reliability and validity; (b) both the cross-sectional and longitudinal cohorts made progress on integrating their understanding of energy concepts; and (c) among many factors (e.g. gender, grade, school, and home language) associated with students’ science performance, unit implementation was the strongest predictor.

5.
6.
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item properties such as alignment, discrimination, and target range on the knowledge integration scale using a Rasch Partial Credit Model analysis. For instructional validity, we test the sensitivity of multiple-choice and explanation items to knowledge integration instruction using a cohort comparison design. Results show that (1) one third of correct multiple-choice responses are aligned with higher levels of knowledge integration while three quarters of incorrect multiple-choice responses are aligned with lower levels of knowledge integration, (2) explanation items discriminate between high and low knowledge integration ability students much more effectively than multiple-choice items, (3) explanation items measure a wider range of knowledge integration levels than multiple-choice items, and (4) explanation items are more sensitive to knowledge integration instruction than multiple-choice items.

7.
The nature of anatomy education has changed substantially in recent decades, though the traditional multiple‐choice written examination remains the cornerstone of assessing students' knowledge. This study sought to measure the quality of a clinical anatomy multiple‐choice final examination using item response theory (IRT) models. One hundred seventy‐six students took a multiple‐choice clinical anatomy examination. One‐ and two‐parameter IRT models (difficulty and discrimination parameters) were used to assess item quality. The two‐parameter IRT model demonstrated a wide range in item difficulty, with a median of −1.0 and range from −2.0 to 0.0 (25th to 75th percentile). Similar results were seen for discrimination (median 0.6; range 0.4–0.8). The test information curve achieved maximum discrimination for an ability level one standard deviation below the average. There were 15 items with standardized loading less than 0.3, which was due to several factors: two items had two correct responses, one was not well constructed, two were too easy, and the others revealed a lack of detailed knowledge by students. The test used in this study was more effective in discriminating students of lower ability than those of higher ability. Overall, the quality of the examination in clinical anatomy was confirmed by the IRT models. Anat Sci Educ 3:17–24, 2010. © 2009 American Association of Anatomists.
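
As a rough illustration of the two-parameter model mentioned in this abstract, the sketch below computes the probability of a correct response and the item information for one item. The function names are hypothetical, and the parameter values (discrimination a = 0.6, difficulty b = -1.0) are assumptions chosen to resemble the medians the abstract reports; this is not the study's own analysis.

```python
import math

def p_correct(theta, a, b):
    """2PL model: probability of a correct response at ability theta,
    for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Assumed, illustrative parameters (not taken from the study's data).
a, b = 0.6, -1.0

# Scan ability levels from -4.0 to 4.0 in steps of 0.1 and find
# where this single item is most informative.
grid = [x / 10.0 for x in range(-40, 41)]
peak = max(grid, key=lambda t: item_information(t, a, b))
print(peak)
```

For a single 2PL item the information peaks where ability equals the item difficulty, which is one way to read the abstract's remark that the test discriminated best about one standard deviation below the average.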

8.
To improve student science achievement in the United States we need inquiry-based instruction that promotes coherent understanding and assessments that are aligned with the instruction. Instead, current textbooks often offer fragmented ideas and most assessments only tap recall of details. In this study we implemented 10 inquiry-based science units that promote knowledge integration and developed assessments that measure student knowledge integration abilities. To measure student learning outcomes, we designed a science assessment consisting of both proximal items that are related to the units and distal items that are published from standardized tests (e.g., Trends in International Mathematics and Science Study). We compared the psychometric properties and instructional sensitivity of the proximal and distal items. To unveil the context of learning, we examined how student, class, and teacher characteristics affect student inquiry science learning. Several teacher-level characteristics including professional development showed a positive impact on science performance.

9.
Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content‐specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. The purpose of this module is to describe and illustrate a template‐based method for generating test items. We outline a three‐step approach where test development specialists first create an item model. An item model is like a mould or rendering that highlights the features in an assessment task that must be manipulated to produce new items. Next, the content used for item generation is identified and structured. Finally, features in the item model are systematically manipulated with computer‐based algorithms to generate new items. Using this template‐based approach, hundreds or even thousands of new items can be generated with a single item model.
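
The three steps described in this abstract can be sketched in a few lines. Everything here is hypothetical: the stem, the content values, and the names `STEM`, `CONTENT`, `speed_key`, and `generate_items` are invented for illustration and are not the module's actual item models or software.

```python
from itertools import product

# Step 1 (assumed example): an item model, i.e. a stem with placeholders
# marking the features that may be manipulated.
STEM = ("A car travels {distance} km in {hours} hours. "
        "What is its average speed in km/h?")

# Step 2 (assumed example): the structured content allowed for each feature.
CONTENT = {"distance": [60, 120, 180], "hours": [2, 3]}

def speed_key(binding):
    """Answer key for this particular item model."""
    return binding["distance"] / binding["hours"]

def generate_items(stem, content, key_fn):
    """Step 3: systematically combine feature values to produce items."""
    names = list(content)
    items = []
    for values in product(*(content[name] for name in names)):
        binding = dict(zip(names, values))
        items.append({"stem": stem.format(**binding), "key": key_fn(binding)})
    return items

items = generate_items(STEM, CONTENT, speed_key)
for item in items:
    print(item["stem"], "->", item["key"])
```

With three distances and two durations this single model yields six items; adding features or values multiplies the count, which is the sense in which one item model can produce hundreds of items.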

10.
Although multiple choice examinations are often used to test anatomical knowledge, these often forgo the use of images in favor of text‐based questions and answers. Because anatomy is reliant on visual resources, examinations using images should be used when appropriate. This study was a retrospective analysis of examination items that were text based compared to the same questions when a reference image was included with the question stem. Item difficulty and discrimination were analyzed for 15 multiple choice items given across two different examinations in two sections of an undergraduate anatomy course. Results showed that there were some differences in item difficulty, but these did not consistently favor either text items or items with reference images. Differences in difficulty were mainly attributable to one group of students performing better overall on the examinations. There were no significant differences in item discrimination for any of the analyzed items. This implies that reference images do not significantly alter the item statistics; however, it does not indicate whether these images were helpful to the students when answering the questions. Care should be taken by question writers to analyze item statistics when making changes to multiple choice questions, including ones that are included for the perceived benefit of the students. Anat Sci Educ 10: 68–78. © 2016 American Association of Anatomists.
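
For readers unfamiliar with the two item statistics named in this abstract, here is a minimal sketch under common classical test theory conventions: difficulty as the proportion correct, and discrimination as the correlation between the item score and the rest of the test. The abstract does not say which indices the authors used, so both definitions and the made-up response matrix are assumptions.

```python
def item_difficulty(responses, item):
    """Proportion of examinees answering the item correctly (higher = easier)."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(responses, item):
    """Pearson correlation between the item score and the rest score
    (total score minus the item itself)."""
    x = [row[item] for row in responses]
    y = [sum(row) - row[item] for row in responses]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either score has no variance
    return sxy / (sxx * syy) ** 0.5

# Made-up scored responses: one row per student, one column per item (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print(item_difficulty(responses, 0))
print(item_discrimination(responses, 0))
```

Comparing these two numbers for the same question with and without a reference image is the kind of check the abstract recommends before changing an item.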

11.
School climate surveys are widely applied in school districts across the nation to collect information about teacher efficacy, principal leadership, school safety, students' activities, and so forth. They enable school administrators to understand and address many issues on campus when used in conjunction with other student and staff data. However, these days each district develops the questionnaire according to its own needs and rarely provides supporting evidence for the reliability of items in the scale, that is, whether an individual item contributes significant information to the questionnaire. Item response theory (IRT) is a useful tool that helps examine how much information each item and the whole scale can provide. Our study applied IRT to examine individual items in a school climate survey and assessed the efficiency of the survey after the removal of items that contributed little to the scale. The purpose of this study is to show how IRT can be applied to empirically validate school climate surveys.

12.
Testing organizations need large numbers of high‐quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time‐consuming and expensive because each item is written, edited, and reviewed by a subject‐matter expert. One promising approach that may address this challenge is automatic item generation. Automatic item generation combines cognitive and psychometric modeling practices to guide the production of items that are generated with the aid of computer technology. The purpose of this study is to describe and illustrate a process that can be used to review and evaluate the quality of the generated items by focusing on the content and logic specified within the item generation procedure. We illustrate our process using an item development example from mathematics drawn from the Common Core State Standards and from surgical education drawn from the health sciences domain.

13.
The Biology Workbench (BW) is a web‐based tool enabling scientists to search a wide array of protein and nucleic acid sequence databases with integrated access to a variety of analysis and modeling tools. The present study examined the development of this scientific tool and its consequent adoption into the context of high school science teaching in the form of the Biology Student Workbench (BSW). Participants included scientists, programmers, science educators, and science teachers who played key roles along the pathway of the design and development of BW, and/or the adaptation and implementation of BSW in high school science classrooms. Participants also included four teachers who, with their students, continue to use BSW. Data sources included interviews, classroom observations, and relevant artifacts. Contrary to what often is advocated as a major benefit accruing from the integration of authentic scientific tools into precollege science teaching, classroom enactments of BSW lacked elements of inquiry and were teacher‐centered with prescribed convergent activities. Students mostly were preoccupied with following instructions and a focus on science content. The desired and actual realizations of BSW fell on two extremes that reflected the disparity between scientists' and educators' views on science, inquiry science teaching, and the related roles of technological tools. Research on large‐scale adoptions of technological tools into precollege science classrooms needs to expand beyond its current focus on teacher knowledge, skills, beliefs, and practices to examine the role of the scientists, researchers, and teacher educators who often are involved in such adoptions. © 2010 Wiley Periodicals, Inc. J Res Sci Teach 48: 37–70, 2011

14.
The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs.

15.
The present study investigates the degree to which item "bias" techniques can lead to interpretable results when groups are defined in terms of specified differences in the cognitive processes involved in students' problem-solving strategies. A large group of junior high school students who took a test on subtraction of fractions was divided into two subgroups judged by the rule-space model to be using different problem-solving strategies. It was confirmed by use of Mantel-Haenszel (MH) statistics that these subgroups showed different performances on items with different underlying cognitive tasks. We note that, in our case, we are far from faulting items that show differential item functioning (DIF) between two groups defined in terms of different solution strategies. Indeed, they are "desirable" items, as explained in the discussion section.
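
The Mantel-Haenszel statistic named in this abstract can be sketched as a common odds ratio pooled over score strata. The function name, the layout of each stratum, and the counts below are assumptions made for illustration; the abstract does not describe how the authors matched students or which MH variant they computed.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata.

    Each stratum is a tuple (a, b, c, d) for one ability level:
      a = group 1 correct,  b = group 1 incorrect,
      c = group 2 correct,  d = group 2 incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Made-up counts for one item, stratified by (assumed) total-score bands.
strata = [
    (15, 5, 10, 10),   # low scorers
    (18, 2, 14, 6),    # high scorers
]

alpha = mantel_haenszel_odds_ratio(strata)
# A widely used rescaling of the odds ratio onto the ETS delta metric.
delta = -2.35 * math.log(alpha)
print(alpha, delta)
```

An odds ratio near 1 means the two groups perform alike on the item once matched on ability; values far from 1 flag the item, which in this study's framing marks an item sensitive to the difference in solution strategy rather than a faulty one.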

16.
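This study used a construct map to design a test of 30 items in six parts. Item types included spelling, multiple-choice, and short-answer items. A total of 175 children aged 6 to 14 took the test. Rasch analysis found that local item dependence within testlets was not serious. Reliability was 0.85. Item difficulty and examinee ability were quite well matched. Because the items were written according to the construct map, the test has a degree of content validity. However, the difficulty of 9 items differed slightly from what was originally expected. Five items did not fit the expectations of the Rasch model well, and no notable differential item functioning by gender was found. Examinee ability was positively correlated with the length of time spent learning English. Finally, technical issues of remote computerized adaptive testing based on information and communication technology are discussed.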

17.
The main purpose of this study was to examine the structural relationships between scientific epistemological views (SEVs) and information commitments (ICs) of high school students in Taiwan. Data were collected from 486 Taiwanese high school students via two self‐reporting instruments: one was the SEV questionnaire, including five scales for representing students’ views toward scientific knowledge; and the other was the ICs survey, involving six scales for exploring their evaluative standards and searching strategies of online science information. Structural equation modelling analysis was used to examine the relationships between the aspects of SEVs and ICs. The results of the measurement model confirmed that both the SEVs and ICs instruments had highly satisfactory validity and reliability. The structural equation modelling analysis further indicated that students’ SEVs guided their evaluative standards and searching strategy when dealing with Web‐based science information. For example, students who viewed scientific knowledge as more changeable and tentative significantly tended to adopt a more sophisticated evaluative standard, such as carefully inspecting the content of web sites for judging the usefulness. The findings in general suggested that students with more constructivist‐oriented SEVs might develop more advanced standards and searching strategy toward online scientific information to derive great benefit from Web‐based environments. Consequently, the role of SEVs should be highlighted as increasingly metacognitive engagement with online science information.

18.
Croatian 1st‐year and 3rd‐year high‐school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear measures: (a) an item‐difficulty measure based upon all responses, (b) an item‐confidence measure based upon correct student answers, and (c) an item‐confidence measure based upon incorrect student answers. Comparisons were made with regard to item difficulty and item confidence. The results suggest that Newtonian dynamics is a topic with stronger student alternative conceptions than DC circuits, a topic characterized by much lower student confidence on both correct and incorrect answers. A systematic and significant difference between mean student confidence on Newtonian dynamics and DC circuits items was found in both student groups. Findings suggest some steps for physics instruction in Croatia as well as areas of further research for those in science education interested in additional techniques of exploring alternative conceptions. © 2005 Wiley Periodicals, Inc. J Res Sci Teach 43: 150–171, 2006

19.
The purpose of this study was to investigate primary students’ learning through participation in an out‐of‐school enrichment programme, held in a science centre, which focused on DNA and genes and whether participation in the programme led to an increased understanding of inheritance as well as promoted interest in the topic. The sample consisted of two groups (245 students in the experimental group and 150 students in the control group) of upper primary students (Grade 5) from six schools in Singapore. Two instruments were developed—a 15‐item multiple‐choice test to measure learning gains and a 17‐item survey form to measure student feedback. Pre‐, post‐, and delayed post‐tests were administered. Results showed statistically significant gains in learning for the experimental group that appeared to be stable as well as high levels of interest stimulated by the programme.

20.
The use of content validity as the primary assurance of the measurement accuracy for science assessment examinations is questioned. An alternative accuracy measure, item validity, is proposed. Item validity is based on research using qualitative comparisons between (a) student answers to objective items on the examination, (b) clinical interviews with examinees designed to ascertain their knowledge and understanding of the objective examination items, and (c) student answers to essay examination items prepared as an equivalent to the objective examination items. Calculations of item validity are used to show that selected objective items from the science assessment examination overestimated the actual student understanding of science content. Overestimation occurs when a student correctly answers an examination item, but for a reason other than that needed for an understanding of the content in question. There was little evidence that students incorrectly answered the items studied for the wrong reason, resulting in underestimation of the students' knowledge. The equivalent essay items were found to limit the amount of mismeasurement of the students' knowledge. Specific examples are cited and general suggestions are made on how to improve the measurement accuracy of objective examinations.
