Similar Articles
A total of 20 similar articles were retrieved.
1.
ABSTRACT

The emerging paradigm of responsible research and innovation (RRI) in the European Commission policy discourse identifies science education as a key agenda for better equipping students with skills and knowledge to tackle complex societal challenges and foster active citizenship in democratic societies. The operationalisation of this broad approach in science education demands, however, the identification of assessment frameworks able to grasp the complexity of RRI process requirements and learning outcomes within science education practice. This article aims to shed light on the application of the RRI approach in science education by proposing an RRI-based analytical framework for science education assessment. We use this framework to review a sample of empirical studies of science education assessments and critically analyse it through the lens of RRI criteria. As a result, we identify a set of 86 key RRI assessment indicators in science education related to RRI values, transversal competences and experiential and cognitive aspects of learning. We argue that looking at science education through the lens of RRI can potentially contribute to the integration of metacognitive skills, emotional aspects and procedural dimensions within impact assessments so as to address the complexity of learning.

2.
This article reports on the collaboration of six states to study how simulation-based science assessments can become transformative components of multi-level, balanced state science assessment systems. The project studied the psychometric quality, feasibility, and utility of simulation-based science assessments designed to serve formative purposes during a unit and to provide summative evidence of end-of-unit proficiencies. The frameworks of evidence-centered assessment design and model-based learning shaped the specifications for the assessments. The simulations provided the three most common forms of accommodations in state testing programs: audio recording of text, screen magnification, and support for extended time. The SimScientists program at WestEd developed simulation-based, curriculum-embedded, and unit benchmark assessments for two middle school topics, Ecosystems and Force & Motion. These were field-tested in three states. Data included student characteristics, responses to the assessments, cognitive labs, classroom observations, and teacher surveys and interviews. UCLA CRESST conducted an evaluation of the implementation. Feasibility and utility were examined in classroom observations, teacher surveys and interviews, and by the six-state Design Panel. Technical quality data included AAAS reviews of the items' alignment with standards and quality of the science, cognitive labs, and assessment data. Student data were analyzed using multidimensional Item Response Theory (IRT) methods. IRT analyses demonstrated the high psychometric quality (reliability and validity) of the assessments and their discrimination between content knowledge and inquiry practices. Students performed better on the interactive, simulation-based assessments than on the static, conventional items in the posttest. Importantly, gaps between performance of the general population and English language learners and students with disabilities were considerably smaller on the simulation-based assessments than on the posttests. The Design Panel participated in development of two models for integrating science simulations into a balanced state science assessment system. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 363–393, 2012

3.
The purpose of this article is to address a major gap in the instructional sensitivity literature on how to develop instructionally sensitive assessments. We propose an approach to developing and evaluating instructionally sensitive assessments in science and test this approach with one elementary life-science module. The assessment we developed was administered to 125 students in seven classrooms. The development approach considered three dimensions of instructional sensitivity; that is, assessment items should: represent the curriculum content, reflect the quality of instruction, and have formative value for teaching. Focusing solely on the first dimension, representation of the curriculum content, this study was guided by the following research questions: (1) What science module characteristics can be systematically manipulated to develop items that prove to be instructionally sensitive? and (2) Are the instructionally sensitive assessments developed sufficiently valid to make inferences about the impact of instruction on students' performance? In this article, we describe our item development approach and provide empirical evidence to support validity arguments about the developed instructionally sensitive items. Results indicated that: (1) manipulations of the items at different proximities to vary their sensitivity were aligned with the rules for item development and also corresponded with pre-to-post gains; and (2) the items developed at different distances from the science module showed a pattern of pre-to-post gain consistent with their instructional sensitivity, that is, the closer the items were to the science module, the larger the observed gains and effect sizes. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 691–712, 2012
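As a rough illustration of the proximity-based gain analysis summarized above, the sketch below computes mean pre-to-post gains and paired-samples effect sizes for three hypothetical item sets written at different distances from a module. The scores are simulated placeholders, not the study's data, and the grouping labels are assumptions made only for the illustration.

```python
# Illustrative sketch (not the study's analysis): pre-to-post gains and effect
# sizes for item groups written at different proximities to the science module.
# The score arrays are hypothetical placeholders for 125 students.
import numpy as np

rng = np.random.default_rng(0)

def gain_and_effect_size(pre, post):
    """Mean gain and paired-samples Cohen's d for pre/post scores."""
    diff = post - pre
    return diff.mean(), diff.mean() / diff.std(ddof=1)

n = 125
groups = {
    "close":    (rng.normal(0.45, 0.15, n), rng.normal(0.75, 0.15, n)),
    "proximal": (rng.normal(0.45, 0.15, n), rng.normal(0.62, 0.15, n)),
    "distal":   (rng.normal(0.45, 0.15, n), rng.normal(0.50, 0.15, n)),
}

for name, (pre, post) in groups.items():
    gain, d = gain_and_effect_size(pre, post)
    print(f"{name:9s} mean gain = {gain:+.2f}, Cohen's d = {d:.2f}")
```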

4.
In this article we describe the challenges associated with assessing complex competencies envisioned as the targets of learning in the 21st century. Such competencies typically reflect the integration of multiple dimensions of knowledge and skill. Technology plays a crucial role in their assessment, from conceptualisation through design, data gathering and interpretation of results. We use the case of science proficiency to illustrate challenges associated with the assessment of the intended products of multidimensional learning and the benefits provided by technology. We frame assessment development as an evidence-centered design process and illustrate it by using cases drawn from middle school science. We then turn to ways in which assessment systems need to evolve to expand the scope of what can be done in the creation and use of valid, reliable and equitable assessments of complex, multidimensional learning. We conclude by discussing policy implications of technology-based assessment systems with an emphasis on measuring what matters versus measuring what is easy, since what we choose to assess will become the focus of instruction. Major advances in assessment policy and practice require investment in the development, validation and deployment of technology-based assessments that reflect the multidimensional competencies identified by contemporary research and theory.

5.
ABSTRACT

In this study, we reviewed 76 journal articles on employing drawing assessment as a research tool in science education. Findings from the systematic review suggest four justifications for using drawing as a type of research tool, including assessment via drawing as (a) an alternative method considering young participants’ verbal or writing abilities, and affective or economic reasons, (b) a unique method that can reveal aspects not easily measured by other methods, (c) a major method that reflects characteristics of science subjects, and (d) a formative assessment to diagnose students’ ideas to benefit their learning. Furthermore, five research trends of studies using drawing as assessment tools are identified, including: (a) students’ conceptions of scientists from the Draw-a-Scientist-Test (DAST) and evolving studies, (b) students’ understanding or mental models of science concepts, (c) participants’ conceptions of science learning or teaching, (d) students’ inquiry abilities and modelling skills via drawing, and (e) technology to support drawing. For each trend, we synthesised and commented on the current findings. A framework conceptualising phases and issues when designing research and instruments employing drawing assessments is proposed. The review provides insights into the design and future direction of research employing drawing assessments in science education.

6.
Science education needs valid, authentic, and efficient assessments. Many typical science assessments primarily measure recall of isolated information. This paper reports on the validation of assessments that measure knowledge integration ability among middle school and high school students. The assessments were administered to 18,729 students in five states. Rasch analyses of the assessments demonstrated satisfactory item fit, item difficulty, test reliability, and person reliability. The study showed that, when appropriately designed, knowledge integration assessments can be balanced between validity and reliability, authenticity and generalizability, and instructional sensitivity and technical quality. Results also showed that, when paired with multiple-choice items and scored with an effective scoring rubric, constructed-response items can achieve high reliabilities. Analyses showed that English language learner status and computer use significantly impacted students' science knowledge integration abilities. Students who took the assessment online, which matched the format of content delivery, performed significantly better than students who took the paper-and-pencil version. Implications and future directions of research are noted, including refining curriculum materials to meet the needs of diverse students and expanding the range of topics measured by knowledge integration assessments. © 2011 Wiley Periodicals, Inc. J Res Sci Teach 48: 1079–1107, 2011
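The Rasch analyses mentioned above can be illustrated with a minimal sketch. The code below fits a simple Rasch (1PL) model to a simulated 0/1 response matrix by maximizing the joint log-likelihood with scipy. Operational analyses such as the one reported in this study would typically use dedicated IRT software with marginal or conditional estimation and item-fit statistics, so this is a toy approximation under stated assumptions, not the study's procedure.

```python
# Minimal Rasch (1PL) sketch, not the study's procedure: item difficulties and
# person abilities estimated by joint maximum likelihood on a simulated 0/1
# response matrix. A small ridge term keeps the flat scale direction identified.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
n_persons, n_items = 200, 15

# Simulate responses from known parameters so recovery can be sanity-checked.
true_theta = rng.normal(0, 1, n_persons)
true_b = np.linspace(-2, 2, n_items)
prob = expit(true_theta[:, None] - true_b[None, :])
X = (rng.random((n_persons, n_items)) < prob).astype(float)

def neg_log_lik(params):
    theta, b = params[:n_persons], params[n_persons:]
    p = expit(theta[:, None] - b[None, :])
    eps = 1e-9
    nll = -np.sum(X * np.log(p + eps) + (1 - X) * np.log(1 - p + eps))
    return nll + 1e-3 * np.sum(params ** 2)   # ridge penalty for identifiability

res = minimize(neg_log_lik, np.zeros(n_persons + n_items), method="L-BFGS-B")
b_hat = res.x[n_persons:]
b_hat = b_hat - b_hat.mean()                  # centre difficulties for reporting

print("correlation of estimated vs. true item difficulties:",
      round(float(np.corrcoef(b_hat, true_b)[0, 1]), 3))
```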

7.
This study develops a framework to conceptualize the use and evolution of machine learning (ML) in science assessment. We systematically reviewed 47 studies that applied ML in science assessment and classified them into five categories: (a) constructed response, (b) essay, (c) simulation, (d) educational game, and (e) inter-discipline. We compared the ML-based and conventional science assessments and extracted 12 critical characteristics to map three variables in a three-dimensional framework: construct, functionality, and automaticity. The 12 characteristics used to construct a profile for ML-based science assessments for each article were further analyzed by a two-step cluster analysis. The clusters identified for each variable were summarized into four levels to illustrate the evolution of each. We further conducted cluster analysis to identify four classes of assessment across the three variables. Based on the analysis, we conclude that ML has transformed—but not yet redefined—conventional science assessment practice in terms of fundamental purpose, the nature of the science assessment, and the relevant assessment challenges. Along with the three-dimensional framework, we propose five anticipated trends for incorporating ML in science assessment practice for future studies: addressing developmental cognition, changing the process of educational decision making, personalized science learning, borrowing 'good' to advance 'good', and integrating knowledge from other disciplines into science assessment.
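A simplified stand-in for the two-step cluster analysis described above: the sketch clusters a hypothetical 47-article by 12-characteristic matrix, choosing the number of clusters by silhouette score before a final k-means fit. The feature values are random placeholders rather than the reviewed studies, and the silhouette/k-means combination is an assumption standing in for the SPSS-style two-step procedure.

```python
# Simplified clustering stand-in (not the article's exact procedure): cluster a
# hypothetical 47 x 12 matrix of characteristic codes, picking k by silhouette
# score and then running a final k-means fit.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Placeholder article profiles with some latent structure (three rough types).
centers = rng.integers(0, 4, size=(3, 12)).astype(float)
profiles = np.vstack([c + rng.normal(0, 0.5, size=(16, 12)) for c in centers])[:47]
Z = StandardScaler().fit_transform(profiles)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    scores[k] = silhouette_score(Z, labels)

best_k = max(scores, key=scores.get)
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(Z)
print("chosen k:", best_k)
print("cluster sizes:", np.bincount(final.labels_))
```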

8.
Scientific modeling has been advocated as one of the core practices in recent science education policy initiatives. In modeling-based instruction (MBI), students use, construct, and revise models to gain scientific knowledge and inquiry skills. Oftentimes, the benefits of MBI have been documented using assessments targeting students’ conceptual understanding or affective domains. Fewer studies have used assessments directly built on the ideas of modeling. The purpose of this study is to synthesize and examine modeling-oriented assessments (MOA) in the last three decades and propose new directions for research in this area. The study uses a collection of 30 empirical research articles that report MOA from an initial library of 153 articles focusing on MBI in K-12 science education from 1980 to 2013. The findings include the variety of themes within each of the three MOA dimensions (modeling products, modeling practices, and meta-modeling knowledge) and the areas of MOA still in need of much work. Based on the review, three guiding principles are proposed for future work in MOA: (a) framing MOA in an ecology of assessment, (b) providing authentic modeling contexts for assessment, and (c) spelling out the connections between MOA items and the essential aspects of modeling to be assessed.

9.
This article focuses on the design of competency-based performance assessment in e-learning. Though effort has been invested in designing powerful e-learning environments, relatively little attention has been paid to the design of valid and reliable assessments in such environments, leaving many questions to educational developers and teachers. As a solution to this problem, a systematic approach to designing performance assessments in e-learning contexts is presented, partly based on the 4C/ID model. This model enables the construction of realistic whole tasks instead of advocating education that is restricted to more isolated skills. A new assessment procedure also implies an alternative view of instructional design, learning and assessment. The requirements for the learning environment are addressed. Examples from a virtual seminar are presented to illustrate the design approach. The article concludes with the identification of possible pitfalls related to the approach and gives directions for future research.

10.
An Approach for Evaluating the Technical Quality of Interim Assessments
Increasing numbers of schools and districts have expressed interest in interim assessment systems to prepare for summative assessments and to improve teaching and learning. However, with so many commercial interim assessments available, schools and districts are struggling to determine which interim assessment is most appropriate to their needs. Unfortunately, there is little research-based guidance to help schools and districts to make the right choice about how to spend their money. Because we realize the urgency of developing criteria that can describe or evaluate the quality of interim assessments, this article presents the results of an initial attempt to create an instrument that school and district educators could use to evaluate the quality and usefulness of the interim assessment. The instrument is designed for use by state and district leaders to help them select an appropriate interim assessment system for their needs, but it could also be used by test vendors looking to evaluate and improve their own systems and by researchers engaged in studies of interim assessment use.

11.
Using a framework of assessment literacy that included principles, tools, and purposes, this study explored the assessment literacy of 11 secondary preservice teachers. Participants' journals, teaching philosophies, and inquiry-based science units served as data sources. We examined how the preservice teachers understood assessment tools as well as their reasons for using assessment. Additionally, we investigated how the preservice teachers incorporated assessments into inquiry-based science units. Analysis of these documents indicated that preservice teachers recognize the need to align assessments with learning goals and instructional strategies and are using a variety of assessments. They understood several ways to use assessment for learning. However, the inclusion of assessments contained within the science units did not fully align with the views of assessment the preservice teachers presented in their teaching philosophies or journals. Instead of using a variety of assessments that reflect science reforms, the preservice teachers reverted to traditional forms of assessment in their science units. Teacher education programs need to place more emphasis on developing preservice teachers' assessment literacy so that they are better able to select and implement a variety of appropriate assessments to foster student learning.

12.
Typical assessment systems often measure isolated ideas rather than the coherent understanding valued in current science classrooms. Such assessments may motivate students to memorize, rather than to use new ideas to solve complex problems. To meet the requirements of the Next Generation Science Standards, instruction needs to emphasize sustained investigations, and assessments need to create a detailed picture of students’ conceptual understanding and reasoning processes.

This article describes the design process and potential for automated scoring of 2 forms of inquiry assessment: Energy Stories and MySystem. To design these assessments, we formed a partnership of teachers, discipline experts, researchers, technologists, and psychometricians to align curriculum, assessments, and rubrics. We illustrate how these items document middle school students’ reasoning about energy flow in life science. We used evidence from review by science teachers and experts in the discipline; classroom experiments; and psychometric analysis to validate the assessments, rubrics, and automated scoring.
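The abstract does not specify the scoring engine used, so the sketch below shows only a generic automated-scoring baseline for constructed responses: a TF-IDF plus logistic-regression classifier trained on human-scored responses and checked against held-out human scores with quadratic weighted kappa. The response texts and rubric levels are hypothetical placeholders, not items from Energy Stories or MySystem.

```python
# Generic automated-scoring baseline (not the study's scoring engine): TF-IDF
# features, a logistic-regression classifier, and quadratic weighted kappa as
# the agreement index with human rubric scores. All data are placeholders.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score

responses = [
    "the sun gives energy to the plant and the plant gives energy to the rabbit",
    "energy goes away when the animal dies",
    "plants make food from sunlight and pass energy on to consumers",
    "the rabbit eats grass",
] * 25                                   # placeholder corpus of scored responses
human_scores = [3, 1, 4, 2] * 25         # placeholder rubric levels 1-4

X_train, X_test, y_train, y_test = train_test_split(
    responses, human_scores, test_size=0.25, random_state=0, stratify=human_scores)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

qwk = cohen_kappa_score(y_test, model.predict(X_test), weights="quadratic")
print("quadratic weighted kappa vs. human scores:", round(qwk, 2))
```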

13.
Recent developments in British higher education have included taking a close look at work-based learning, in particular its assessment (and its integration within academic programmes of study). However, two questions which are still continuously being asked are (a) to what extent are assessments of work-based learning valid and reliable, and (b) can they count towards the award of university degrees and diplomas? These questions are becoming increasingly important as there seems to be a growing trend for students to assess their own learning at the workplace (through reflection and analysis and the use of diaries and self-development journals). This article addresses the above issues by drawing on classical test theory (for an understanding of the fundamentals of validity and reliability) and by examining how the different notions of validity and reliability may be applied in the context of assessments (and self-assessments) in the workplace. The article concludes that, under certain stated conditions, it is indeed possible to determine whether assessments (and self-assessments) of work-based learning are valid, reliable — and comparable.
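One concrete piece of the classical-test-theory toolkit invoked above is an internal-consistency estimate. The sketch below computes Cronbach's alpha for a hypothetical matrix of workplace self-assessment ratings (rows are learners, columns are items); it illustrates the reliability concept only and is not a procedure taken from the article.

```python
# Classical test theory sketch: Cronbach's alpha as one internal-consistency
# index for a set of workplace self-assessment items. The rating matrix is a
# hypothetical placeholder (rows = learners, columns = items on a 1-5 scale).
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(3)
latent = rng.normal(0, 1, size=(60, 1))          # shared "true" proficiency
ratings = np.clip(np.round(3 + latent + rng.normal(0, 0.8, size=(60, 8))), 1, 5)

print("Cronbach's alpha:", round(cronbach_alpha(ratings), 2))
```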

14.
In recent years, at the same time that performance assessments in science have become more popular, the number of English language learners (ELLs) (i.e., students whose native language is other than English) served by the U.S. educational system has also increased rapidly. While the research base is growing in each of these areas independently, little attention has been paid to their intersection. This case study of the use of a science performance assessment with 96 ELLs in five high school science classes investigated the face, construct, and consequential validity of this intersection. Qualitative and quantitative data analyses showed that both teachers and students had an overall favorable response to the assessment, although students' English comprehension and expression skills were determining factors for certain items. While most responses were reliably scored, ELL spelling and syntax on certain responses were significant sources of error. The degree of specificity of teachers' guidance also significantly affected students' scores. Recommendations from this study include increasing the clarity of an assessment's design, allowing ELLs more time to complete assessments, and scoring by raters who are knowledgeable about typical patterns in written English for this student population. Furthermore, it is recommended that the use of performance assessments with ELLs be exploratory until such time as their validity and reliability with this population can be more adequately established. J Res Sci Teach 34: 721–743, 1997.
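One way (among several) to quantify rater-related score error of the kind discussed above is a persons-by-raters variance decomposition in the spirit of generalizability theory. The sketch below uses hypothetical rubric scores for 96 examinees rated by two raters in a fully crossed design; both the design and the data are assumptions for illustration, not the study's analysis.

```python
# Generalizability-style sketch (not the study's analysis): variance components
# and a G coefficient for a fully crossed persons x raters design with
# hypothetical rubric scores.
import numpy as np

rng = np.random.default_rng(4)
n_persons, n_raters = 96, 2
true_score = rng.normal(3, 0.8, size=(n_persons, 1))
rater_effect = np.array([[0.0, -0.3]])            # one rater scores slightly lower
X = true_score + rater_effect + rng.normal(0, 0.5, size=(n_persons, n_raters))

grand = X.mean()
ms_p = n_raters * ((X.mean(axis=1) - grand) ** 2).sum() / (n_persons - 1)
ms_r = n_persons * ((X.mean(axis=0) - grand) ** 2).sum() / (n_raters - 1)
ss_total = ((X - grand) ** 2).sum()
ss_resid = ss_total - ms_p * (n_persons - 1) - ms_r * (n_raters - 1)
ms_pr = ss_resid / ((n_persons - 1) * (n_raters - 1))

var_pr = ms_pr                                    # person x rater interaction + error
var_p = max((ms_p - ms_pr) / n_raters, 0.0)       # true person variance
var_r = max((ms_r - ms_pr) / n_persons, 0.0)      # rater severity variance

g_coef = var_p / (var_p + var_pr / n_raters)      # reliability of the 2-rater mean
print(f"variance components  person={var_p:.3f}  rater={var_r:.3f}  residual={var_pr:.3f}")
print(f"generalizability coefficient (2 raters): {g_coef:.2f}")
```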

15.
States currently are in the process of developing child and family outcome measurement systems for young children with disabilities to meet federal data reporting requirements for the Part C (Infants and Toddlers with Disabilities) and Part B Preschool Grants program supported through the Individuals with Disabilities Education Act. This article reviews issues related to the use of assessments in providing outcome data, discusses challenges raised in conducting valid assessments with young children for accountability purposes, and outlines decisions states must make related to assessment as they design and implement outcome measurement approaches. Considerations related to standardized or curriculum-based measures are discussed along with other choices related to the use of assessment for accountability.

16.
This article reports on the development and use of an analytical framework intended to map the language demands encountered by English learners as they engage in science performance assessments. Grounded in functional and interactional views of language and language use, the authors—two science education researchers and a language scholar—developed the framework via an inductive, iterative, and systematic review of written assessment materials associated with three fifth grade science performance tasks. The resulting Science Assessment Language Demands (SALD) framework is organized around three dimensions: participant structures, communicative modes, and written texts and genres that students are called upon to read and produce. The authors used textual analysis to conduct an expert review of the written documents associated with the three assessment tasks. The results indicate that the framework can be used to document a wide range of functional and interactional language demands involved in science performance assessments. The demands revealed by the SALD framework highlight both potential challenges facing English learners during science performance assessments as well as opportunities afforded by such assessments for demonstrating their knowledge and skills and further developing language proficiency. A major implication of the study is the potential use of the framework to evaluate the language demands and opportunities of science assessments used in classrooms with English learners. © 2010 Wiley Periodicals, Inc. J Res Sci Teach 47: 909–928, 2010

17.
Processes for moderating assessments are much debated in higher education. The myriad approaches to the task vary in their demands on staff time and expertise, and also in how valid, reliable and fair to students they appear. Medical education, with its diverse range of assessments and assessors across clinical and academic domains presents additional challenges to moderation. The current review focuses on medical education, considering double-marking and benchmarking as two broad classes of moderation procedure, and argues that it is the process more than the type of procedure which is crucial for successful moderation. The advantages and disadvantages of each class of procedure are discussed in the light of our medical school’s current practices, and with respect to the limited empirical evidence within medical education assessment. Consideration of implementation is central to ensuring valid and reliable moderation. The reliability of assessor judgements depends more on the consistency of assessment formats and the application of clear and agreed assessment criteria than on the moderation process itself. This article considers these factors in relation to their impact on the reliability of moderation, and aims to help assessors and students appreciate the diversity of these factors by facilitating their consideration in the assessment process.

18.
Given the increased use of performance assessments (PAs) in higher education to evaluate achievement of learning outcomes, it is important to address the barriers related to ensuring quality for this type of assessment. This article presents a design-based research (DBR) study that resulted in the development of a Validity Inquiry Process (VIP). The study’s aim was to support faculty in examining the validity and reliability of the interpretation and use of results from locally developed PAs. DBR was determined to be an appropriate method because it is used to study interventions such as an instructional innovation, type of assessment, technology integration, or administrative activity (Anderson & Shattuck, 2012). The VIP provides a collection of instruments and utilizes a reflective practice approach integrating concepts of quality criteria and development of a validity argument as outlined in the literature (M.T. Kane, 2013; Linn, Baker, & Dunbar, 1991; Messick, 1994).

19.
Increasing the use of learning outcome assessments to inform educational decisions is a major challenge in higher education. For this study we used a sense-making theoretical perspective to guide an analysis of the relationship of information characteristics and faculty assessment knowledge and beliefs with the use of general education assessment information at three research institutions with similar organizational contexts. Study findings indicate that the likelihood of using assessment information increases when assessment evidence is action oriented and viewed as of high quality and when faculty members are knowledgeable, have positive dispositions toward assessment, and have a perception of institutional support for engagement in assessment activities.

20.
Students with the most significant cognitive disabilities (SCD) are the 1% of the total student population who have a disability or multiple disabilities that significantly impact intellectual functioning and adaptive behaviors and who require individualized instruction and substantial supports. Historically, these students have received little instruction in science and the science assessments they have participated in have not included age-appropriate science content. Guided by a theory of action for a new assessment system, an eight-state consortium developed multidimensional alternate content standards and alternate assessments in science for students in three grade bands (3–5, 6–8, 9–12) that are linked to the Next Generation Science Standards (NGSS Lead States, 2013) and A Framework for K-12 Science Education (Framework; National Research Council, 2012). The great variability within the population of students with SCD necessitates variability in the assessment content, which creates inherent challenges in establishing technical quality. To address this issue, a primary feature of this assessment system is the use of hypothetical cognitive models to provide a structure for variability in assessed content. System features and subsequent validity studies were guided by a theory of action that explains how the proposed claims about score interpretation and use depend on specific assumptions about the assessment, as well as precursors to the assessment. This paper describes evidence for the main claim that test scores represent what students know and can do. We present validity evidence for the assumptions about the assessment and its precursors, related to this main claim. The assessment was administered to over 21,000 students in eight states in 2015–2016. We present selected evidence from system components, procedural evidence, and validity studies. We evaluate the validity argument and demonstrate how it supports the claim about score interpretation and use.
