Similar Documents
Found 20 similar documents (search time: 109 ms)
1.
Central to the standards-based assessment validation process is an examination of the alignment between state standards and test items. Several alignment analysis systems have emerged recently, but most rely on either traditional rating or matching techniques. Few, if any, analyses have been reported on the degree of consistency between the two methods or on the item and objective characteristics that influence judges' decisions. We randomly assigned judges to either rate item-objective links or match items to objectives while reviewing the 2004 Arizona high school mathematics standards and assessment. Across items we found moderate convergence between methods, and we detected apparent reasons for divergently scored items. We also found that judges relied on item and objective content and intellectual skill features to render decisions. Based on our evidence, we contend that a thorough alignment analysis would involve judges using both rating and matching, while focusing on both content and intellectual skill. The findings have important implications for states when examining the alignment between their standards and assessments.
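The rating-versus-matching convergence check described above can be sketched with Cohen's kappa on per-item alignment verdicts. All data below are fabricated for illustration, not taken from the Arizona study:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length categorical label sequences."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # chance agreement from the two judges' marginal label frequencies
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical per-item verdicts (1 = aligned) from the two judging methods
rating   = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
matching = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(round(cohens_kappa(rating, matching), 3))  # → 0.524
```

A kappa near 0.5, as here, is the sort of "moderate convergence" the abstract reports; values near 1 would indicate the two methods are interchangeable.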

2.
Assessment influences every level of the education system and is one of the most crucial catalysts for reform in science curriculum and instruction. Teachers, administrators, and others who choose, assemble, or develop assessments face the difficulty of judging whether tasks are truly aligned with national or state standards and whether they are effective in revealing what students actually know. Project 2061 of the American Association for the Advancement of Science has developed and field‐tested a procedure for analyzing curriculum materials, including their assessments, in terms of how well they are likely to contribute to the attainment of benchmarks and standards. With respect to assessment in curriculum materials, this procedure evaluates whether the assessment has the potential to reveal whether students have attained specific ideas in benchmarks and standards and whether information gained from students' responses can be used to inform subsequent instruction. Using this procedure, Project 2061 has produced a database of analytical reports on nine widely used middle school science curriculum materials. The analysis of assessments included in these materials shows that whereas currently available materials devote significant sections of their instruction to ideas included in national standards documents, students are typically not assessed on these ideas. The analysis results described in the report point to strengths and limitations of these widely used assessments and identify a range of good and poor assessment tasks that can shed light on important characteristics of good assessment. © 2002 Wiley Periodicals, Inc. J Res Sci Teach 39: 889–910, 2002

3.
4.
Teacher Work Sample Methodology has been described as an alternative set of procedures for assessing teacher effectiveness in producing student learning, one more authentic than traditional means of teacher certification. To investigate the degree to which the methodology aligns with state and national standards, 50 work samples produced by student teachers at Western Oregon University between fall 1991 and spring 1999 were analyzed to determine: (1) the effectiveness of Teacher Work Sample Methodology in moving state and national standards, for example, the NCTM standards, into the classroom; and (2) the extent to which Teacher Work Sample Methodology promotes alignment of standards, content, instruction, and assessments of instruction. The research found that a majority of the student teacher work samples demonstrated weak alignment or no alignment between stated instructional objectives and selected NCTM Curriculum and Evaluation Standards (Problem Solving, Communication, Reasoning, and Connections). However, in most of the work samples, more than half of the pre/post-assessment methods (performance, knowledge) were aligned with the instructional objectives.

5.
An important part of test development is ensuring alignment between test forms and content standards. One common way of measuring alignment is the Webb (1997, 2007) alignment procedure. This article investigates (a) how well item writers understand components of the definition of Depth of Knowledge (DOK) from the Webb alignment procedure and (b) how consistent their DOK ratings are with ratings provided by other committees of educators across grade levels, content areas, and alternate assessment levels in a Midwestern state alternate assessment system. Results indicate that many item writers understand key features of DOK. However, some item writers struggled to articulate what DOK means and had some misconceptions. Additional analyses suggested some lack of consistency between the item writer DOK ratings and the committee DOK ratings. Some notable differences were found across alternate assessment levels and content areas. Implications for future item writing training and alignment studies are provided.
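Consistency between item-writer and committee DOK ratings, since DOK levels are ordinal, is often summarized with a quadratic-weighted kappa, which penalizes large disagreements more heavily than off-by-one ones. The ratings below are hypothetical, not from the Midwestern study:

```python
def weighted_kappa(a, b, categories):
    """Quadratic-weighted kappa for two ordinal rating sequences
    (e.g., DOK levels 1-4)."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # observed joint distribution of (rater1, rater2) categories
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n
    pa = [sum(1 for x in a if x == c) / n for c in categories]
    pb = [sum(1 for y in b if y == c) / n for c in categories]
    # quadratic disagreement weights, 0 on the diagonal
    w = [[(i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    num = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    den = sum(w[i][j] * pa[i] * pb[j] for i in range(k) for j in range(k))
    return 1 - num / den

# Hypothetical DOK ratings on eight items
writers   = [2, 3, 1, 2, 4, 3, 2, 1]
committee = [2, 2, 1, 3, 4, 3, 2, 2]
print(round(weighted_kappa(writers, committee, [1, 2, 3, 4]), 3))  # → 0.778
```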

6.
This article describes an alignment study conducted to evaluate the alignment between Indiana's Kindergarten content standards and items on the Indiana Standards Tool for Alternate Reporting (ISTAR). Alignment is the extent to which standards and assessments are in agreement, working together to guide educators' efforts to support children's learning and development. The alignment process in this study represented a modification of Webb's nationally recognized method of alignment analysis to early childhood assessments and standards. The alignment panel (N = 13) in this study consisted of early childhood educators and educational leaders from all geographic regions of the state. Panel members were asked to rate the depth of knowledge (DOK) stage of each objective in the Kindergarten standards; rate the DOK stage for each item on the ISTAR rating scale; and identify the one or two objectives from the standards to which each ISTAR item corresponded. Analysis of the panel's responses suggested the ISTAR inconsistently conformed to Webb's DOK consistency and range-of-knowledge (ROK) correspondence criteria for alignment. A promising finding was the strong alignment of the ISTAR Level F1 and F2 scales to the Kindergarten standards. This result provided evidence of the developmental continuum of skills and knowledge that are assessed by the ISTAR items.

7.
This study measured and explored the relationships among elementary mathematics teachers’ skill in (a) determining what an item measures, (b) analyzing student work, (c) providing targeted feedback, and (d) determining next instructional steps. Twenty-three elementary mathematics teachers were randomly assigned to one of three conditions: analyzing items and student responses without rubrics, analyzing items and student responses with rubrics, or analyzing items and student responses with rubrics after watching a professional development program on providing feedback to students. Findings show there is a moderate to strong relationship between teachers’ abilities to analyze student responses to infer what a student knows and can do and their abilities to take action based on that information through either providing the student feedback or making appropriate instructional adaptations. Findings show it was relatively more difficult for teachers to provide feedback that was likely to move students forward in their learning than it was for them to analyze a student's response or to determine next instructional steps. No teacher skill differences associated with the different treatment conditions were found.

8.
The use of content validity as the primary assurance of the measurement accuracy for science assessment examinations is questioned. An alternative accuracy measure, item validity, is proposed. Item validity is based on research using qualitative comparisons between (a) student answers to objective items on the examination, (b) clinical interviews with examinees designed to ascertain their knowledge and understanding of the objective examination items, and (c) student answers to essay examination items prepared as an equivalent to the objective examination items. Calculations of item validity are used to show that selected objective items from the science assessment examination overestimated the actual student understanding of science content. Overestimation occurs when a student correctly answers an examination item, but for a reason other than that needed for an understanding of the content in question. There was little evidence that students incorrectly answered the items studied for the wrong reason, resulting in underestimation of the students' knowledge. The equivalent essay items were found to limit the amount of mismeasurement of the students' knowledge. Specific examples are cited and general suggestions are made on how to improve the measurement accuracy of objective examinations.

9.
This paper describes the multiple choice item development assignment (MCIDA) that was developed to support both content learning and higher-level learning. The MCIDA involves students in higher-level learning by requiring them to develop multiple choice items, write justifications for both correct and incorrect answer options, and determine the highest cognitive level that each item tests. The article discusses the benefits and limitations of the scheme and presents data on the largely positive student reactions to it. The development of the MCIDA also serves as an example of how traditional, summatively oriented assessment procedures can be developed into tools that directly support student learning.

10.
With the recent adoption of the Common Core standards in many states, there is a need for quality information about textbook alignment to standards. While there are many existing content analysis procedures, these generally have little, if any, validity or reliability evidence. One exception is the Surveys of Enacted Curriculum (SEC), which has been widely used to analyze the alignment among standards, assessments, and teachers’ instruction. However, the SEC can be time‐consuming and expensive when used for this purpose. This study extends the SEC to the analysis of entire mathematics textbooks and investigates whether the results of SEC alignment analyses are affected if the content analysis procedure is simplified. The results indicate that analyzing only every fifth item produces nearly identical alignment results with no effect on the reliability of content analyses.
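SEC-style alignment is commonly summarized with the Porter index: one minus half the summed absolute differences between two distributions of content over (topic, cognitive-demand) cells. The sketch below, on fabricated counts, shows why an every-fifth-item sample can reproduce the full-pool result: the index depends only on cell proportions, which systematic sampling tends to preserve:

```python
def porter_alignment(x, y):
    """Porter alignment index between two content distributions.
    x, y: dicts mapping (topic, demand_level) cells to raw counts."""
    sx, sy = sum(x.values()), sum(y.values())
    cells = set(x) | set(y)
    return 1 - sum(abs(x.get(c, 0) / sx - y.get(c, 0) / sy) for c in cells) / 2

# Hypothetical standards distribution over (topic, demand-level) cells
standards = {("number", 1): 4, ("number", 2): 3, ("algebra", 2): 5, ("geometry", 3): 3}

# Full textbook item pool vs. an every-fifth-item sample of the same pool
full_pool = {("number", 1): 30, ("number", 2): 20, ("algebra", 2): 35, ("geometry", 3): 15}
sampled   = {("number", 1): 6, ("number", 2): 4, ("algebra", 2): 7, ("geometry", 3): 3}

print(round(porter_alignment(standards, full_pool), 3))  # → 0.95
print(round(porter_alignment(standards, sampled), 3))    # → 0.95
```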

11.
The alignment of test items to content standards is critical to the validity of decisions made from standards‐based tests. Generally, alignment is determined based on judgments made by a panel of content experts with either ratings averaged or via a consensus reached through discussion. When the pool of items to be reviewed is large, or the content‐matter experts are broadly distributed geographically, panel methods present significant challenges. This article illustrates the use of an online methodology for gauging item alignment that does not require that raters convene in person, reduces the overall cost of the study, increases time flexibility, and offers an efficient means for reviewing large item banks. Latent trait methods are applied to the data to control for between‐rater severity, evaluate intrarater consistency, and provide item‐level diagnostic statistics. Use of this methodology is illustrated with a large pool (1,345) of interim‐formative mathematics test items. Implications for the field and limitations of this approach are discussed.
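The between-rater severity control and intrarater consistency checks can be illustrated crudely. The sketch below mean-centers fabricated raters and correlates each rater with the item means; it is only a stand-in for the latent trait model the article describes, not that model itself:

```python
def pearson(x, y):
    """Plain Pearson correlation, no external dependencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical alignment ratings (rows = raters, columns = items, 0-3 scale)
ratings = {
    "rater_a": [3, 2, 1, 3, 2, 0],
    "rater_b": [2, 1, 0, 2, 1, 0],  # systematically severe
    "rater_c": [3, 2, 1, 3, 2, 1],
}
n_items = 6
grand = sum(v for row in ratings.values() for v in row) / (3 * n_items)

# Severity: how far each rater's average sits above/below the grand mean.
# Subtracting it crudely mimics a latent-trait severity parameter.
severity = {r: sum(row) / n_items - grand for r, row in ratings.items()}
adjusted = {r: [v - severity[r] for v in row] for r, row in ratings.items()}

# Intrarater consistency: correlation of each rater with the item means
item_means = [sum(ratings[r][i] for r in ratings) / 3 for i in range(n_items)]
consistency = {r: pearson(row, item_means) for r, row in ratings.items()}
```

A rater whose `consistency` value is low relative to the panel would be flagged for the item-level diagnostics the article mentions.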

12.
This study describes the development of an instrument to investigate the extent to which student‐centered actions are occurring in science classrooms. The instrument was developed through the following five stages: (1) student action identification, (2) use of both national and international content experts to establish content validity, (3) refinement of the item pool based on reviewer comments, (4) pilot testing of the instrument, and (5) statistical reliability and item analysis leading to additional refinement and finalization of the instrument. In the field test, the instrument consisted of 26 items separated into four categories originally derived from the student‐centered instruction literature and used by the authors to sort student actions in previous research. The SACS was administered across 22 Grade 6–8 classrooms by 22 groups of observers, with a total of 67 SACS ratings completed. The finalized instrument was found to be internally consistent, with acceptable inter‐rater intraclass correlation reliability coefficients (significant at the p < 0.01 level). After the final stage of development, the SACS instrument consisted of 24 items separated into three categories, which aligned with the factor analysis clustering of the items. Additionally, concurrent validity of the SACS was established with the Reformed Teaching Observation Protocol. Based on the analyses completed, the SACS appears to be a useful instrument for inclusion in comprehensive assessment packages for illuminating the extent to which student‐centered actions are occurring in science classrooms.
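An inter-rater intraclass correlation of the kind reported for the SACS can be computed from a targets-by-raters table of scores. The sketch below implements the one-way random-effects ICC(1) from ANOVA mean squares, on fabricated observation scores rather than SACS data:

```python
def icc1(ratings):
    """One-way random-effects ICC(1) for a table of ratings
    (rows = targets, e.g. classrooms; columns = raters)."""
    n = len(ratings)      # number of targets
    k = len(ratings[0])   # raters per target
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    # between-target and within-target mean squares
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for r, m in zip(ratings, row_means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical observation totals: 4 classrooms each scored by 3 observers
scores = [
    [20, 22, 21],
    [14, 15, 13],
    [25, 24, 26],
    [10, 9, 11],
]
print(round(icc1(scores), 3))  # → 0.978
```

High values indicate that most score variance lies between classrooms rather than between observers of the same classroom.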

13.

Instruments designed to measure teachers’ knowledge for teaching mathematics have been widely used to evaluate the impact of professional development and to investigate the role of teachers’ knowledge in teaching and student learning. These instruments assess a mixture of content knowledge and pedagogical content knowledge. However, little attention has been given to the content alignment between such instruments and curricular standards, particularly in regard to how content knowledge and pedagogical content knowledge items are distributed across mathematical topics. This article provides content maps for two widely used teacher assessment instruments in the USA relative to the widely adopted Common Core State Standards. This common reference enables comparisons of content alignment both between the instruments and between parallel forms within each instrument. The findings indicate that only a small number of items on both instruments are designed to capture teachers’ pedagogical content knowledge and that the majority of these items are focused on curricular topics in the later grades rather than in the early grades. Furthermore, some forms designed for use as pre- and post-assessment of professional development or teacher education are not parallel in terms of curricular topics, so estimates of teachers’ knowledge growth based on these forms may not mean what users assume. The implications of these findings for teacher educators and researchers who use teacher knowledge instruments are discussed.


14.
In this paper, we report on a study to quantify the impact of a brief assessment literacy intervention on student learning and on student assessment literacy. We first define ‘assessment literacy’, then report on the development and validation of an assessment literacy measurement instrument. Using a quasi-experimental design, we quantified the impact of an assessment literacy-building intervention on students’ assessment literacy levels and on their subsequent performance on an assessment task. The intervention involved students in the experimental condition analysing, discussing and applying an assessment rubric to actual examples of student work that exemplified extremes of standards of performance on the task (e.g. poor, excellent). Results showed that such a procedure could be expected to impact positively on assessment literacy levels and on student performance (on a similar or related task). Regression analyses indicated that the greatest predictor of enhanced student marks (on the assessment task that was the subject of the experiment) was the development of their ability to judge standards of performance on student work created in response to a similar task. The intervention took just 50 minutes, indicating a good educational return on the pedagogical investment.

15.
This study investigated using latent class analysis to set performance standards for assessments comprised of multiple-choice and performance assessment items. Employing this procedure, it is possible to use a sample of student responses to accomplish four goals: (a) determine how well a specified latent structure fits student performance data; (b) determine which latent structure best represents the relationships in the data; (c) obtain estimates of item parameters for each latent class; and (d) identify to which class within that latent structure each response pattern most likely belongs. Comparisons with the Angoff and profile rating methods revealed that the approaches agreed with each other quite well, indicating that both empirical and test-based judgmental approaches may be used for setting performance standards for student achievement.
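A toy version of the latent class machinery described above, assuming a two-class model for binary item scores fit by EM. The data are fabricated and tiny; operational standard setting would involve many more items, examinees, and fit checks:

```python
import random

def fit_lca_2class(data, iters=200, seed=0):
    """EM for a two-class latent class model on binary item responses.
    Returns class weights, per-class item-correct probabilities, and
    posterior class memberships for each response pattern."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    pi = [0.5, 0.5]
    p = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(2)]
    for _ in range(iters):
        # E-step: posterior probability of each class for each pattern
        post = []
        for x in data:
            lik = []
            for c in range(2):
                l = pi[c]
                for j, xj in enumerate(x):
                    l *= p[c][j] if xj else 1 - p[c][j]
                lik.append(l)
            s = sum(lik)
            post.append([l / s for l in lik])
        # M-step: re-estimate class weights and item probabilities
        for c in range(2):
            w = sum(post[i][c] for i in range(n))
            pi[c] = w / n
            p[c] = [sum(post[i][c] * data[i][j] for i in range(n)) / w
                    for j in range(m)]
    return pi, p, post

# Fabricated responses on 4 items: half likely "masters", half not
data = [[1, 1, 1, 1]] * 15 + [[1, 1, 1, 0]] * 5 \
     + [[0, 0, 0, 0]] * 15 + [[0, 0, 1, 0]] * 5
pi, p, post = fit_lca_2class(data)
```

The boundary between the recovered classes then serves as an empirically derived performance standard, which goals (a)-(d) above evaluate and refine.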

16.
Inferences about student knowledge, skills, and attributes based on digital activity still largely come from whether students ultimately get a correct result or not. However, the ability to collect activity stream data as individuals interact with digital environments provides information about students’ processes as they progress through learning activities. These data have the potential to yield information about student cognition if methods can be developed to identify and aggregate evidence from diverse data sources. This work demonstrates how data from multiple carefully designed activities aligned to a learning progression can be used to support inferences about students’ levels of understanding of the geometric measurement of area. The article demonstrates evidence identification and aggregation of activity stream data from two different digital activities, responses to traditional assessment items, and ratings based on observation of in-person non-digital activity aligned to a common learning progression using a Bayesian Network approach.  相似文献   
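The Bayesian Network aggregation idea can be sketched in its simplest form: treat each evidence source as conditionally independent given the learning-progression level and multiply the evidence into a posterior. This is a naive special case of the network approach, and every probability below is invented for illustration:

```python
def posterior_over_levels(prior, cond_probs, observations):
    """Aggregate independent evidence about a student's learning-progression
    level. prior: {level: P(level)}; cond_probs: {source: {level:
    P(success | level)}}; observations: {source: 0/1 outcome}."""
    post = dict(prior)
    for src, outcome in observations.items():
        for lvl in post:
            p = cond_probs[src][lvl]
            post[lvl] *= p if outcome else 1 - p
    z = sum(post.values())
    return {lvl: v / z for lvl, v in post.items()}

# Hypothetical model: three evidence sources aligned to a 3-level progression
prior = {"L1": 1 / 3, "L2": 1 / 3, "L3": 1 / 3}
cond = {
    "digital_task": {"L1": 0.1, "L2": 0.5, "L3": 0.9},
    "mc_item":      {"L1": 0.2, "L2": 0.6, "L3": 0.9},
    "observation":  {"L1": 0.1, "L2": 0.4, "L3": 0.8},
}
obs = {"digital_task": 1, "mc_item": 1, "observation": 0}
result = posterior_over_levels(prior, cond, obs)
print({k: round(v, 3) for k, v in result.items()})
# → {'L1': 0.05, 'L2': 0.5, 'L3': 0.45}
```

Two successes and one failure leave the student most plausibly at the middle level; a full Bayesian Network would additionally model dependencies among sources.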

17.
During the development of large‐scale curricular achievement tests, recruited panels of independent subject‐matter experts use systematic judgmental methods—often collectively labeled “alignment” methods—to rate the correspondence between a given test's items and the objective statements in a particular curricular standards document. High disagreement among the expert panelists may indicate problems with training, feedback, or other steps of the alignment procedure. Existing procedural recommendations for alignment reviews have been derived largely from single‐panel research studies; support for their use during operational large‐scale test development may be limited. Synthesizing data from more than 1,000 alignment reviews of state achievement tests, this study identifies features of test–standards alignment review procedures that impact agreement about test item content. The researchers then use their meta‐regression results to propose some practical suggestions for alignment review implementation.

18.
The success of standards-based education systems depends on two elements: strong standards, and assessments that measure what the standards expect. States that have or adopt test-based accountability programs claim that their tests are aligned to their standards. Until now, however, there has been no independent methodology for checking alignment. This article describes and illustrates such a methodology and reports results on a sample of state tests. In general, although individual items align quite well with some standard, the tests as a whole are not well aligned. With few exceptions, the collections of items that make up the tests that we examined do not do a good job of assessing the full range of standards and objectives that states have laid out for their students. This misalignment can have serious consequences for instruction and for the validity of test results.

19.
We examined the degree to which content of states’ writing standards and assessments (using measures of content range, frequency, balance, and cognitive complexity) and their alignment were related to student writing achievement on the 2007 National Assessment of Educational Progress (NAEP), while controlling for student, school, and state characteristics. We found student demographic characteristics had the largest effect on between-state differences in writing performance, followed by state policy-related variables, then state and school covariates. States with writing tests that exhibited greater alignment with the NAEP writing assessment demonstrated significantly higher writing scores. We discuss plausible implications of these findings.

20.
To improve student science achievement in the United States we need inquiry-based instruction that promotes coherent understanding and assessments that are aligned with the instruction. Instead, current textbooks often offer fragmented ideas and most assessments only tap recall of details. In this study we implemented 10 inquiry-based science units that promote knowledge integration and developed assessments that measure student knowledge integration abilities. To measure student learning outcomes, we designed a science assessment consisting of both proximal items that are related to the units and distal items that are published from standardized tests (e.g., Trends in International Mathematics and Science Study). We compared the psychometric properties and instructional sensitivity of the proximal and distal items. To unveil the context of learning, we examined how student, class, and teacher characteristics affect student inquiry science learning. Several teacher-level characteristics including professional development showed a positive impact on science performance.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号