Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
A sample of 293 local district assessments used in the Nebraska STARS (School-based Teacher-led Assessment and Reporting System), 147 from 2004 district mathematics assessment portfolios and 146 from 2003 reading assessment portfolios, was scored with a rubric evaluating their quality. Scorers were Nebraska educators with background and training in assessment. Raters reached an agreement criterion during a training session; however, analysis of a set of 30 assessments double-scored during the main scoring session indicated that the math ratings remained reliable during scoring, while the reading ratings did not. Therefore, this article presents results for the 147 mathematics assessments only. The quality of local mathematics assessments used in the Nebraska STARS was good overall. The majority were of high quality on characteristics related to validity (alignment with standards, clarity to students, appropriateness of content). Professional development for Nebraska teachers is recommended on aspects of assessment related to reliability (sufficiency of information and scoring procedures).

2.
Item stem formats can alter the cognitive complexity as well as the type of abilities required for solving mathematics items. Consequently, it is possible that item stem formats can affect the dimensional structure of mathematics assessments. This empirical study investigated the relationship between item stem format and the dimensionality of mathematics assessments. A sample of 671 sixth-grade students was given two forms of a mathematics assessment in which mathematical expression (ME) items and word problems (WP) were used to measure the same content. The effects of mathematical language and reading abilities in responding to ME and WP items were explored using unidimensional and multidimensional item response theory models. The results showed that WP and ME items appear to differ with regard to the underlying abilities required to answer these items. Hence, the multidimensional model fit the response data better than the unidimensional model. For the accurate assessment of mathematics achievement, students’ reading and mathematical language abilities should also be considered when implementing mathematics assessments with ME and WP items.

3.
Value-added approaches aim to make fair comparisons of the academic progress of pupils in different settings. They rest on measurements at two time points, and it has been suggested that the assessment of 4-year-olds cannot provide a sufficiently strong base from which to make comparisons at the end of Key Stage 1. Data on a range of potential predictors from the Performance Indicators in Primary Schools (PIPS) are related to measures of reading and mathematics in Year 2. The questions addressed include: a) To what extent is it possible to predict the reading of 7-year-olds? b) Which measures of 4-year-olds are the most predictive of later success? c) What are the implications for baseline assessment nationally? The results suggest that on-entry assessments, which last about 20 minutes, can predict the reading and mathematics scores of 7-year-olds with correlations around 0.7. Several measures are identified as potential predictors. The quickest to administer reliably include digit and letter identification, counting, writing and doing simple sums. The analysis suggests that value-added measures are possible for schools but that individual pupil predictions remain problematic.

4.
This paper reports the results of the National Survey of Accommodations and Alternate Assessments for Students who are Deaf or Hard of Hearing in the United States (National Survey). This study focused on the use of accommodations and alternate assessments in statewide assessments used with students who are deaf or hard of hearing. A total of 258 participants responded to the survey, including 32 representing schools for the deaf, 168 from districtwide/school programs, and 58 from mainstreamed settings. These schools and programs served a total of nearly 12,000 students who are deaf or hard of hearing nationwide. The most prevalent accommodations used in 2003-2004 statewide standardized assessments in mathematics and reading were extended time, an interpreter for directions, and a separate room for test administration. Read-aloud and signed question-response accommodations were also prevalent, and were used more often in mathematics than in reading assessments. Participants from mainstreamed settings reported a more frequent use of accommodations than those in schools for the deaf or districtwide/school programs. In contrast, schools for the deaf were most likely to have students participate in alternate assessments. The top three alternate assessment formats used across all settings were out-of-level testing, work samples, and portfolios. Using the National Survey results as a starting point, future research will need to investigate the validity of accommodations used with students who are deaf or hard of hearing. In the context of the No Child Left Behind Act of 2001 accountability policies, the accommodations and alternate assessment formats used with students who are deaf or hard of hearing may result in restrictions in how scores are integrated into state accountability frameworks.

5.
Although assessments of mathematics, reading, and writing are assumed to measure distinct academic skills, this may be difficult owing to the pervasive influence of general ability on performance. Factor analyses of school-level data from 14 large-scale assessment programs revealed that 80% of the variance in mathematics, reading, and writing scores was due to a common, underlying factor. Multiple regression analyses confirmed that scores contribute little information that is unique to a particular subject (6% or less). Although different assessments may create the illusion of providing unique information, they may be tapping into generic cognitive abilities that cut across content areas. These results raise suspicions about the value and validity of interpretations based on school-level subject area scores.

6.
The purpose of this study is to evaluate the relationship of mathematics calculation rate (curriculum-based measurement of mathematics; CBM-M), reading rate (curriculum-based measurement of reading; CBM-R), and mathematics application and problem solving skills (mathematics screener) among students at four levels of proficiency on a statewide test. It was hypothesized that CBM-M provides insufficient information to make good screening decisions and that other measures with content more similar to that of large-scale tests of mathematics would function to improve screening. One hundred and seventy students in third grade from a rural elementary school in the Midwestern United States participated. Structural equation modeling was used to evaluate direct, mediator, and latent growth models. In general, CBM-R mediated the relationship between the mathematics ability screener and passing the state assessment, while CBM-M did not have any significant paths within these models. Results are discussed in terms of the utility of CBM-M and CBM-R procedures in screening for success on state test performance in mathematics.

7.
Many educational testing programs report examinee performance at more than two levels of proficiency. Whether these assessments have the capacity to support these multiple inferences, though, is a topic that has not been widely discussed. This study proposes a method for evaluating the minimum number of measurement opportunities for reporting students' performance at multiple achievement levels and describes an application of the method for reading and mathematics assessments that are used by some school districts in Nebraska. Analyses were based on judgments collected from 110 teachers about characteristics of items and tasks from multiple assessments in reading and mathematics at grades 4 and 8, and in high school. Results suggested that there were generally enough items on the mathematics assessments to classify students into two or three performance levels, but rarely enough to make the four classifications that the state reported. Items on the reading assessments were generally distributed across the proficiency levels and tended to allow reporting for all four classification levels. These findings have implications for both practitioners and policymakers in how scores are interpreted.

8.
Recent developments in curriculum and assessment in the UK have led to increased involvement of teachers in high-stakes summative assessment of their own students. Case studies of experienced secondary school mathematics teachers reading and assessing General Certificate of Secondary Education (GCSE) coursework texts show that they experience tensions between their various roles and responsibilities as teachers and as examiners. Moreover, different teachers appear to resolve these tensions in different ways, adopting various positions in relation both to the content of the written coursework texts and to the student-authors of the texts. Variations in teachers’ approaches to reading and assessing mathematics coursework may lead to differences in the ranks or grades allocated and, even where they do not, the meanings of the grades given by different teachers may be substantially different. Through examining teachers’ assessment practices, questions are raised both about the validity of such assessments and about the compatibility of the coursework examination system with the aims of the curriculum development which gave rise to it.

9.
Given the relatively high intercorrelations observed between mathematics achievement, reading achievement, and cognitive ability, it has recently been claimed that student assessment studies (e.g., TIMSS, PISA) and intelligence tests measure a single cognitive ability that is practically identical to general intelligence. The present article uses three lines of reasoning to show that the outcomes of schooling can and must be conceptually distinguished from the intelligence construct. First, the conceptual differences between student assessments and tests of cognitive ability are delineated. Second, results from construct validation studies providing strong empirical support for the multidimensionality of the achievement measures applied in large-scale educational assessments are reported. Third, data supporting the differential development of educational outcomes in different domains are presented.

10.
Applied Measurement in Education, 2013, 26(2): 173-185
More attention is being given to evaluating the quality of school-level assessment scores due to their importance for school-based planning and monitoring effectiveness. In this study, cross-year stability is proposed as an indicator of data quality and the degree of stability that is appropriate for large-scale assessments of student performance is explored. Following a search of Internet sites, Year 1 to Year 2 stability coefficients were calculated for assessment data from 21 states and 2 provinces. The median stability coefficient was .78 in mathematics and reading, but coefficients for writing were generally lower. A stability coefficient of .80 is recommended as the standard for large-scale assessments of student performance. A high degree of cross-year stability makes it easier to detect and attribute changes in school-level scores to school improvement efforts. The link between stability and reliability and several factors that may attenuate stability are discussed.

11.
This special edition of IJMSE focuses on the Programme for International Student Assessment (PISA) project now that it has completed a full cycle of administration (reading, mathematics, and science) to look at ways in which PISA has been used in participating countries and with what consequences, and to identify potential research and policy directions emanating from this initiative. Articles were invited to (a) reflect international perspectives on the uses and consequences of PISA to date and (b) speculate on future directions for research, curriculum, and policy using the PISA datasets. The introductory article provides a brief overview of common aspects of PISA: evolving definitions of reading literacy, mathematics literacy, and science literacy; technical design of the instruments and data analysis procedures; the changing emphasis of administrations; and recent research using the datasets. PISA, unlike other international assessments in reading, mathematics, and science, has provided a fresh perspective on ‘what might be’ by decoupling the assessment from mandated curricula to focus on literacies needed for a 21st century economy. This unique feature of PISA brings with it possibilities and cautions for policy makers.

12.
This study examines the 1992 National Curriculum assessment data from one large LEA in England in order to address the issue of equity. For comparison purposes we also present additional data obtained from the same sample of pupils on an NFER standardised word recognition test. The report focuses on the relative performance of gender, low income, linguistic, and special needs groups on a standardised reading test and the teacher assessment (TA) and standard task (ST) performance assessments administered in 1992 to 7-year-olds as part of the national curriculum (NC) in England and Wales. The impact of school and teacher effectiveness on student attainment scores is also examined and discussed. Briefly, the findings show that irrespective of the method of assessment, differences in attainment were found between most pupil groups investigated. However, importantly, only very modest evidence was found that particular methods of assessment appeared either to reduce or increase the differences in attainment and overall no clear patterns emerged. The findings are discussed in the context of various factors that may have an impact on the assessment of student attainment.

13.
Position effects (PEs) cause decreasing probabilities of correct item responses towards the end of a test. We analysed PEs in science, mathematics and reading tests administered in the German extension to the PISA 2006 study with respect to their variability at the student and school levels. PEs were strongest in reading and weakest in mathematics. Variability in PEs was found at both levels of analysis. PEs were stronger for male students, for students with a migration background (science and mathematics), and for students with a less favourable socio-economic background (reading). At the school level, PEs were stronger in lower school tracks and in schools with a high proportion of students with a migration background. The relationships of the test scores with the covariates partly reflected the covariates’ relationships with PEs. Our findings suggest that PEs should be taken seriously in large-scale assessments as they have an undesirable impact on the results.

14.
Similar to educators in mathematics, science, and reading, history educators around the world have mobilized curricular reform movements toward including complex thinking in history education, advancing historical thinking, developing historical consciousness, and teaching competence in historical sense making. These reform movements, including the Common Core Standards, are beginning to include historical thinking. Despite these developments, inclusion of historical thinking in assessments has been slow: the great majority of history assessments, both large-scale and classroom-based, still focus on fragmented pieces of information. In this article, we discuss the challenges in assessment of historical thinking, describe how these issues were dealt with in a 1-hour test of students' ability to reason about “enemy aliens” in Canada during World War I, and make recommendations for future assessments.

15.
Educational Assessment, 2013, 18(3): 195-224
Three mathematics scoring methods are being used or explored in large-scale assessment programs: item-by-item scoring, holistic scoring, and "trait" scoring. This study investigated all 3 methods of scoring on 3 mathematics performance-based assessments. Mathematics assessment tasks were selected from a pool of pilot tasks because they could be scored using all 3 methods. Results of the study suggest that holistic scoring and item-by-item scoring methods provide similar information; however, trait scores for conceptual understanding and mathematics communication tapped into different aspects of student performance. Implications for the validity of scoring methods now in use for performance-based mathematics assessments are discussed.

16.
The No Child Left Behind Act of 2001 (NCLB) emphasizes educational accountability for all students. Twenty-eight states have policies to aggregate student participation and proficiency data for schools for the deaf in NCLB reports. The remaining states account for these students in other ways: referring student data to "sending" schools and aggregating data to the district or state level are most prominent. In reports of student assessment results for academic year 2002-2003, three schools for the deaf made "Adequate Yearly Progress" under NCLB: These schools demonstrated at least a 95% participation rate in assessments, and at least 95% of their students met or surpassed state proficiency benchmarks in reading and mathematics. Proficiency levels for other schools varied by report, but were often comparable to those of students with disabilities. Challenges and strategies for capturing the impact of NCLB accountability policies on deaf students are discussed.

17.
Phonological awareness is a key factor in the development of literacy, and frequently presents itself as an area of weakness in pupils with reading difficulties. In this article, Anies Al-Hroub of the American University of Beirut sets out to define a distinguishing pattern of characteristics that supports the identification of pupils with specific learning difficulties who are gifted in mathematics and reports the assessment of the pupils' visual and auditory perceptual skills, including phonological awareness. The assessments were designed to measure auditory and visual memory skills, auditory and visual analysis skills, speed of information processing and spoken language (receptive and expressive). Furthermore, aspects of language learning such as reading, writing, spelling and parts of listening ability were all assessed for mathematically gifted pupils with specific learning difficulties who scored above the cut-off score of 120 on the WISC-III-Jordan. The article closes with recommendations for further research.

18.
Students who are deaf or hard of hearing (SDHH) often need accommodations to participate in large-scale standardized assessments. One way to bridge the gap between the language of the test (English) and a student's linguistic background (often including American Sign Language [ASL]) is to present test items in ASL. The specific aim of this project was to measure the effects of an ASL accommodation on standardized test scores for SDHH in reading and mathematics. A total of 64 fifth- to eighth-grade (ages 10-15) SDHH from schools for the deaf in the United States participated in this study. There were no overall differences in the mean percentage of items students answered correctly in the standard vs. ASL-accommodated conditions for reading or mathematics. We then conducted hierarchical linear regression analyses to analyze whether measures of exposure to ASL (home and classroom) and student proficiency in the subject area predicted student performance in ASL-accommodated assessments. The models explained up to half of the variance in the scores, with subject area proficiency (mathematics or reading) as the strongest predictor. ASL exposure was not significant with the exception of ASL classroom instruction as a predictor of mathematics scores.

19.
Federal policy on alternate assessment based on modified academic achievement standards (AA-MAS) inspired this research. Specifically, an experimental study was conducted to determine whether tests composed of modified items would have the same level of reliability as tests composed of original items, and whether these modified items helped reduce the performance gap between AA-MAS eligible and ineligible students. Three groups of eighth-grade students (N = 755) defined by eligibility and disability status took original and modified versions of reading and mathematics tests. In a third condition, the students were provided limited reading support along with the modified items. Changes in reliability across groups and conditions for both the reading and mathematics tests were determined to be minimal. Mean item difficulties within the Rasch model were shown to decrease more for students who would be eligible for the AA-MAS than for non-eligible groups, revealing evidence of differential boost. Exploratory analyses indicated that shortening the question stem may be a highly effective modification, and that adding graphics to reading items may be a poor modification.

20.
In recent years, there has been increased interest in improving early mathematics curricula and instruction. Subsequently, there has also been a rise in demand for better early mathematics assessments, as most current measures are limited in their content and/or their sensitivity to detect differences in early mathematics development among young children. In this article, using data from two large samples of diverse populations of prekindergarten and kindergarten children, we provide evidence regarding the psychometric validity of a new theory-based early mathematics assessment. The new measure is the short form of a longer, validated measure. Our results suggest the short form assessment is valid for assessing prekindergarten and kindergarten children’s numeracy and geometry skills and is sensitive to differences in early mathematics development among young children.
