Similar articles
 20 similar articles found (search time: 500 ms)
1.
This article examines three typical approaches to alternate assessment for students with significant cognitive disabilities: portfolios, performance assessments, and rating scales. A detailed analysis of common and unique design features of these approaches is provided, including features of each approach that influence the psychometric quality of their results. Validity imperatives for alternate assessments are reviewed, and approaches for addressing the need for validity evidence are outlined. The article concludes with an examination of three technical challenges common to all alternate assessments: alignment, scores and scoring, and standard setting. In light of these challenges, existing methods and professional testing standards are endorsed as necessary guidance for understanding and advancing alternate assessment practices.

2.
Students with the most significant cognitive disabilities (SCD) are the 1% of the total student population who have a disability or multiple disabilities that significantly impact intellectual functioning and adaptive behaviors and who require individualized instruction and substantial supports. Historically, these students have received little instruction in science and the science assessments they have participated in have not included age‐appropriate science content. Guided by a theory of action for a new assessment system, an eight‐state consortium developed multidimensional alternate content standards and alternate assessments in science for students in three grade bands (3–5, 6–8, 9–12) that are linked to the Next Generation Science Standards (NGSS Lead States, 2013) and A Framework for K‐12 Science Education (Framework; National Research Council, 2012). The great variability within the population of students with SCD necessitates variability in the assessment content, which creates inherent challenges in establishing technical quality. To address this issue, a primary feature of this assessment system is the use of hypothetical cognitive models to provide a structure for variability in assessed content. System features and subsequent validity studies were guided by a theory of action that explains how the proposed claims about score interpretation and use depend on specific assumptions about the assessment, as well as precursors to the assessment. This paper describes evidence for the main claim that test scores represent what students know and can do. We present validity evidence for the assumptions about the assessment and its precursors, related to this main claim. The assessment was administered to over 21,000 students in eight states in 2015–2016. We present selected evidence from system components, procedural evidence, and validity studies. We evaluate the validity argument and demonstrate how it supports the claim about score interpretation and use.

3.
4.
This paper reports the results of the National Survey of Accommodations and Alternate Assessments for Students who are Deaf or Hard of Hearing in the United States (National Survey). This study focused on the use of accommodations and alternate assessments in statewide assessments used with students who are deaf or hard of hearing. A total of 258 participants responded to the survey, including 32 representing schools for the deaf, 168 from districtwide/school programs, and 58 from mainstreamed settings. These schools and programs served a total of nearly 12,000 students who are deaf or hard of hearing nationwide. The most prevalent accommodations used in 2003-2004 statewide standardized assessments in mathematics and reading were extended time, an interpreter for directions, and a separate room for test administration. Read-aloud and signed question-response accommodations were also prevalent, and were used more often in mathematics than in reading assessments. Participants from mainstreamed settings reported a more frequent use of accommodations than those in schools for the deaf or districtwide/school programs. In contrast, schools for the deaf were most likely to have students participate in alternate assessments. The top three alternate assessment formats used across all settings were out-of-level testing, work samples, and portfolios. Using the National Survey results as a starting point, future research will need to investigate the validity of accommodations used with students who are deaf or hard of hearing. In the context of the No Child Left Behind Act of 2001 accountability policies, the accommodations and alternate assessment formats used with students who are deaf or hard of hearing may result in restrictions in how scores are integrated into state accountability frameworks.

5.
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little to model the multiple sources of error that are characteristic of alternate assessments. In contrast, generalizability theory (G-theory) allows researchers to identify sources of error and analyze the relative contribution of each source. This study demonstrates an application of G-theory to examine reliability for an alternate assessment. A G-study with the facets rater type, assessment attempts, and tasks was examined to determine the relative contribution of each to observed score variance. Results were used to determine the reliability of scores. The assessment design was modified to examine how changes might impact reliability. As a final step, designs that were deemed satisfactory were evaluated regarding the feasibility of adapting them into a statewide standardized assessment and accountability program.
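As a rough illustration of the G-theory idea described in this abstract, the sketch below estimates variance components for a much simpler design than the study's: a single-facet, fully crossed persons-by-raters design with one score per cell. The function names `g_study` and `g_coefficient` and the data layout are assumptions made for this example only; the study itself used three facets (rater type, attempts, tasks), and its actual procedure is not given in the abstract.

```python
from statistics import mean


def g_study(scores):
    """Estimate variance components for a crossed persons x raters design.

    scores[p][r] is the score person p received from rater r
    (hypothetical layout; one observation per cell).
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    rater_means = [mean(scores[p][r] for p in range(n_p)) for r in range(n_r)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    return {
        "person": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": max(ms_res, 0.0),
    }


def g_coefficient(components, n_raters):
    """Relative generalizability coefficient for a design with n_raters raters."""
    relative_error = components["residual"] / n_raters
    return components["person"] / (components["person"] + relative_error)
```

Calling `g_coefficient` with different values of `n_raters` mirrors the design-modification step the abstract mentions: the same variance components are reused to ask how reliability would change if the number of raters changed.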

6.
The number of raters is theoretically central to peer assessment reliability and validity, yet it is rarely studied. Further, requiring each student to assess more peers’ documents increases both the number of evaluations per document and assessor workload, which can degrade performance. Moreover, task complexity is likely a moderating factor, influencing both workload and validity. This study examined whether changing the number of required peer assessments per student (that is, the number of raters per document) affected peer assessment reliability and validity for tasks at different levels of task complexity. A total of 181 students completed and provided peer assessments for tasks at three levels of task complexity: low complexity (dictation), medium complexity (oral imitation), and high complexity (writing). Adequate validity of peer assessments was observed for all three task complexities at low reviewing loads. However, the impacts of increasing reviewing load varied by reliability vs. validity outcomes and by task complexity.

7.
This study examines the impact of an assessment training module on student assessment skills and task performance in a technology-facilitated peer assessment. Seventy-eight undergraduate students participated in the study. The participants completed an assessment training exercise, prior to engaging in peer-assessment activities. During the training, students reviewed learning concepts, discussed marking criteria, graded example projects and compared their evaluations with the instructor’s evaluation. Data were collected in the form of initial and final versions of students’ projects, students’ scoring of example projects before and after the assessment training, and written feedback that students provided on peer projects. Results of data analysis indicate that the assessment training led to a significant decrease in the discrepancy between student ratings and instructor rating of example projects. In addition, the degree of student vs. instructor discrepancy was highly predictive of the quality of feedback that students provided to their peers and the effectiveness of revisions that they made to their own projects upon receiving peer feedback. Smaller discrepancies in ratings were associated with provision of higher quality peer feedback during peer assessment, as well as better revision of initial projects after peer assessment.

8.
The assessment of quality in higher education from the perspective of students has three dimensions: students’ assessment of teaching, students’ satisfaction and students’ learning engagement. These differ in conceptions of quality, evaluation methods, evaluation content, evaluation purposes, traits and priorities. The authors conducted three rounds of empirical investigations to study higher education quality assessment from students’ perspective and concluded that students play multiple roles in higher education evaluation and assessment, all of which can be improved by strengthening students’ objectivity and participation, evaluating the added value of a college education oriented to student development, taking the students’ perspective as an important way to contribute to higher education quality enhancement, assurance and control, and making proper use of higher education evaluations and assessments.

9.
Since the 2001–02 school year, the accountability provisions of the No Child Left Behind Act (NCLB) have shaped much of the work of public school teachers and administrators in the United States. NCLB explicitly prohibits schools from excluding students with disabilities from the accountability system and requires not only participation of all students in statewide accountability assessments but also reporting of the results for students with disabilities along with other students and as a disaggregated group. From the beginning of these requirements, lawmakers recognized that there would be a small group of students with disabilities for whom the regular assessment, even with accommodations, would not be appropriate and they authorized states to develop an alternate assessment based on alternate achievement standards (AA-AAS) for this group of students. More recently, responding to pressures from the field, additional flexibility has been granted to develop an additional alternate assessment based on modified grade-level achievement standards (AA-MAS) for students with disabilities who present with persistent academic difficulties. It is expected that approximately 2% of the total student population might be included in this new alternate assessment. This article examines the decisions that need to be made by individual states to determine the target population for this new alternate assessment and the policy implications of these decisions.

10.
The relationships between ratings on the Idaho Alternate Assessment (IAA) for 116 students with significant disabilities and corresponding ratings for the same students on two norm-referenced teacher rating scales were examined to gain evidence about the validity of resulting IAA scores. To contextualize these findings, another group of 54 students who had disabilities, but were not officially eligible for the alternate assessment, was also assessed. Evidence to support the validity of the inferences about IAA scores was mixed, yet promising. Specifically, the relationships among the reading, language arts, and mathematics achievement level ratings on the IAA and the concurrent scores on the ACES-Academic Skills scales for the eligible students varied across grade clusters, but in general were moderate. These findings provided evidence that IAA scales measure skills indicative of the state's content standards. This point was further reinforced by moderate to high correlations between the IAA and Idaho State Achievement Test (ISAT) for the students who were not eligible. Additional evidence concerning the valid use of the IAA was provided by logistic regression results showing that the scores do an excellent job of differentiating students who were eligible from those not eligible to participate in an alternate assessment. The collective evidence for the validity of the IAA scores suggests it is a promising assessment for NCLB accountability of students with significant disabilities. The methods of establishing this evidence have the potential to advance validation efforts of other states' alternate assessments.

11.
Nebraska districts use different strategies for measuring student performance on the state's content standards. District assessments differ in type and technical quality. Six quality criteria were endorsed by the state. These criteria cover content and curricular validity, fairness, and appropriateness of score interpretations. District assessment portfolios document how well assessments meet these criteria. Districts receive a rating, from Unacceptable to Exemplary, on how well their assessments meet each of the quality criteria. This article presents these technical quality criteria and explains how they are (a) individually rated and (b) combined for the district's overall quality rating.

12.
Building on the papers in this special issue, this article uses modern conceptions of validity theory to provide a framework for considering the evaluation of teaching quality. The 3 facets of teaching quality focused on are domain conceptualization, evidence and inferences, and their evaluation. Domain definitions vary in their specificity with tradeoffs in their range of applicability and specificity of inference. Evidence collection can range from highly standardized assessments to observations that must attend to evidence from a myriad of classroom interactions. For all assessments, however, even the most standardized, different interpretations of assessment tasks can threaten the validity of score interpretations. The papers consider a range of processes that are designed to generate, support, and interrogate the validity of inferences based on assessment scores. A fundamental question underlying this type of measurement is whether differences in the quality of teaching that students experience can be causally attributed to the teacher.

13.
Student assessment of teaching in higher education   Total citations: 1, self-citations: 0, citations by others: 1
Plans to introduce campus-wide assessments of college or university teaching which are largely dependent on student ratings are seen as a threat to academic freedom in those institutions with little or no experience of this form of evaluation. While regular student evaluations of teaching are very common in North America, their introduction is only now being considered in colleges and universities in a number of other countries. Research on the reliability and validity of student ratings indicates that they are capable of providing valuable information about the quality of teaching. Depending on the survey used, this type of evaluation may be used to provide evidence of teaching ability to staffing committees or to suggest ways of improving teaching. The paper concludes with a set of recommendations for higher education institutions which are considering the regular assessment of all teachers by their students.

14.
This study examined the validity of students’ evaluations of teaching as an instrument for measuring teaching quality by examining the effects of likability and prior subject interest as potential biasing effects, measured at the beginning of the course and at the time of evaluation. University students (N = 260) evaluated psychology courses in one semester at a German university with a standardized questionnaire, yielding 517 data points. Cross-classified multilevel analyses revealed fixed effects of likability at both times of measurement and fixed effects of prior subject interest measured at the beginning of the course. Likability seems to exert a substantial bias on student evaluations of teaching, albeit one that is overestimated when measured at the time of evaluation. In contrast, prior subject interest seems to introduce a weak bias. Considering that likability bears no conceptual relationship to teaching quality, these findings point to a compromised validity of students’ evaluations of teaching.

15.
The emphasis on scientific inquiry has increased the importance of developing the fundamental abilities to conduct scientific investigations and created a need for valid assessments of students' inquiry abilities. We took advantage of advanced technology to develop a simulation-based assessment of inquiry abilities (SAIA) that allowed students to generate scientific explanations and demonstrate their experimental abilities. This paper describes the validation of the assessment. Data were collected from 48 12th-grade students at a local high school who were categorized into three groups based on their program majors. Both quantitative and qualitative approaches were utilized to validate SAIA. The quantitative results showed that SAIA was aligned with a validated reasoning-skill test (criterion-related validity), discriminated variance among different groups (construct validity), and was highly suitable for examining inquiry abilities (content validity). Additionally, we utilized the think-aloud technique in order to identify the performances exhibited by students while they accomplished the SAIA tasks. The protocol analysis indicated that in general, students demonstrated the expected abilities in SAIA and that their SAIA scores accurately reflected their performance levels of inquiry abilities. The results suggested that SAIA was a valid assessment for evaluating the inquiry abilities of high school students. This study also provided systemic strategies for validating simulation-based assessments.

16.
The inter-rater reliability of university students’ evaluations of teaching quality was examined with cross-classified multilevel models. Students (N = 480) evaluated lectures and seminars over three years with a standardised evaluation questionnaire, yielding 4224 data points. The total variance of these student evaluations was separated into the variance components of courses, teachers, students and the student/teacher interaction. The substantial variance components of teachers and courses suggest reliability. However, a similar proportion of variance was due to students, and the interaction of students and teachers was the strongest source of variance. Students’ individual perceptions of teaching and the fit of these perceptions with the particular teacher greatly influence their evaluations. This casts some doubt on the validity of student evaluations as indicators of teaching quality and suggests that aggregated evaluation scores should be used with caution.

17.
Because of the unique nature of the students eligible for alternate assessments based on modified academic achievement standards, their varied access to the general education curriculum, and their unique learning needs, innovative psychometric thinking and practice are needed to assure high technical quality of alternate assessments. Indeed, we must at least marshal state-of-the-art procedures to secure strong psychometric evidence to support appropriate and meaningful design and use of these important assessments. The authors contributing work to this special issue, Alternate Assessments Based on Modified Academic Achievement Standards, address important issues and provide guidance to policymakers, test developers, and educators. They also each raise important technical quality issues. This article offers a brief review of such psychometric considerations, in light of the work and comments of the special issue authors.

18.
School psychologists are increasingly engaged in service provision for students eligible for special education services under the eligibility category of autism, including conducting school‐based assessments and evaluations. Evaluations occur for a variety of reasons such as special education eligibility decision‐making, treatment and intervention planning, and progress monitoring. Publications in school psychology journals emphasizing autism spectrum disorder (ASD) assessment and evaluation are vital to quality training and practitioner utilization of quality practices. In the current study, researchers conducted a systematic review of publications in 10 school psychology journals from 2007 to 2017 to assess the current state of ASD assessment and evaluation research in the field of school psychology. Implications for researchers, trainers, and practitioners are discussed.

19.
Mechanisms for the quality assessment of teaching in the higher education systems of the UK, The Netherlands, France and Germany give varying statuses to students’ assessment of teaching, specifically that done by means of questionnaires. Despite numerous assertions of the general validity of many aspects of such assessments, previous research — very little of which has been based upon the European experience — has nevertheless shown various biases in these evaluations (biases being defined as aspects of evaluation unrelated to the intrinsic characteristics of the teaching). It is also possible to hypothesise other sources of bias that are not analysed in depth in the existing literature; some of these may be specific to the higher education systems of individual countries. The possible existence of biases must necessarily entail some problems in the interpretation of questionnaire results and thus dilemmas in their application to decision‐making by institutions.

20.
A sample of 293 local district assessments used in the Nebraska STARS (School-based Teacher-led Assessment and Reporting System), 147 from 2004 district mathematics assessment portfolios and 146 from 2003 reading assessment portfolios, was scored with a rubric evaluating their quality. Scorers were Nebraska educators with background and training in assessment. Raters reached an agreement criterion during a training session; however, analysis of a set of 30 assessments double-scored during the main scoring session indicated that the math ratings remained reliable during scoring, while the reading ratings did not. Therefore, this article presents results for the 147 mathematics assessments only. The quality of local mathematics assessments used in the Nebraska STARS was good overall. The majority were of high quality on characteristics that go to validity (alignment with standards, clarity to students, appropriateness of content). Professional development for Nebraska teachers is recommended on aspects of assessment related to reliability (sufficiency of information and scoring procedures).
