Similar literature
 20 similar articles found.
1.
Selection decisions have a major impact on our education, occupation, and quality of life, and the role of standardized tests in selection has always been a source of controversy. Here, I consider various definitions of fairness in measurement and selection: those emerging from within educational measurement and statistics, those from philosophy, and finally, those from the public. I use examples of public challenges to selection practices to illustrate the fact that technical and philosophical definitions of fairness do not align well with public concerns. I emphasize the importance of promoting awareness of existing standards, advocating for the fair use of testing and selection practices, and communicating in a candid and straightforward way when engaging with test takers and test users.

2.
The Standards for Educational and Psychological Testing have evolved in the breadth and depth of coverage of issues in educational testing and measurement since their first publication in 1954. There were a number of substantive changes in the 1999 revision that addressed validity, fairness, accommodations, and compliance with the Standards. In addition, there was nearly a 50% increase in the number of standards contained in the last revision. The next revision of the Standards may be initiated in 2007 and there are remaining concerns about access and awareness by non-measurement professionals, compliance by test publishers and users, relevance in addressing mandates for accountability, and substantive areas of educational assessment. This review of major changes to the Standards and discussion of future topics is designed to inform the next revision.

3.
Although standardized tests have been in use for years, there is a lack of consensus about what constitutes appropriate student preparation for testing. Popham (1991) demonstrated that teachers and administrators view preparation in different ways and noted that there is considerable diversity of opinion about which practices are appropriate and inappropriate. Other researchers have attempted to create standards or guidelines for determining appropriate testing practice, but these do not appear to capture the diversity of teacher-initiated preparation. Do teachers and testing specialists see preparation in the same way? What practices fall into the grey area of being not appropriate but not necessarily unethical? This study examines eight categories (40 practices) of preparation or teacher intervention to maximize student test performance. Teachers (N = 42) and testing specialists (N = 10) were asked to examine practices and determine how appropriate or inappropriate the practices were for a specified test. Results show that teachers consistently rate practices to be more appropriate than do testing specialists. Significant differences between teachers and specialists were found for six of the eight categories of preparation. Practices such as motivational activities, pretest interventions, same format preparation, and previous form preparation were perceived to be less evident regarding the appropriateness of their use by teachers in the classroom. This article concludes with a call for test developers and school district representatives to collaborate to determine the appropriateness of testing practice for local needs and a recognition that concrete and widely disseminated guidelines for testing practices are needed for a variety of tests and instructional decisions.

4.
Over the past few decades, those who take tests in the United States have exhibited increasing diversity with respect to native language. Standard psychometric procedures for ensuring item and test fairness that have existed for some time were developed when test‐taking groups were predominantly native English speakers. A better understanding of the potential influence that insufficient language proficiency may have on the efficacy of these procedures is needed. This paper represents a first step in arriving at this better understanding. We begin by addressing some of the issues that arise in a context in which assessments in a language such as English are taken increasingly by groups that may not possess the language proficiency needed to take the test. For illustrative purposes, we use the first‐language status of a test taker as a surrogate for language proficiency and describe an approach to examining how the results of fairness procedures are affected by inclusion or exclusion of those who report that English is not their first language in the fairness analyses. Furthermore, we explore the sensitivity of the results of these procedures, differential item functioning (DIF) and score equating, to potential shifts in population composition. We employ data from a large‐volume testing program for this illustrative purpose. The equating results were not affected by either inclusion or exclusion of such test takers in the analysis sample, or by shifts in population composition. The effect on DIF results, however, varied across focal groups.

5.
The psychometric literature is replete with comprehensive discussions of test validity, test validation, and the characteristics of quality assessment programs. The most authoritative source for guidance regarding sound test development and evaluation practices is the Standards for Educational and Psychological Testing. However, the Standards are not legally binding. In this article, we review the way in which validity is conceptualized in the Standards and compare this conceptualization with validity evidence presented in specific court cases involving legal challenges to tests. Our review indicates that, in general, there is strong congruence between the Standards and how validity is viewed in the courts, and that testing agencies that conform to these guidelines are likely to withstand legal scrutiny. However, the courts have taken a more practical, less theoretical view on validity and tend to emphasize evidence based on test content and testing consequences.

6.
Teacher and school accountability systems based on high-stakes tests are ubiquitous throughout the United States and appear to be growing as a catalyst for reform. As a result, educators have increased the proportion of instructional time devoted to test preparation. Although guidelines for what constitutes appropriate and inappropriate test preparation exist, they are outdated and need revision. The current article proposes new guidelines within the framework of standards-based assessment. It also examines the test preparation practices in 32 third- and fifth-grade classrooms and examines the relationship between student test performance and test preparation activities using a two-level Hierarchical Linear Model. Instruction on tested objectives using items like those presented on the state test, decontextualized practice, and teaching test-taking skills offered no student achievement benefit relative to general instruction on the state standards, leading us to conclude that test preparation was not beneficial.

7.
In 2018, 26 states administered a college admissions test to all public school juniors. Nearly half of those states proposed to use those scores as their academic achievement indicators for federal accountability under the Every Student Succeeds Act (ESSA); many others are planning to use those scores for other accountability purposes. Accountability encompasses a number of different uses and subsumes a variety of claims. For states proposing to use summative tests for accountability, a validity argument needs to be developed, which entails delineating each specific use of test scores associated with accountability, identifying appropriate evidence, and offering a rebuttal to counterclaims. The aim of this article is to support states in developing a validity argument for use of college admission test scores for accountability by identifying claims that are applicable across states, along with summarizing existing evidence as it relates to each of these claims. As outlined by The Standards for Educational and Psychological Testing, multiple sources of evidence are used to address each claim. A series of threats to the validity argument, including weaker alignment with content standards and potential influences in narrowing teaching, are reviewed. Finally, the article contrasts validity evidence, primarily from research on the ACT, with regulatory requirements from ESSA. The Standards and guidance addressing the use of a “nationally recognized high school academic assessment” (Elementary and Secondary Education Act (ESEA), Negotiated Rulemaking Committee; Department of Education) are the primary sources for the organization of validity evidence.

8.
Pluralism in children's reasoning about social justice (cited 2 times: 0 self-citations, 2 by others)
To determine if children construe the fairness of societal practices as dependent on the implicit contract or definition of a situation, first (M = 6.8 years), third (M = 8.8 years), and fifth (M = 11.0 years) graders were questioned about 3 situations: one emphasizing learning or mastery, a contest, and a test. For each situation, they judged the fairness, alterability of fairness, effectiveness, and harmfulness of 3 teaching or coaching practices: having more able individuals help the less able, having individuals compete publicly, and having them perform independently. Children judged the fairness and effectiveness of each practice differently for each situation. They also recognized that unfair practices could become fair with participant consensus or over time, and that the potential of a practice to cause harm differed depending upon the context. These results were comparable for educational and athletic activities. In these respects, children's conceptions of the fairness of societal practices resemble those of philosophers who advocate pluralistic conceptions of justice.

9.
Standards‐based progress reports (SBPRs) require teachers to grade students using the performance levels reported by state tests and are an increasingly popular report card format. They may help to increase teacher familiarity with state standards, encourage teachers to exclude nonacademic factors from grades, and/or improve communication with parents. The current study examines the SBPR grade–state test score correspondence observed across 2 years in 125 third and fifth grade classrooms located in one school district to examine the degree of consistency between grades and state test results. It also examines the grading practices of a subset of 37 teachers to determine whether there is an association between teacher appraisal style and convergence rates. A moderate degree of grade–test score convergence was observed using three agreement estimates (coefficient kappa, tau‐b correlations, and classroom‐level mean differences between grades and test scores). In addition, only small amounts of grade–test score convergence were observed between teachers; a much greater proportion of variance lay within classrooms and subjects. Appraisal style correlated weakly with convergence rates, but was most strongly related to assigning students to the same performance level as the test. Therefore, using recommended grading practices may improve the quality of SBPR grades to some extent.

10.
In the United States, racial‐ethnic differences on tests of school readiness and academic achievement continue. A complete understanding of the origins of racial‐ethnic achievement gaps is still lacking. This article describes social equity theory (SET), which proposes that racial‐ethnic achievement gaps originate from two kinds of social process, direct and signal influences, that these two kinds of processes operate across developmental contexts, and that the kind of influence and the setting in which they are enacted change with age. Evidence supporting each of SET's key propositions is discussed in the context of a critical review of research on the Black–White achievement gap. Specific developmental hypotheses derived from SET are described, along with proposed standards of evidence for testing those hypotheses.

11.
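National education examinations are the principal means of selecting talent. Countries in Europe and North America attach great importance to research on the quality and fairness of educational testing and have developed many influential educational testing standards, which are an important safeguard of fairness and quality in educational testing. This study selected six international educational testing standards: the standards jointly developed by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education; the standards developed by the Educational Testing Service; two standards developed by a European testing association; and two standards developed by a European international language testing association. A coding analysis shows that they share four international characteristics: (1) an emphasis on evidence-based decisions and operations in test measurement; (2) an emphasis on the valid interpretation of test scores; (3) fairness and quality as the value orientations commanding the broadest consensus; and (4) the absence of marked country-specific features in the standards. The study also summarizes three trends in international educational testing standards: (1) standards for test quality and fairness are mostly developed under the leadership of testing associations; (2) they address both principles of test fairness and codes of testing conduct; and (3) all testing stakeholders share a collaborative responsibility for upholding them. Educational testing plays a pivotal role in China, and these characteristics and trends provide a theoretical basis and technical support for developing China's own standards for test quality and fairness, thereby safeguarding the fairness and quality of testing.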

12.
As the number of students with disabilities applying for admission and enrolling in educational institutions continues to increase, educators and measurement experts face the challenge of determining whether and how to offer accommodations in admissions tests and how to report and utilize the results of modified tests. This article discusses the provision of accommodations in admissions testing and in educational programs, the test score flagging practices that impact admissions testing, validity concerns, and issues surrounding fairness and compliance with the federal disability laws for such practices. It offers some conclusions about the legality of the use of flagged test scores, as well as a call for further research concerning testing and evaluating students with disabilities.

13.
Interest in the use of large-scale achievement testing for accountability purposes and to drive instructional reform has been increasing in Canada. In the 1995 publications in Interchange, several researchers debated the merits and demerits of standardized achievement testing, including among the latter a tendency to reduce the curriculum and overemphasize routine learning (i.e., "teaching to the test"). Almost no studies have found empirical evidence for such testing's purported benefits. We set out to investigate these issues in Ontario: We present findings from a mail survey designed to find out, from Grade 9 and Grade 10 English teachers in Ontario, their perception of the quality of the Ontario Grade 9 literacy testing program and the effects it has had on the teaching and learning processes. Based on the responses of 107 teachers, our results paint a negative picture of teachers' opinions of the Grade 9 test in terms of its quality and its impact on teaching and learning. Three years after the Grade 9 test was first introduced, Grade 9 and 10 English teachers are still not convinced of its value. Our findings (and those from two other similar surveys) appear to suggest, at least based on teachers' self-reporting, that the purposes of the test — improving the quality of education and learning — as envisioned by the Ontario Ministry of Education and Training have not been met. These findings support those of other assessment impact studies in Canada, namely British Columbia and Alberta, regarding the adverse consequences of large-scale standardized testing (either multiple-choice test or performance-based assessment), and the lack of evidence for its purported positive educational influences. 
We recommend future research to investigate further the validity and the educational impact of the provincial tests and the reasons responsible for the observed impact or lack of it, and to determine resources, such as teacher training and materials, that are necessary to supplement the provincial testing program's effort to improve teaching and learning.

14.
An increasingly regulated higher education sector is renewing its attention to those activities referred to as ‘moderation’ in its efforts to ensure that judgements of student achievement are based on appropriate standards. Moderation practices conducted throughout the assessment process can result in purposes identified as equity, justification, accountability and community building. This paper draws on the limited studies of moderation and wider relevant research on judgement, standards and professional learning to test commonly used moderation practices against these identified purposes. The paper concludes with recommendations for maximising the potential of moderation practices to establish and maintain achievement standards.

15.
With the increase in state‐mandated high‐stakes testing across the USA, schools and school districts are considering ways of increasing instructional time for core curricular subjects such as mathematics, science, English, and social studies. One seemingly logical approach to improving test scores is to reduce the time spent in subjects that are not tested, most notably art, music, and physical education, thus increasing time for the tested subjects. In this study, data was collected from 547 Virginia elementary school principals who completed a survey indicating the time specialists taught art, music, and physical education in their schools. After controlling for socio‐cultural opportunities associated with the school community, partial correlations between time allocation and school‐level passing rates on the Virginia Standards of Learning tests indicated no meaningful relationship between time allocation to art, music, and physical education and school achievement. The findings from the study do not support the notion that a reduced time allocation to art, music, and physical education is related to higher test scores.

16.
The goal of the Standards for Educational and Psychological Testing is to improve testing practices, but their impact on practice appears spotty. Self-regulation clearly fails in some instances. The establishment of an external agency to oversee testing practices and adherence to the Standards would face substantial hurdles, and the ambiguity of many of the Standards would hobble such an organization if one were created. Many of the Standards are general statements of principle, and past controversies make clear that we in the field often disagree about the reasons for them, their applicability to specific cases, and their practical meaning in specific contexts. This paper argues that the field should follow two approaches to lessen this ambiguity. First, using journals, conferences, and other vehicles, we should foster more frequent and more protracted debate about the practical meaning of key Standards, such as 13.6 and 13.7, which mandate that a decision that will have a "major impact" on a student should not be based on a single score. Second, future revisions of the Standards should use concrete examples of testing practices to clarify the meaning of the Standards, much as the legal system uses case law to clarify the meaning of the general principles embodied in statutes.

17.
Ninety-nine NASP members participated in a study designed to investigate bias in the early stages of the referral process (i.e., in the decision to administer psychological tests). Each school psychologist received one of eight case studies, which described a child referred for academic learning problems. The case studies included typical referral information and varied student race (Black, White), socioeconomic status (higher, lower), and group achievement test scores (average, below average). The decision to administer individual psychoeducational tests was not influenced by the student's race or socioeconomic status. School psychologists were influenced by the group achievement test data. Students who showed lower achievement test results were more likely to be recommended for testing than were those who showed average performance levels. Thus, these school psychologists were not biased by knowledge of a child's race or socioeconomic status, but were influenced by instructionally relevant data (i.e., achievement test scores). In addition, when objective test data indicated average achievement levels, the psychologists did not generally recommend subsequent individual psychoeducational testing. The findings suggested that, under certain conditions, testing may not automatically follow receipt of a referral.

18.
This introduction to the special issue titled Alternate Assessments Based on Modified Academic Achievement Standards: New Policy, New Practices, and Persistent Challenges addresses the federal policy introducing the new alternate assessment for students with persistent academic difficulties, as well as related implementation issues that will be more thoroughly considered throughout the journal. Three guidelines are identified within the policy for alternate assessments based on modified academic achievement standards (AA-MASs), including that (a) a state's grade-level academic content standards cannot be modified for an AA-MAS, (b) a state's general test can be modified for an AA-MAS, and (c) a state's achievement standards can be modified for an AA-MAS so long as they remain on grade level. This article introduces key issues including identification of students eligible for an AA-MAS, the degree of modification that can be applied to develop an AA-MAS, and the current state of AA-MAS development across the nation. The article concludes with overviews of each contribution in the journal.

19.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(4): 343-367
This study examined teacher testing-related attitudes and practices in a relatively unexplored educational assessment setting: court-ordered achievement testing. Responses to a mail survey from teachers (representing Grades 3, 4, and 5) in 17 elementary schools in a Midwestern urban district suggested that teachers engaged in a large number of test preparation practices preceding mandated Iowa Tests of Basic Skills (ITBS) testing. Most teachers reported that the results of ITBS testing did not provide benefits that offset the time and costs associated with testing. Teachers reported finding minimal value in the purpose or results from the tests. Furthermore, teachers indicated having experienced pressure from others, inside and outside their school, to improve student scores. Engagement in inappropriate testing practices was found to be substantially greater than had been previously reported in other studies of testing impact. Correlations and path analysis indicated that teacher engagement in inappropriate testing practice was unrelated to the intensity of pressure to improve test scores. Teachers who perceived less value and benefit associated with testing were more likely to engage in inappropriate practices.

20.
Three different tests of intelligence and the Approaches and Study Skills Inventory for Students were administered to 89 Norwegian undergraduate psychology students. The purpose was to investigate the relationship between intelligence, approaches to learning and academic achievement. Factor analysis supported a one-factor solution of the three intelligence tests as an expression of general intelligence. No relationship between general intelligence and approaches to learning was observed. The WAIS vocabulary test of intelligence and the surface approach to learning were negatively correlated. The WAIS vocabulary test of intelligence and the surface approach to learning predicted academic achievement. A curvilinear relationship between surface approach and academic achievement was observed. Multiple regression analysis showed interaction effects between deep-strategic and surface-strategic approaches to learning as predictors of academic achievement. The findings support the construct validity of approaches to learning due to its independence of intelligence.

