Similar Documents
 20 similar documents found
1.
The engagement of teachers as raters to score constructed response items on assessments of student learning is widely claimed to be a valuable vehicle for professional development. This paper examines the evidence behind those claims from several sources, including research and reports over the past two decades, information from a dozen state educational agencies regarding past and ongoing involvement of teachers in scoring‐related activities as of 2001, and interviews with educators who served as raters a decade or more ago in one state's innovative performance assessment program. That evidence reveals that the impact of scoring experience on teachers is more provisional and nuanced than has been suggested. The author identifies possible issues and implications associated with attempts to distill meaningful skills and knowledge from hand‐scoring training and practice, along with other forms of teacher involvement in assessment development and implementation. The paper concludes with a series of research questions that—based on current and proposed practice for the coming decade—seem to the author to require the most immediate attention.

2.
As access and reliance on technology continue to increase, so does the use of computerized testing for admissions, licensure/certification, and accountability exams. Nonetheless, full computer‐based test (CBT) implementation can be difficult due to limited resources. As a result, some testing programs offer both CBT and paper‐based test (PBT) administration formats. In such situations, evidence that scores obtained from different formats are comparable must be gathered. In this study, we illustrate how contemporary statistical methods can be used to provide evidence regarding the comparability of CBT and PBT scores at the total test score and item levels. Specifically, we looked at the invariance of test structure and item functioning across test administration modes and across subgroups of students defined by SES and sex. Multiple replications of both confirmatory factor analysis and Rasch differential item functioning analyses were used to assess invariance at the factorial and item levels. Results revealed a unidimensional construct with moderate statistical support for strong factorial‐level invariance across SES subgroups, and moderate support of invariance across sex. Issues involved in applying these analyses to future evaluations of the comparability of scores from different versions of a test are discussed.
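The differential item functioning (DIF) analyses this abstract describes check whether an item behaves the same for matched examinees from different subgroups. As a minimal illustration of the idea (the study itself used Rasch-based DIF; the Mantel-Haenszel procedure sketched below is a common alternative, and the counts are invented):

```python
# Minimal Mantel-Haenszel DIF check. Examinees are first stratified by
# matched total score; within each stratum a 2x2 table compares item
# success for the reference and focal groups, and a common odds ratio
# summarizes the item's behavior across strata.
import math

# Hypothetical per-stratum counts:
# (ref_correct, ref_wrong, focal_correct, focal_wrong)
strata = [
    (40, 60, 30, 70),
    (70, 30, 60, 40),
    (90, 10, 85, 15),
]

def mantel_haenszel(strata):
    """Return the MH common odds ratio and the ETS delta statistic."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den                  # >1 means the item favors the reference group
    delta = -2.35 * math.log(alpha)    # ETS delta scale; |delta| near 0 means negligible DIF
    return alpha, delta

alpha, delta = mantel_haenszel(strata)
print(f"MH odds ratio = {alpha:.2f}, ETS delta = {delta:.2f}")
```

With these made-up counts the odds ratio exceeds 1 and delta is negative, so the item would be flagged as favoring the reference group at matched ability.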

3.
The potential of computer-based assessments for capturing complex learning outcomes has been discussed; however, relatively little is understood about how to leverage such potential for summative and accountability purposes. The aim of this study is to develop and validate a multimedia-based assessment of scientific inquiry abilities (MASIA) that leverages this potential, covers a more comprehensive construct of inquiry abilities, and targets secondary school students in different grades. We implemented five steps derived from the construct modeling approach to design MASIA. During the implementation, multiple sources of evidence were collected in the steps of pilot testing and Rasch modeling to support the validity of MASIA. In particular, through the participation of 1,066 8th and 11th graders, MASIA showed satisfactory psychometric properties, discriminating students with different levels of inquiry abilities across 101 items in 29 tasks when Rasch models were applied. Additionally, the Wright map indicated that MASIA offered accurate information about students’ inquiry abilities because of the comparability of the distributions of student abilities and item difficulties. The analysis results also suggested that MASIA offered precise measures of inquiry abilities when the components (questioning, experimenting, analyzing, and explaining) were regarded as a coherent construct. Finally, the increased mean difficulty thresholds of item responses along with three performance levels across all sub-abilities supported the alignment between our scoring rubrics and our inquiry framework. Together with other sources of validity evidence from the pilot testing, the results support the validity of MASIA.
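The Wright-map comparison the abstract mentions works because, under the Rasch model, person abilities and item difficulties are estimated on the same logit scale. A small sketch of that property (values are illustrative, not MASIA estimates):

```python
# Rasch model sketch: person ability (theta) and item difficulty (b)
# share one logit scale, which is what makes a Wright map's
# side-by-side display of the two distributions meaningful.
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, success probability is exactly 0.5.
print(rasch_p(1.0, 1.0))                 # 0.5
# A person 2 logits above an item's difficulty succeeds about 88% of the time.
print(round(rasch_p(1.0, -1.0), 2))      # 0.88
```

Items whose difficulties sit where most examinee abilities sit yield success probabilities near 0.5, which is where measurement is most precise; this is the sense in which comparable ability and difficulty distributions imply accurate information.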

4.
This article reports on the collaboration of six states to study how simulation‐based science assessments can become transformative components of multi‐level, balanced state science assessment systems. The project studied the psychometric quality, feasibility, and utility of simulation‐based science assessments designed to serve formative purposes during a unit and to provide summative evidence of end‐of‐unit proficiencies. The frameworks of evidence‐centered assessment design and model‐based learning shaped the specifications for the assessments. The simulations provided the three most common forms of accommodations in state testing programs: audio recording of text, screen magnification, and support for extended time. The SimScientists program at WestEd developed simulation‐based, curriculum‐embedded, and unit benchmark assessments for two middle school topics, Ecosystems and Force & Motion. These were field‐tested in three states. Data included student characteristics, responses to the assessments, cognitive labs, classroom observations, and teacher surveys and interviews. UCLA CRESST conducted an evaluation of the implementation. Feasibility and utility were examined in classroom observations, teacher surveys and interviews, and by the six‐state Design Panel. Technical quality data included AAAS reviews of the items' alignment with standards and quality of the science, cognitive labs, and assessment data. Student data were analyzed using multidimensional Item Response Theory (IRT) methods. IRT analyses demonstrated the high psychometric quality (reliability and validity) of the assessments and their discrimination between content knowledge and inquiry practices. Students performed better on the interactive, simulation‐based assessments than on the static, conventional items in the posttest. 
Importantly, gaps between performance of the general population and English language learners and students with disabilities were considerably smaller on the simulation‐based assessments than on the posttests. The Design Panel participated in development of two models for integrating science simulations into a balanced state science assessment system. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 363–393, 2012

5.
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by the administration of an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article is meant to provide a comprehensive examination of issues relevant to the comparability of scores across devices, and as such provide a starting point in designing and implementing a research agenda to support the comparability of any assessment program. This work starts with a conceptual framework rooted in the idea of a comparability claim—a conceptual statement about how each student is expected to perform on each of the devices in question. Then a review of the available literature is provided, focusing on how aspects of the devices (touch screens, keyboards, screen size, and displayed content) and aspects of the assessments (content area and item type) relate to student performance and preference. Building on this literature, recommendations to minimize threats to comparability are then provided. The article concludes with ways to gather evidence to support claims of comparability.

6.
This study evaluates four growth prediction models—projection, student growth percentile, trajectory, and transition table—commonly used to forecast (and give schools credit for) middle school students' future proficiency. Analyses focused on vertically scaled summative mathematics assessments, and two performance standards conditions (high rigor and low rigor) were examined. Results suggest that, when “status plus growth” is the accountability metric a state uses to reward or sanction schools, growth prediction models offer value above and beyond status‐only accountability systems in most, but not all, circumstances. Predictive growth models offer little value beyond status‐only systems if the future target proficiency cut score is rigorous. Conversely, certain models (e.g., projection) provide substantial additional value when the future target cut score is relatively low. In general, growth prediction models' predictive value is limited by a lack of power to detect students who are truly on‐track. Limitations and policy implications are discussed, including the utility of growth projection models in assessment and accountability systems organized around ambitious college‐readiness goals.

7.
Why and under which conditions do international student assessment programmes like PISA succeed? How can the results of these assessments be useful for advocates of different, even contradictory, policies? What might explain different patterns of using assessment as a tool for school governance? Drawing on historical and comparative research, and using PISA as an example, this paper provides a frame for discussing these and other questions around the international rise of accountability as a key tool of social change. The basic argument is that even though accountability is a global phenomenon, the ways and means of enacting and encountering accountability are not. How accountability is experienced depends on deeply engrained ‘constitutional mind‐sets’, i.e. diverse cultures of conceptualizing the relation between the public and its institutions.

8.
Educational Assessment, 2013, 18(3): 195–224
Three mathematics scoring methods are being used or explored in large-scale assessment programs: item-by-item scoring, holistic scoring, and "trait" scoring. This study investigated all 3 methods of scoring on 3 mathematics performance-based assessments. Mathematics assessment tasks were selected from a pool of pilot tasks because they could be scored using all 3 methods. Results of the study suggest that holistic scoring and item-by-item scoring methods provide similar information; however, trait scores for conceptual understanding and mathematics communication tapped into different aspects of student performance. Implications for the validity of scoring methods now in use for performance-based mathematics assessments are discussed.

9.
In U.S. elementary and secondary education, student assessment at the state, district, and school levels exhibits two salient features: the ideas of the measurement movement still hold an important place, and assessment is tightly bound to accountability. The lingering influence of the measurement movement, together with the rise of accountability, has stirred new currents in student assessment, the most prominent of which is the prevalence of high-stakes testing.

10.
To provide an accurate reading of students' and schools' rates of progress, and to provide cues for instruction, assessment at every level should be connected to explicit learning goals and standards. To show how this requirement can be fulfilled, and how research-based assessment can effectively support learning and instruction, this article summarizes a 7-year performance assessment collaboration between assessment researchers and the nation's second largest school district. The project's success in scaling up empirically tested assessment design models and scoring procedures to a district assessment involving more than 300,000 students per year raises the possibility that high-quality learning-centered assessment may again be a practical option for large-scale assessment and accountability.

11.
Contemporary educational accountability systems, including state‐level systems prescribed under No Child Left Behind as well as those envisioned under the “Race to the Top” comprehensive assessment competition, rely on school‐level summaries of student test scores. The precision of these score summaries is almost always evaluated using models that ignore the classroom‐level clustering of students within schools. This paper reports balanced and unbalanced generalizability analyses investigating the consequences of ignoring variation at the level of classrooms within schools when analyzing the reliability of such school‐level accountability measures. Results show that the reliability of school means cannot be determined accurately when classroom‐level effects are ignored. Failure to take between‐classroom variance into account biases generalizability (G) coefficient estimates downward and standard errors (SEs) upward if classroom‐level effects are regarded as fixed, and biases G‐coefficient estimates upward and SEs downward if they are regarded as random. These biases become more severe as the difference between the school‐level intraclass correlation (ICC) and the class‐level ICC increases. School‐accountability systems should be designed so that classroom (or teacher) level variation can be taken into consideration when quantifying the precision of school rankings, and statistical models for school mean score reliability should incorporate this information.
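The G-coefficient bias the abstract describes can be illustrated with a back-of-the-envelope variance decomposition. The variance components and sample sizes below are invented, and the comparison shows only one direction of the bias (lumping random classroom effects into student-level noise inflates the apparent reliability of school means):

```python
# Reliability (generalizability) of school mean scores, with and
# without a classroom level. All numbers are illustrative.

var_school, var_class, var_student = 0.04, 0.06, 0.90
n_classes, n_students = 4, 25          # classes per school, students per class

# Three-level design: classroom variance is averaged over n_classes,
# student variance over all n_classes * n_students students per school.
g_full = var_school / (
    var_school
    + var_class / n_classes
    + var_student / (n_classes * n_students)
)

# Naive two-level model: classroom variance is lumped into student
# variance and wrongly shrunk by the full student count.
g_naive = var_school / (
    var_school + (var_class + var_student) / (n_classes * n_students)
)

print(round(g_full, 3), round(g_naive, 3))   # the naive G is inflated
```

With these numbers the naive model overstates the G coefficient substantially, because dividing the classroom variance by 100 students instead of 4 classrooms understates the noise in a school mean.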

12.
The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non‐mathematics core curricula courses. This research project, Mathematics and Statistics for the Development of Professional Skills, which is in progress at the University of Beira Interior in Portugal, has as one of its main objectives the development of an e‐assessment system as a resource for learning assessment and student self‐regulation. Based on the results of the above‐mentioned project, this paper will give evidence of how to improve the reliability of an e‐assessment system, show that e‐assessment can be a good alternative to open‐ended tests, and show that students tend to have a positive attitude towards its use. We show that this can be done by checking the internal consistency and measurement error of the e‐assessment tests, analysing the association between student scores obtained by different methods of assessment, and analysing survey data on student opinion about e‐assessment methods.

13.
Using an argument‐based approach to validation, this study examines the quality of teacher judgments in the context of a standards‐based classroom assessment of English proficiency. Using Bachman's (2005) assessment use argument (AUA) as a framework for the investigation, this paper first articulates the claims, warrants, rebuttals, and backing needed to justify the link between teachers' scores on the English Language Development (ELD) Classroom Assessment and the interpretations made about students' language ability. Then the paper summarizes the findings of two studies—one quantitative and one qualitative—conducted to gather the necessary backing to support the warrants and, in particular, address the rebuttals about teacher judgments in the argument. The quantitative study examined the assessment in relation to another measure of the same ability—the California English Language Development Test—using confirmatory factor analysis of multitrait‐multimethod data and provided evidence in support of the warrant that states that the ELD Classroom Assessment measures English proficiency as defined by the California ELD Standards. The qualitative study examined the processes teachers engaged in while scoring the classroom assessment using verbal protocol analysis. The findings of this study serve to support the rebuttals in the validity argument that state that there are inconsistencies in teachers' scoring. The paper concludes by providing an explanation for these seemingly contradictory findings using the AUA as a framework and discusses the implications of the findings for the use of standards‐based classroom assessments based on teacher judgments.

14.
Student learning outcomes assessment has been increasingly used in U.S. higher education institutions over the last 10 years, partly fueled by the recommendation from the Spellings Commission that institutions need to demonstrate more direct evidence of student learning. To respond to the Commission's call, various accountability initiatives have been launched, profoundly reshaping how assessment has been viewed, implemented, and used in higher education. This article reviews the conceptual and methodological challenges of the assessment agenda for one of the landmark accountability initiatives, the Voluntary System of Accountability, and also documents the notable shift from a strong focus on accountability to an increasing emphasis on internal improvement. This article then discusses the most recent developments in assessment approaches and tools, and proposes a four‐element, one‐enabler assessment cycle for institutions to maximally benefit from their assessment efforts.

15.
Assessment is a major component of education, significant in directing what is identified as valued student learning. This paper is framed within an understanding of imperative and exhortative policy. Two paradigmatically different, and potentially contesting, assessment policy directions in Australian education – educational accountability to monitor school and teacher performance, and teacher assessment practices to improve learning (assessment for learning [AfL] or formative assessment) – are examined for their impact on teacher professionalism. Both approaches have official endorsement in Australian policy. Mandated participation in national tests is indicative of educational accountability assessments under national direction. While also endorsed nationally, AfL implementation is reliant on state and territory direction. Our examination reveals tensions in the alignment of both policies. This is evident in the impact of accountability assessment on AfL implementation, in particular, teachers’ understandings of valued assessment evidence. We conclude that a paradigmatic shift to support student learning in Australian schools is a policy imperative that includes the need for professional development and learning support for teachers.

16.
Teachers' conceptions of assessment can be understood in terms of their agreement or disagreement with four purposes to which assessment may be put, specifically, (a) improvement of teaching and learning, (b) school accountability, (c) student accountability, or (d) treating assessment as irrelevant. A 50‐item Teachers' Conceptions of Assessment (COA‐III) questionnaire was completed by New Zealand primary school teachers and managers (n=525). The COA‐III, based on the four main purpose‐defined conceptions of assessment, was analysed with structural equation modelling and showed a close fit of the data to a hierarchical, multi‐dimensional model (χ2=3217.68; df=1162; RMSEA=.058; TLI=.967). On average, participants agreed with the improvement conceptions and the school accountability conception, while rejecting the view that assessment was irrelevant. However, respondents disagreed that assessment was for student accountability. Improvement, school, and student accountability conceptions were positively correlated. The irrelevance conception was inversely related to the improvement conception and not related to the system accountability conception. Surprisingly, no statistically significant differences were found in mean scale scores for each conception regardless of teacher (age, gender, role, assessment training, or assessment practices) or school (size, location, or socio‐economic status) variables. Implications for the use of the COA‐III for policy implementation and teacher professional development are discussed.
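The reported RMSEA can be reproduced from the chi-square, degrees of freedom, and sample size given in the abstract using the standard point-estimate formula (the TLI cannot be checked this way, since it also requires the baseline-model chi-square, which is not reported):

```python
# Sanity check: recompute RMSEA from the fit statistics reported in
# the abstract (chi2 = 3217.68, df = 1162, n = 525).
import math

chi2, df, n = 3217.68, 1162, 525
rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
print(round(rmsea, 3))   # 0.058, matching the reported value
```

The max(..., 0) guard reflects the convention that RMSEA is set to zero when the chi-square falls below its degrees of freedom.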

17.
In this paper I introduce the theory of gift giving as a possible means to reconcile the contradictions inherent in accountability measures of ‘faculty productivity’ in the American university. I sketch the theory of gift economies to show how, given the historical ideals that characterize the faculty‐student relationship, a theory of gift giving could help us better judge the labor of the faculty. I suggest that it is the relational character of teaching that frustrates accountability measures and that perhaps if viewed as a gift economy—and in particular an economy with ‘reproductive’ ends—we could better grasp the effectiveness of these relationships.

18.
The study sought to establish the level of students' self‐assessment skill—particularly inexperienced students—and to examine the relationship between self‐assessment skill and learning style, student perceptions of academic locus of control and academic self‐efficacy. Students were asked to evaluate and provide estimated marks for their own work, which were compared with tutors' actual marks. Students also completed measures of learning style, academic locus of control and academic self‐efficacy. Comparisons of student estimated and tutor marks indicated a good level of self‐assessment skill in the majority of students. A significant minority of students did however fail to exhibit such skills. There was also some evidence of a tendency for students to underestimate their performance. While both strategic and deep approaches to learning were shown to be positively correlated with tutor mark, only surface approach was negatively correlated with students' estimated mark, suggesting that surface learners are inclined to provide lower evaluations of their own performance. Deep approach was also correlated with accuracy of student self‐assessment skill, suggesting that deep learning is associated with self‐assessment competency. No clear or convincing associations between self‐assessment skill and perceptions of academic locus of control or academic self‐efficacy were identified. Findings suggest that while self‐assessment skill undoubtedly develops, becoming more effective during students' academic career, inexperienced students do have the capacity for self‐evaluation and should therefore be included in self‐assessment activities. In the light of findings related to learning style and the heterogeneous nature of student groups, student monitoring and skill development are proposed in order to allow the integration of self‐assessment into the learning and assessment process.

19.
Even though teacher education has been successful in preparing students for their future profession, the classroom reality can differ greatly from the inservice training. Many novice teachers therefore find the transition from student teacher to inservice teacher overwhelming. To support beginning teachers, mentoring programs—where more experienced teachers support novice teachers—have become commonplace in many schools worldwide. In Sweden, mentoring for beginning teachers has been a frequent feature of support since 2001. This study, conducted in Sweden, examines seven novice teachers and the impact the mentoring process had upon them during their first year of teaching. Based on interviews, it was found that these teachers experienced both professional and personal support from their mentors. The study also showed the significance of observant leaders within the mentorship program following up on the development of the mentor–mentee relationship.

20.
A formative computer‐based assessment (CBA) was one of three instruments used for assessment in a Bachelor of Education course at Queen’s University (Ontario, Canada) with an enrolment of approximately 700 students. The formative framework fostered a self‐regulated learning environment whereby feedback on the CBA was used to support rather than measure student learning. The four types of feedback embedded in the CBA included: (a) directing students to a resource, (b) rephrasing a question, (c) providing additional information and (d) providing the correct answer. Although students originally reported positive experiences with the formative CBA, two follow‐up surveys revealed that they found the four types of feedback to be moderately effective in supporting their learning.
