
1.

The Impact of a High Stakes Teacher Evaluation System: Educator Perspectives on Accountability

Renee M. R. Moran 《Educational Studies: A Journal of the American Educational Studies Association》2017,53(2):178-193

The use of student achievement data to evaluate an individual teacher's effectiveness has become a new focus in educational policy. This article focuses on the underresearched teacher perception of this new policy measure. Drawing on ethnographic research procedures, this article explores how first-grade teachers in one state navigated a new high-stakes teacher evaluation system. Although the results indicate that teachers have a desire for accountability, findings also show a variety of beliefs on the validity of teacher evaluation, as well as differing applications of scoring measures across school contexts.

2.

The Reliability of Test Scores

《The Journal of Educational Research》2012,105(5):370-379


3.

Test Stakes and Item Format Interactions

《Applied Measurement in Education》2013,26(1):55-77

The effects of test consequences, response formats (multiple choice or constructed response), gender, and ethnicity were studied for the math and science sections of a high school diploma endorsement test. There was an interaction between response format and test consequences: under both response formats, students performed better under high stakes (diploma endorsement) than under low stakes (pilot test), but the difference was larger for the constructed-response items. Gender and ethnicity did not interact with test stakes; the means of all groups increased when the test had high stakes. Gender interacted with format: boys scored higher than girls on multiple-choice items, whereas girls scored higher than boys on constructed-response items.

4.

The Effect of Item Response Changes on Scores on an Elementary Reading Achievement Test

《The Journal of Educational Research》2012,105(3):153-156

The effect of changing item responses on the scores of elementary school children on a standardized achievement test was studied. Previous research, primarily involving non-standardized instruments and adult samples, indicates that changed responses are more likely to be correct than not. Subjects were 165 third-grade students taking the Metropolitan Reading Tests. Students received no special instructions regarding changing responses. Changes were identified visually and independently verified. While response changes were infrequent, such changes generally improved scores. Sex differences in the number and success of changes were non-significant. The relationship between frequency of response change and test score was minimal. Responses to difficult items were changed more frequently, and less successfully, than responses to easy items. High scorers made more successful changes than did low scorers. Within the limits of the methodology, the results clearly indicated that response changes by elementary students on multiple-choice items tend to improve test scores.
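The abstract separates changes that help a score from changes that hurt it. A minimal sketch of that bookkeeping, assuming each examinee's first and final marks on every item are available (the function names `classify_changes` and `net_score_gain` are hypothetical, not taken from the study):

```python
from collections import Counter


def classify_changes(initial, final, key):
    """Count one examinee's response changes by type.

    initial, final: first and final option marked on each item
    key: correct option for each item
    """
    counts = Counter()
    for first, last, correct in zip(initial, final, key):
        if first == last:
            continue  # response not changed on this item
        if first != correct and last == correct:
            counts["wrong_to_right"] += 1
        elif first == correct and last != correct:
            counts["right_to_wrong"] += 1
        else:
            counts["wrong_to_wrong"] += 1  # changed, but wrong both times
    return counts


def net_score_gain(counts):
    # Each wrong-to-right change adds a point; each right-to-wrong change removes one.
    return counts["wrong_to_right"] - counts["right_to_wrong"]
```

For example, with key `ABCD`, first marks `ABDA` and final marks `ACCD`, there are two wrong-to-right changes and one right-to-wrong change, a net gain of one point.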

5.

A Validity Argument in Support of the Use of College Admissions Test Scores for Federal Accountability

Wayne J. Camara, Krista Mattern, Michelle Croft, Sara Vispoel, Paul Nichols 《Educational Measurement》2019,38(4):12-26

In 2018, 26 states administered a college admissions test to all public school juniors. Nearly half of those states proposed to use those scores as their academic achievement indicators for federal accountability under the Every Student Succeeds Act (ESSA); many others are planning to use those scores for other accountability purposes. Accountability encompasses a number of different uses and subsumes a variety of claims. For states proposing to use summative tests for accountability, a validity argument needs to be developed, which entails delineating each specific use of test scores associated with accountability, identifying appropriate evidence, and offering a rebuttal to counterclaims. The aim of this article is to support states in developing a validity argument for use of college admission test scores for accountability by identifying claims that are applicable across states, along with summarizing existing evidence as it relates to each of these claims. As outlined by The Standards for Educational and Psychological Testing, multiple sources of evidence are used to address each claim. A series of threats to the validity argument, including weaker alignment with content standards and potential influences in narrowing teaching, are reviewed. Finally, the article contrasts validity evidence, primarily from research on the ACT, with regulatory requirements from ESSA. The Standards and guidance addressing the use of a “nationally recognized high school academic assessment” (Elementary and Secondary Education Act (ESEA), Negotiated Rulemaking Committee; Department of Education) are the primary sources for the organization of validity evidence.

6.

Pass Rates in the First Year of University Study: The Effect of Gender and Faculty

Stephen P. Keef 《Higher Education Research & Development》1992,11(1):39-44

In recent years, the Faculty of Commerce and Administration at Victoria University of Wellington has grown at a faster rate than the rest of the University, and the increase in the number of female students has been even more pronounced. The study sought to determine the degree to which pass rates in the first year of full‐time study differed between the sexes and between the Commerce faculty and the rest of the University. Students were grouped into three categories of prior academic ability. The results showed that females with the two lower levels of prior academic ability in the rest of the University achieved significantly higher pass rates than their peers.

7.

The Effect of Four Sets of Test Instructions on Scores in Mental Ability Tests

Gösta W. Berglund 《Scandinavian Journal of Educational Research》2013,57(1):31-38

Berglund, G. W. (1970). The Effect of four Sets of Test Instructions on Scores in Mental Ability Tests. Scand. J. Educ. Res. 14, 31‐38. Four hundred and eighteen Swedish children (11‐year‐olds) were divided randomly into four experimental groups. Three mental ability tests of the factor type were administered to the groups using four different sets of instructions. In the first group the tests were presented as intelligence tests and in the second group as achievement tests. The third group received the original instructions of the tests and the fourth group received routine instructions. It is concluded (a) that the four instructions do not differentiate the groups on power tests, and (b) that the routine instruction does not affect the subjects’ working speed to the same degree as the other instructions.

8.

The Differential Impact of Curriculum on Aptitude Test Scores

William H. Angoff, Eugene G. Johnson 《Journal of Educational Measurement》1990,27(4):291-305

A sample of 22,923 students who had taken the SAT and the GRE General Test was classified by the four general undergraduate fields of study and by sex. The authors performed several analyses to determine the degree of differential impact that sex and field of study might have on GRE-Verbal, GRE-Quantitative, and GRE-Analytical scores after controlling for SAT-Verbal and SAT-Mathematical scores. They found, first, that the correlations of SAT-Verbal with GRE-Verbal scores and of SAT-Mathematical with GRE-Quantitative scores were extremely high: .86 in the total sample, and ranging from the low to middle .80s in the eight subgroups. The impact of curriculum and sex, after controlling for SAT scores, was found to be low on GRE-Verbal scores but relatively high on GRE-Quantitative scores, with students in heavily quantitative fields enjoying an advantage over their peers in less quantitative fields of study. The impact was moderate on GRE-Analytical scores. Further studies designed to "purify" the fields of study and include only clearly verbal fields and clearly mathematical fields showed small additional impact. An additional study indicated a generally slight effect of the institution attended on GRE-Quantitative scores after controlling for sex, major field of study, and initial ability.
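One simple way to illustrate what "controlling for SAT scores" means is residualization: regress the GRE score on the SAT score in the pooled sample, then average the residuals within each field. This is a sketch under that assumption, with hypothetical function names and toy data; the authors' own analyses may have been specified differently.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def mean_residual_by_group(x, y, groups):
    """Mean residual of y per group after regressing y on x in the pooled sample.

    A positive value means the group scores higher on y than its x scores predict.
    """
    a, b = ols_fit(x, y)
    residuals = {}
    for xi, yi, g in zip(x, y, groups):
        residuals.setdefault(g, []).append(yi - (a + b * xi))
    return {g: mean(r) for g, r in residuals.items()}
```

With toy SAT scores `[1, 2, 1, 2]`, GRE scores `[3, 5, 1, 3]` and fields `["Q", "Q", "V", "V"]`, the pooled fit is y = 2x, and the "Q" group sits one point above the line while the "V" group sits one point below it.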

9.

Relationship between Test Scores and Test Time

Carmelo Terranova 《Journal of Experimental Education》2013,81(3):81-83

Randomly selected fifth, seventh, ninth, and eleventh graders (sixty from each grade) were given an ability test. The score and the time taken were used to test the hypotheses of no negative linear relationship and no curvilinear relationship between test score and test time. Although no significant linear relationships were found, significant curvilinear regressions of time on score were found in grades seven and nine. The strength of these significant relationships was relatively low in both grades.
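The contrast between a linear and a curvilinear relationship can be illustrated by regressing time on score with degree-1 and degree-2 polynomials and comparing their R². This minimal pure-Python sketch uses the normal equations, which is adequate for low degrees but not numerically robust in general; the significance tests reported in the study (for example an F test on the added quadratic term) are not reproduced, and the function names are hypothetical.

```python
def polyfit_ls(x, y, degree):
    """Least-squares polynomial fit via the normal equations; coef[k] multiplies x**k."""
    n = degree + 1
    # X'X and X'y for the design matrix with columns 1, x, x^2, ...
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    aug = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (aug[i][n] - tail) / aug[i][i]
    return coef


def r_squared(x, y, coef):
    """Proportion of variance in y explained by the fitted polynomial."""
    my = sum(y) / len(y)
    fitted = [sum(c * xi ** k for k, c in enumerate(coef)) for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

With a U-shaped toy pattern such as scores `[1, 2, 3, 4, 5]` and times `[4, 1, 0, 1, 4]`, the linear R² is essentially 0 while the quadratic R² is 1, the same "no linear, but curvilinear" pattern the abstract describes (in a far more extreme form).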

10.

New Times, New Stakes: Moments of Transit, Accountability, and Classroom Practice

Robert J. Helfenbein 《Review of Education, Pedagogy & Cultural Studies》2013,35(2-3):91-109


11.

Variability in Reading Scores on a Given Level of Intelligence Test Scores

《The Journal of Educational Research》2012,105(6):440-446

Previous studies have shown that several key variables influence student achievement in geometry, but no research has been conducted to determine how these variables interact. A model of achievement in geometry was tested on a sample of 102 high school students. Structural equation modeling was used to test hypothesized relationships among variables linked to successful problem solving in geometry. These variables, including motivation, achievement emotions, pictorial representation, and categorization skills, were examined for their influence on geometry achievement. Results indicated that the model fit well. Achievement emotions, specifically boredom and enjoyment, had a significant influence on student motivation. Student motivation influenced students’ use of pictorial representations and achievement. Pictorial representation also directly influenced achievement. Categorization skills had a significant influence on pictorial representations and student achievement. The implications of these findings for geometry instruction and for future research are discussed.

12.

The Influence of Sex, Education and Age on Test Scores on the Swedish Scholastic Aptitude Test

Kenny Bränberg, Widar Henriksson, Hans Nyquist, Ingemar Wedman 《Scandinavian Journal of Educational Research》2013,57(3):189-203

This study describes the effects of sex, education and age on the total score on the Swedish Scholastic Aptitude Test (SweSAT), a test used in selection for admission to colleges and universities in Sweden since 1977. Its use has so far been limited to one of four quota groups, consisting of applicants 25 years or older with more than four years of work experience. The statistical methods used in this study are regression models with dummy variables, estimated with a corner‐point parameterization. The results indicate largely genuine differences on every variable studied. Test takers with more education obtain a higher mean score than those with less education, and older test takers obtain higher mean scores on the vocabulary (WORD) and general information (GI) subtests than younger test takers. The mean test score for men is higher than that for women, even when differences in education and age are controlled for. Finally, some statistical problems related to the analysis of this type of data are discussed.
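Corner-point parameterization fixes the parameter of one reference level of each factor (the "corner" cell) at zero, so each remaining dummy coefficient is a contrast with that baseline. A minimal sketch of building such a design matrix; the function name and the factor levels used below are hypothetical, not the study's SweSAT categories.

```python
def corner_point_design(rows, factors, baselines):
    """Dummy-variable design matrix with corner-point (reference-cell) coding.

    rows: list of dicts mapping factor name -> observed level
    factors: dict mapping factor name -> list of its levels
    baselines: dict mapping factor name -> level absorbed into the intercept
    Returns (column_names, matrix); the first column is the intercept.
    """
    columns = ["intercept"]
    for name, levels in factors.items():
        # The baseline level gets no column: its effect is fixed at zero.
        columns += [f"{name}={lvl}" for lvl in levels if lvl != baselines[name]]
    matrix = []
    for row in rows:
        values = [1]
        for name, levels in factors.items():
            values += [1 if row[name] == lvl else 0
                       for lvl in levels if lvl != baselines[name]]
        matrix.append(values)
    return columns, matrix
```

With sex levels `female`/`male` (baseline `female`) and three education levels (baseline `compulsory`), a man with higher education is coded `[1, 1, 0, 1]` and a woman at both baselines is coded `[1, 0, 0, 0]`; the intercept is then the predicted mean for the baseline cell.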

13.

Graduation Rates and Accountability: Regressions Versus Production Frontiers

Robert B. Archibald, David H. Feldman 《Research in Higher Education》2008,49(1):80-100

This paper suggests an alternative to the standard practice of measuring graduation-rate performance with regression analysis: production frontier analysis. Production frontier analysis is appealing because it compares an institution’s graduation rate with the best performance rather than the average performance. The paper explains the differences between these two types of analysis and provides examples of their application using data for 187 national universities.
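The abstract does not name the frontier estimator used. As an illustration only, the sketch below uses free disposal hull (FDH), one of the simplest output-oriented frontier methods: each institution is compared with the best graduation rate achieved by any institution that has no more of every input. The choice of inputs (for example entering test scores or spending per student) and the function name are assumptions.

```python
def fdh_output_efficiency(inputs, outputs):
    """Output-oriented free-disposal-hull efficiency for each institution.

    inputs: list of input vectors of equal length (larger = more resources)
    outputs: list of single outputs, e.g. graduation rates
    Efficiency = own output / best output among institutions using no more
    of every input; 1.0 means the institution lies on the frontier.
    """
    scores = []
    for xi, yi in zip(inputs, outputs):
        peers = [yj for xj, yj in zip(inputs, outputs)
                 if all(a <= b for a, b in zip(xj, xi))]
        # An institution always qualifies as its own peer, so peers is non-empty.
        scores.append(yi / max(peers))
    return scores
```

A regression benchmark would instead compare each institution with the average graduation rate predicted for its inputs; the frontier benchmark compares it with the best observed performance, which is the distinction the paper draws.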



14.

Intelligent Scoring of Non-standardized Test Items

陈慈弟, 林远明, 黄聪, 田民格 《Journal of Sanming University》2011,28(2):7-10

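Based on a study of function S-rough sets and the dynamic programming algorithm, the concepts of similarity degree and credibility degree are proposed, and a scheme and procedure for scoring non-standardized test items are given; the key steps are migration processing and computing the length of the longest common subsequence. The paper mainly describes migration processing based on function S-rough sets, analyzes the structure of the solution and the method for computing the longest common subsequence length, and finally presents the source code of the migration function and of the function that computes the longest common subsequence length.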
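The dynamic-programming step named in this abstract, computing the length of the longest common subsequence (LCS) between a student's answer and a reference answer, can be sketched as follows. The `similarity` function, which normalizes the LCS length by the reference length, is a hypothetical stand-in: the paper's own similarity and credibility definitions and its S-rough-set migration step are not reproduced here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] = LCS length of the prefix of a processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop one character
        prev = curr
    return prev[-1]


def similarity(answer, reference):
    """Hypothetical similarity: LCS length as a fraction of the reference length."""
    if not reference:
        return 0.0
    return lcs_length(answer, reference) / len(reference)
```

The table uses O(len(a) * len(b)) time but keeps only two rows, so memory is O(len(b)). For instance, `lcs_length("ABCBDAB", "BDCABA")` is 4, and an answer `"AXCY"` against reference `"ABCD"` shares the subsequence `"AC"`, giving a similarity of 0.5.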

15.

The Role of Extended Time and Item Content on a High‐Stakes Mathematics Test

Allan S. Cohen, Noel Gregg, Meng Deng 《Learning Disabilities Research & Practice》2005,20(4):225-233

The premise of a great deal of current research guiding policy development has been that accommodations are the catalyst for student performance differences. Rather than accepting this premise, two studies were conducted to investigate the influence of extended time and content knowledge on the performance of ninth‐grade students who took a statewide mathematics test with and without accommodations. Each study involved 1,250 accommodated students (extended time only) with learning disabilities and 1,250 nonaccommodated students demonstrating no disabilities. In Study One, a standard differential item functioning (DIF) analysis illustrated that the usual approach to studying the effects of accommodations contributes little to our understanding of the reasons for performance differences across students. Next, a mixture item response theory DIF model was used to explore the most likely cause(s) of performance differences across the population. The results from both studies suggest that students for whom items functioned differently were not accurately characterized by their accommodation status but rather by their content knowledge. That is, knowing students' accommodation status (i.e., accommodated or nonaccommodated) contributed little to understanding why accommodated and nonaccommodated students differed in their test performance. Rather, the data suggest that a more likely explanation is that mathematics competency differentiated the groups of student learners regardless of their accommodation and/or reading levels.
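A widely used "standard" DIF procedure is the Mantel-Haenszel statistic, which compares the odds of a correct response for a reference group and a focal group within strata matched on total score. The abstract does not say which DIF method Study One used, so this is an illustrative sketch only; treating accommodated students as the focal group, and the function name, are assumptions. The common odds ratio is α_MH = Σ(A·D/N) / Σ(B·C/N) over strata, and the ETS delta-scale index is MH D-DIF = -2.35 ln(α_MH).

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta-scale DIF for one item.

    strata: list of (ref_right, ref_wrong, focal_right, focal_wrong) counts,
            one tuple per matched total-score level.
    Returns (alpha_mh, mh_d_dif). alpha_mh > 1 (negative D-DIF) means the item
    favours the reference group after matching on total score.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

When both groups have the same odds of success in every stratum, α_MH is 1 and D-DIF is 0; the mixture IRT DIF model the authors then applied is a different, model-based approach and is not sketched here.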

16.

Effects of Empirical Option Weighting on Estimating Domain Scores and Making Pass/Fail Decisions

《Applied Measurement in Education》2013,26(3):231-244

For any testing program intended for licensure, certification, competency, or proficiency, the estimation of content-relevant test scores for pass/fail decision making is necessary. This study compares number-correct scoring to empirical option weighting in the context of such tests. The study was conducted under two test design conditions, three test length conditions, and four passing score levels. Two criteria were used to evaluate the effectiveness of empirical option weighting versus number-correct scoring. Empirical option weighting typically produced slightly more reliable domain score estimates and more consistent pass/fail decisions than number-correct scoring, particularly in the lower half of the test score distribution. For many types of testing programs where the passing scores are established in the lower half of the test score distribution, the empirical option weighting method used in this study seems both appropriate and effective in improving the dependability of test scores and the consistency of pass/fail decisions. Test users, however, must weigh the effort required to use option weighting against the small gains obtained with this method. Other problems are discussed that may limit the usefulness of option weighting.
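Empirical option weighting replaces the 0/1 credit for each option with a weight estimated from response data. One simple variant, assumed here and not necessarily the scheme used in this study, weights each option by the mean number-correct score of the examinees who chose it; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def number_correct(responses, key):
    """Conventional score: one point per item answered with the keyed option."""
    return sum(r == k for r, k in zip(responses, key))


def empirical_option_weights(all_responses, key):
    """Weight each (item, option) by the mean number-correct score of its choosers."""
    chosen = defaultdict(list)
    for resp in all_responses:
        total = number_correct(resp, key)
        for item, option in enumerate(resp):
            chosen[(item, option)].append(total)
    return {k: mean(v) for k, v in chosen.items()}


def weighted_score(responses, weights):
    # Options never chosen in the calibration data get weight 0 (an assumption).
    return sum(weights.get((item, option), 0.0) for item, option in enumerate(responses))
```

In practice the weights would be estimated on one calibration sample and applied to other examinees; estimating and scoring on the same sample, as in a toy check, lets each examinee influence his or her own weights.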

17.

Uses and Abuses of Achievement Test Scores

Susan Bobbitt Nolen, Thomas M. Haladyna, Nancy S. Haas 《Educational Measurement》1992,11(2):9-15

Are variations in test-preparation practices from school to school undermining the meaningfulness of achievement test results? Is there pressure to raise achievement test scores by the use of educationally unsound practices? What uses of achievement test scores are most common? Do teachers and administrators have reasonably accurate views of test score uses?

18.

Early Literacy Practices as Predictors of Reading Related Outcomes: Test Scores, Test Passing Rates, Retention, and Special Education Referral

《Exceptionality》2013,21(1):11-28

Recent focus on the reading skills of primary school children has led to increased funding for early literacy programs targeting students at risk for reading failure. In this study, self-reports of the frequency of currently advocated early literacy practices in Grades 1 through 3 were entered into regression models in an effort to predict mean language arts scores and passing rates on a 3rd-grade state examination, grade-level retention, and referral for special education assessment. Regression models were also compared to models predicting rates of special education referral and retention. Findings indicate that the effect of early literacy practice on school-level outcomes depends on the measure used as an indicator of improvement. Explicit skills instruction was a significant predictor of higher passing rates on a state examination, as well as lower rates of referral for special education assessment. However, explicit skills instruction was also a predictor of higher rates of grade retention. Holistic focus was associated with higher rates of referral for special education assessment, as well as lower retention rates. Programs that included a parent-child reading feature were associated with lower rates of both referral and grade retention. Findings are discussed in light of research on classroom environments and school reform.

19.

Diagnosing and Improving Teaching Quality: Exam Score Analysis as a Case Study

《Education Teaching Forum》2019,(44)

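Examinations are a very important part of teaching-quality monitoring and an important means of achieving a school's teaching-evaluation and educational goals. An examination not only tests students' learning outcomes but also checks the quality of teachers' instruction and monitors school management. Statistical analysis of students' exam scores can help teachers promptly identify problems and weak points in their teaching so that they can adjust course content and improve teaching methods in time; it can also serve to test, monitor, and correct the teaching management system and the way teaching is run.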
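The article does not specify which statistics it computes. As a hedged illustration of the kind of score analysis described, the sketch below reports the mean, standard deviation and pass rate of total scores, plus per-item difficulty (mean points earned divided by maximum points); items below a hypothetical 0.5 threshold are flagged as possible weak spots in teaching. The function name, threshold and data layout are assumptions.

```python
from statistics import mean, pstdev


def exam_summary(item_scores, max_points, pass_mark, weak_threshold=0.5):
    """Summarize an exam.

    item_scores: one list of item points per student
    max_points: maximum points for each item
    pass_mark: minimum total score counted as a pass
    """
    totals = [sum(student) for student in item_scores]
    # Item difficulty index: share of available points earned on each item
    difficulty = [mean([student[i] for student in item_scores]) / max_points[i]
                  for i in range(len(max_points))]
    return {
        "mean": mean(totals),
        "sd": pstdev(totals),  # population SD of total scores
        "pass_rate": sum(t >= pass_mark for t in totals) / len(totals),
        "item_difficulty": difficulty,
        "weak_items": [i for i, p in enumerate(difficulty) if p < weak_threshold],
    }
```

For four students with item scores `[[10, 5], [8, 0], [6, 5], [4, 0]]` on two 10-point items and a pass mark of 10, the mean total is 9.5, the pass rate is 0.5, and the second item (difficulty 0.25) is flagged.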

20.

Indicators of Usefulness of Test Scores

Richard Sawyer 《Applied Measurement in Education》2013,26(3):255-271

Current thinking on validity suggests that educational institutions and individuals should evaluate their uses of test scores in the context of their fundamental goals. Regression coefficients and other traditional criterion-related validity statistics provide relevant information, but often do not, by themselves, address the fundamental reasons for using test scores. Formal decision theory models provide a logically rigorous way to do this, but they are difficult to implement in practice. This article considers a simplification of formal decision theory models, in which one estimates the proportion of examinees for whom positive outcomes result from a use of test scores. For uses involving selection, the proportion of examinees with positive outcomes can be calculated by applying traditional regression coefficients to the marginal distribution of scores in the unselected population. The incremental usefulness of using a particular variable can be judged by comparing its proportion to that associated with no selection and to that associated with using another variable, either alone or jointly. Examples, related to college admission and retention, are given to illustrate these ideas.
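The proportion-of-positive-outcomes idea can be sketched as an expected accuracy rate: a logistic model gives P(success | score), and it is applied to the score distribution of the whole unselected applicant pool. A decision counts as positive when a selected applicant succeeds or a rejected applicant would not have succeeded. The logistic form, the coefficients and the function names below are assumptions for illustration, not the article's exact procedure.

```python
import math


def p_success(score, intercept, slope):
    """Logistic model of P(success | score); coefficients assumed already estimated."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def accuracy_rate(score_freqs, cutoff, intercept, slope):
    """Expected proportion of correct decisions in the unselected population.

    score_freqs: dict mapping score -> number of applicants with that score
    Applicants scoring at or above cutoff are selected.
    """
    total = sum(score_freqs.values())
    correct = 0.0
    for score, count in score_freqs.items():
        p = p_success(score, intercept, slope)
        # Selected: positive if they succeed. Rejected: positive if they would fail.
        correct += count * (p if score >= cutoff else 1.0 - p)
    return correct / total
```

Setting the cutoff below every observed score corresponds to no selection, where the rate reduces to the average predicted success probability; the gain of a real cutoff over that baseline (or over a cutoff on another predictor) is the kind of incremental usefulness the abstract describes.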