1.

The Impact of Measurement Error on the Accuracy of Individual and Aggregate SGP

Daniel F. McCaffrey, Katherine E. Castellano, J. R. Lockwood. Educational Measurement, 2015, 34(1): 15-21

Student growth percentiles (SGPs) express students' current observed scores as percentile ranks in the distribution of scores among students with the same prior‐year scores. A common concern about SGPs at the student level, and about mean or median SGPs (MGPs) at the aggregate level, is potential bias due to test measurement error (ME). Shang, vanIwaarden, and Betebenner (SVB; this issue) develop a simulation‐extrapolation (SIMEX) approach to adjust SGPs for test ME. In this paper, we use a tractable example in which different SGP estimators, including SVB's SIMEX estimator, can be computed analytically to explain why ME is detrimental to both student‐level and aggregate‐level SGP estimation. Comparing the alternative SGP estimators with the standard approach reveals the familiar bias‐variance tradeoff: estimators that decrease bias relative to the standard SGP estimator increase variance, and vice versa. Even the most accurate estimator of individual student SGPs has large errors, roughly 19 percentile points on average in realistic settings. Estimators that reduce bias may suffice at the aggregate level, but no single estimator is optimal for meeting the dual goals of student‐level and aggregate‐level inference.
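
To make the SGP and MGP definitions above concrete, here is a minimal Python sketch. It assumes a single, discrete prior-year score, so that "students with the same prior-year scores" can be matched exactly, and it uses a mid-rank percentile convention. Operational SGPs are typically estimated with quantile regression on prior scores, so this illustrates the definition only. It is not the SVB or McCaffrey et al. estimators. The function names and toy records are hypothetical.

```python
from collections import defaultdict
from statistics import median

def student_growth_percentiles(records):
    """records: iterable of (student_id, prior_score, current_score).

    Returns {student_id: SGP}: the mid-rank percentile of each student's
    current score among all students sharing the same prior score."""
    records = list(records)
    peers = defaultdict(list)
    for _, prior, current in records:
        peers[prior].append(current)
    sgps = {}
    for sid, prior, current in records:
        group = peers[prior]
        below = sum(1 for s in group if s < current)
        ties = sum(1 for s in group if s == current)  # includes the student
        sgps[sid] = 100 * (below + 0.5 * ties) / len(group)
    return sgps

def median_growth_percentiles(sgps, teacher_of):
    """Aggregate SGPs to a median growth percentile (MGP) per teacher."""
    by_teacher = defaultdict(list)
    for sid, value in sgps.items():
        by_teacher[teacher_of[sid]].append(value)
    return {t: median(vals) for t, vals in by_teacher.items()}

# Hypothetical data: (student, prior-year score, current score).
records = [("a", 10, 12), ("b", 10, 15), ("c", 10, 18), ("d", 20, 21), ("e", 20, 25)]
teacher_of = {"a": "T1", "b": "T1", "d": "T1", "c": "T2", "e": "T2"}
sgps = student_growth_percentiles(records)          # b -> 50.0, d -> 25.0
mgps = median_growth_percentiles(sgps, teacher_of)  # T1 -> 25.0
```

Because the peer group is defined by the observed prior score, measurement error in that score changes which students a given student is compared with. This is one route by which ME can bias SGPs.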

2.

Shoe Shopping and the Reliability Coefficient

Educational Assessment, 2013, 18(4): 255-258

Editor's Introduction. Reliability Versus Accuracy: A Critical Distinction.
Test reliability coefficients have traditionally been used to judge the quality of measurement, and reliability coefficients of .90 have often been considered adequate to assure quality in standardized testing and large-scale assessment programs. However, a test reliability of .90 (or above) does not ensure that individual test scores, such as national percentile ranks, are accurate. Consider, for example, a mathematics test with a reliability of .90, and imagine a student taking that test whose true score is at the 50th percentile; that is, we know that the student's actual capability is at that level. The probability that the student will obtain a score within 5 percentile points of his or her true score (the 50th percentile) is less than one third (.309) (Rogosa 1999a, 1999b). The following informal example attempts to explain why high test reliability does not indicate good accuracy for an individual score, without the encumbrances of percentile rank scoring, complex measurement models, and other technical detail.
Dedicated to Al Bundy, a man who cares as much about good measurement as he does about his own children.
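
The .309 figure can be reproduced under a simple classical test theory model. The Python sketch below makes several assumptions of its own: true scores and errors are normally distributed, and observed scores are standardized to unit variance, so true-score variance equals the reliability and error variance is 1 minus the reliability. It also assumes percentile ranks come from the normal observed-score distribution. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_within_band(reliability, true_pct=50.0, band=5.0):
    """Probability that a student's observed percentile rank falls within
    +/- band points of their true percentile rank.

    Assumed model: observed X = T + E, Var(X) = 1, Var(T) = reliability,
    Var(E) = 1 - reliability, all normal. The true percentile is the rank of
    T in the true-score distribution; the observed percentile is the rank of
    X in the observed-score distribution N(0, 1).
    """
    if not 0 < reliability < 1:
        raise ValueError("reliability must be strictly between 0 and 1")
    std = NormalDist()
    # True score (on the observed z scale) located at the given true percentile.
    true_score = sqrt(reliability) * std.inv_cdf(true_pct / 100)
    # Observed-score cut points bounding the target percentile band.
    lo_p, hi_p = (true_pct - band) / 100, (true_pct + band) / 100
    lo = std.inv_cdf(lo_p) if lo_p > 0 else float("-inf")
    hi = std.inv_cdf(hi_p) if hi_p < 1 else float("inf")
    # Given the true score, the observed score is N(true_score, error variance).
    observed = NormalDist(mu=true_score, sigma=sqrt(1 - reliability))
    return observed.cdf(hi) - observed.cdf(lo)

print(round(prob_within_band(0.90), 3))  # 0.309
```

Under the same assumptions, a reliability of .95 still leaves the probability below one half. High reliability alone does not guarantee an accurate individual percentile rank.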

3.

Measurement Error Adjustment Using the SIMEX Method: An Application to Student Growth Percentiles

Yi Shang. Journal of Educational Measurement, 2012, 49(4): 446-465
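
SIMEX, named in this title and described as SVB's approach in entry 1, works in three steps. It adds extra simulated measurement error at increasing multiples λ of the known error variance, tracks how an estimate degrades, and extrapolates that trend back to λ = -1, the no-error case. The sketch below applies the idea to a simple regression slope rather than to SGPs. It uses only λ = 0, 1, 2 with an exact quadratic extrapolant and runs on hypothetical simulated data. It is an assumption-laden illustration, not Shang's implementation.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simex_slope(w, y, error_var, replicates=20, seed=0):
    """SIMEX-corrected slope of y on the true predictor x, given the
    error-prone predictor w = x + u with known Var(u) = error_var."""
    rng = random.Random(seed)
    est = {0: ols_slope(w, y)}  # lambda = 0: the naive estimate
    for lam in (1, 2):
        # Add extra error with variance lam * error_var, re-estimate, average.
        sd = (lam * error_var) ** 0.5
        slopes = [
            ols_slope([wi + rng.gauss(0, sd) for wi in w], y)
            for _ in range(replicates)
        ]
        est[lam] = fmean(slopes)
    # Quadratic through (0, est0), (1, est1), (2, est2), evaluated at lambda = -1.
    return 3 * est[0] - 3 * est[1] + est[2]

# Hypothetical data: true slope 1, Var(x) = 1, Var(u) = 0.25 (reliability 0.8).
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(5000)]
w = [xi + data_rng.gauss(0, 0.5) for xi in x]
y = [xi + data_rng.gauss(0, 0.5) for xi in x]

naive = ols_slope(w, y)                        # attenuated, near 0.8
corrected = simex_slope(w, y, error_var=0.25)  # closer to the true slope of 1
```

In large samples the quadratic extrapolant recovers most but not all of the attenuation (about 0.97 here rather than 1). This approximate correction is typical of SIMEX, and the choice of extrapolant matters.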

4.

Median Growth Percentiles (MGPs): Assessment of Intertemporal Stability and Correlations with Observational Scores

Margarita Pivovarova, Audrey Amrein-Beardsley. Educational Assessment, 2018, 23(2): 139-155

While states are no longer required to base teacher evaluation systems in significant part on student test scores, quite a few continue to use value-added models (VAMs) or student growth percentile (SGP) models for that purpose. In this study, we analyzed three years of teacher data to illustrate the performance of teachers’ median growth percentiles (MGPs). We found the consistency of MGPs over time to be comparable with existing estimates for VAMs. We also found that MGPs do not substantively agree with another measure of teacher quality: teachers’ observational scores. These findings suggest that caution should be exercised when teachers’ MGPs, like VAMs, are used in teacher evaluation systems to make high-stakes decisions such as merit pay, tenure, or contract termination. Our findings about the correlation of MGPs with observational scores support the view that teacher effectiveness is a multidimensional construct.

5.

Relationship between measures of academic quality and undergraduate student attrition: the case of higher education institutions in the Colombian Caribbean region

Anabella Martínez, Mónica Borjas, Mariela Herrera, Jorge Valencia. Higher Education Research & Development, 2015, 34(6): 1192-1206

Undergraduate student attrition is a major concern in higher education. It is usually explained by student attributes. However, recent developments in the student success literature point to the need to explore institutional practices that may influence a student's decision to abandon their studies. Academic quality assurance currently carries considerable weight for Colombian higher education institutions (HEI), and such measures may shape how HEI fulfill their missions. Both points indicate the need to consider a possible relationship between these quality measures and undergraduate attrition. This study draws on official databases from the Colombian Ministry of Education for 2009 and uses analysis of variance to explore the relationship between attrition and three measures of academic quality: accreditation status, scores on the professional test required to graduate (the Saber Pro Exam), and the number of research groups at each HEI. The study covers the Colombian Caribbean region, and the sample includes 19 HEI. Of the three quality measures explored, only the percentage of accredited undergraduate programs at an HEI showed a statistically significant relationship with undergraduate attrition rates.

6.

Measuring Change in Adult Literacy Programs: Enduring Issues and a Few Answers

Educational Assessment, 2013, 18(2): 101-131

Problems in the measurement of student change in adult literacy programs were investigated through repeated testing of a group of students in Adult Basic Education and General Educational Development classes and through computer simulations. Ninety-two students were tested at three points in time with a battery of norm-referenced reading and mathematics tests as well as with tests of reading rate and decoding developed for this study. Change scores were found to vary across tests, with significant declines as well as gains. No significant differences in change scores were found for amount of instructional time or for attendance rate, and a large amount of group heterogeneity was revealed through an analysis of growth patterns. Computer simulations showed that with populations smaller than 200, aggregating grade-equivalent scores can lead to distorted mean changes when compared to aggregate means of equal-interval scale total scores. In contrast, simulations of regression to the mean caused by guessing on multiple-choice tests showed that this effect was relatively small. These results strongly support the conclusion that adult literacy programs cannot be evaluated effectively by any single measure. These findings also support the current efforts to construct multiple indicator systems for evaluating adult literacy programs, systems that attend to the multiple goals of such programs and are free of elementary-level and secondary-level conventions such as grade equivalents.

7.

Factors Influencing Science Content Accuracy in Elementary Inquiry Science Lessons

Barbara L. Nowicki, Barbara Sullivan-Watts, Minsuk K. Shim, Betty Young, Robert Pockalny. Research in Science Education, 2013, 43(3): 1135-1154

Elementary teachers face increasing demands to engage children in authentic science process and argument while simultaneously equipping them with knowledge of science facts, vocabulary, and concepts. This reform is particularly challenging because of concerns that elementary teachers lack adequate science background to teach science accurately. This study examined 81 in-classroom inquiry science lessons taught by preservice education majors and their cooperating teachers to determine the accuracy of the science content delivered in elementary classrooms. Our results showed that 74% of experienced teachers and 50% of student teachers presented science lessons with greater than 90% accuracy. Eleven of the 81 lessons (9 by preservice teachers, 2 by cooperating teachers) failed to deliver accurate science content to the class. Science content accuracy was highly correlated with the use of kit-based resources supported by professional development, a preference for teaching science, and grade level. There was no correlation between the accuracy of science content and some common measures of teacher content knowledge (i.e., number of college science courses, science grades, or scores on a general science content test). We conclude that when provided with high-quality curricular materials and targeted professional development, elementary teachers learn the needed science content and present it accurately to their students.

8.

Curriculum-based measurement in the content areas: vocabulary matching as an indicator of progress in social studies learning

Espin CA, Shin J, Busch TW. Journal of Learning Disabilities, 2005, 38(4): 353-363

The purpose of this study was to examine the reliability and validity of curriculum-based measures (CBM) as indicators of growth in content-area learning. Participants were 58 students in 2 seventh-grade social studies classes. The CBM measures were student- and administrator-read vocabulary-matching probes. Criterion measures were performance on a knowledge test, the social studies subtest of the Iowa Test of Basic Skills (ITBS), and student grades. Both the student- and examiner-read measures reflected change in performance. However, only the student-read measure revealed interindividual differences in growth rates. Significant relations were found between the growth rates generated by the student-read vocabulary measure and student course grades, ITBS scores, and growth on the knowledge test. These results support the validity of a vocabulary-matching measure as an indicator of student learning in the content areas. The results are discussed in terms of the use of CBM as a system for monitoring performance and evaluating interventions for students with learning disabilities in content-area classrooms.
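
In CBM progress monitoring, a student's growth rate is often summarized as the slope of an ordinary least-squares line through repeated probe scores. The abstract does not say how Espin et al. estimated growth rates, and multilevel models are also common. Treat this Python sketch as an assumption. The function name and weekly scores are hypothetical.

```python
from statistics import fmean

def growth_rate(weeks, scores):
    """OLS slope of probe scores on week: estimated score gain per week."""
    if len(weeks) != len(scores) or len(weeks) < 2:
        raise ValueError("need at least two paired observations")
    mw, ms = fmean(weeks), fmean(scores)
    sxy = sum((w - mw) * (s - ms) for w, s in zip(weeks, scores))
    sxx = sum((w - mw) ** 2 for w in weeks)
    return sxy / sxx

# Hypothetical weekly vocabulary-matching scores (correct matches per probe).
weeks = [1, 2, 3, 4, 5, 6]
students = {
    "student_a": [5, 7, 8, 10, 11, 13],
    "student_b": [6, 6, 7, 7, 8, 8],
}
rates = {name: growth_rate(weeks, s) for name, s in students.items()}
```

Interindividual differences in growth, as discussed in the abstract, correspond to variation in these per-student slopes.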

9.

How schools use talent search scores for gifted adolescents

Paula Olszewski‐Kubilius, Seon‐Young Lee. Roeper Review, 2013, 35(4): 233-240

Two hundred fourteen school officials whose students had participated in an academic talent search through the Center for Talent Development of Northwestern University responded to a survey about how they use off‐level test scores for students’ talent development in school. The data showed that schools generally perceive talent search as a means of providing access to outside‐of‐school academic opportunities such as summer and distance learning courses. Few schools use talent search scores to design school‐based educational programs or to determine eligibility for in‐school gifted programs. The survey also yielded four further findings:
(1) Schools learned about talent search mainly through mailings from the talent search center.
(2) Gifted coordinators primarily administered talent search in their schools, and participation was encouraged via letters to families.
(3) Students were selected for talent search participation based on achievement test scores at or above the 95th percentile.
(4) Follow‐up on talent search scores typically consisted of handing out certificates at a special ceremony.
Schools that were more active in talent search did not differ from less active schools in how they conducted or used talent search off‐level test scores. To enhance the impact of talent search on gifted students, local schools need to do more to recognize the important role that talent search scores can play in their own programming.

10.

Modeling of Nonlinear Growth to Improve the Accuracy of Identification Decision Rules

W. Holmes Finch, Maria E. Hernández Finch, Brooke Avery. Learning Disabilities Research & Practice, 2023, 38(2): 104-118

Progress monitoring using curriculum-based measures administered to a student at multiple points in time is common in educational settings. Recent research has demonstrated that common approaches to identifying individuals in need of special services, such as the trend line or median techniques, can be negatively affected by nonlinear change in scores over time. The purpose of this study was to test and demonstrate a nonlinear regression model that adjusts the linear trend line for such nonlinearities, thereby improving the accuracy of common methods for identifying students in need of special services. Results showed that using this nonlinear model did improve the accuracy of these identification methods.

11.

Examining the Dual Purpose Use of Student Learning Objectives for Classroom Assessment and Teacher Evaluation

Derek C. Briggs, Rajendra Chattergoon, Amy Burkhardt. Journal of Educational Measurement, 2019, 56(4): 686-714

The process of setting and evaluating student learning objectives (SLOs) has become increasingly popular as an example of classroom assessment intended to serve the dual purpose of informing instruction and holding teachers accountable. A concern is that the high‐stakes purpose may distort the inferences about students and teachers that SLOs can support. The present study explores this concern by comparing student SLO scores in a large urban school district with performance on a common, objective external criterion. That criterion is used to evaluate the extent to which student growth scores appear to be inflated. Using 2 years of data, growth comparisons are also made at the teacher level for teachers who submit SLOs and whose students take the state‐administered large‐scale assessment. The two measures of growth show similar relationships with demographic covariates and the same degree of stability across years. Even so, they are only weakly correlated.

12.

An alternative instrument for private school competition

D. Cohen-Zada. Economics of Education Review, 2009

Empirical studies estimating the effect of private school competition on student outcomes commonly use the share of Catholics in the local population as an instrument for private school competition. I show that this is not a valid instrument, since it is endogenous to private school competition. I suggest using instead the local share of Catholics in the population in 1890 and its squared term. These instruments are very strong and are also exogenous to both student achievement and private school competition. I further show that using the current Catholic share as an instrument yields seriously flawed estimates of the effect of private school competition on math test scores and on educational attainment. Significant positive effects of private school competition on these outcome measures do not hold when the historical Catholic share in 1890 is used as an alternative instrument. The historical Catholic share in 1890 can also be applied to estimate the treatment effect of Catholic schools.
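
Why an endogenous instrument fails can be seen in a small simulation. The sketch below generates hypothetical data in which an unobserved confounder drives both competition and outcomes. It then compares OLS, an instrument independent of the confounder (standing in for the 1890 share), and an instrument contaminated by the confounder (standing in for the current share). All coefficients are invented for illustration. The paper itself uses two instruments (the 1890 share and its square), whereas this sketch uses a single instrument and the simple Wald ratio cov(z, y) / cov(z, x).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def iv_estimate(z, x, y):
    """Just-identified IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

rng = random.Random(1)
n = 20000
u = [rng.gauss(0, 1) for _ in range(n)]         # unobserved confounder
z_hist = [rng.gauss(0, 1) for _ in range(n)]    # exogenous instrument
z_now = [zh + ui for zh, ui in zip(z_hist, u)]  # instrument correlated with u
# Hypothetical competition measure x and outcome y; true effect of x is 0.5.
x = [0.8 * zh + ui + rng.gauss(0, 1) for zh, ui in zip(z_hist, u)]
y = [0.5 * xi + 2 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols = cov(x, y) / cov(x, x)             # biased upward by u (about 1.26)
iv_valid = iv_estimate(z_hist, x, y)    # close to the true 0.5
iv_invalid = iv_estimate(z_now, x, y)   # biased (about 1.61)
```

The contaminated instrument inherits the confounding it was meant to remove. This is the mechanism behind the abstract's argument against the current Catholic share.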

13.

Psychometric Properties of Three New National Survey of Student Engagement Based Engagement Scales: An Item Response Theory Analysis

Adam C. Carle, David Jaffee, Neil W. Vaughan, Douglas Eder. Research in Higher Education, 2009, 50(8): 775-794

14.

The efficacy of computer‐based supplementary phonics programs for advancing reading skills in at‐risk elementary students

Paul Macaruso, Pamela E. Hook, Robert McCabe. Journal of Research in Reading, 2006, 29(2): 162-172

In this study we examined the benefits of computer programs designed to supplement regular reading instruction in an urban public school system. The programs provide systematic exercises for mastering word‐attack strategies. Our findings indicate that first graders who participated in the programs made significant reading gains over the school year. Their post‐test scores were slightly (but not significantly) higher than those of control children who received regular reading instruction without the programs. When analyses were restricted to low‐performing children eligible for Title I services, the treatment group obtained significantly higher post‐test scores than the control group. At post‐test, Title I children in the treatment group performed at levels similar to non‐Title I students.

15.

Programs of Study as a Basis for Selection, Placement and Guidance of College Students

Jane Loeb, John Bowers. Journal of Educational Measurement, 1973, 10(2): 131-139

Studies of collegiate success and attrition are generally conducted at the all-college level. The definition of academic programs that are homogeneous in the abilities and interests of their students and the grading standards of their faculties may lead to more accurate prediction of success and more effective control of attrition.
Homogeneous curricular groups were defined via Ward's hierarchical grouping analysis applied to curricular means on high school percentile rank, four ACT subscores, first-semester GPA, and 16 Kuder scores. Programs so defined differed on scientific-verbal and competitive level dimensions. Prediction of grades was more accurate within programs than within colleges. Drop and transfer rates were correlated with discriminant scores.
The programs are discussed as promising units within which differential selection and placement strategies might reduce attrition.

16.

Indexing response to intervention: a longitudinal study of reading risk from kindergarten through third grade

Simmons DC, Coyne MD, Kwok OM, McDonagh S, Ham BA, Kame'enui EJ. Journal of Learning Disabilities, 2008, 41(2): 158-173

In this study, response to intervention and stability of reading performance of 41 kindergarten children identified as at risk of reading difficulty were evaluated from kindergarten through third grade. All students were assessed in the fall of each academic year to evaluate their need for intervention, and students who fell below the 30th percentile on criterion measures received small-group supplemental intervention. Measures included a combination of commercial norm-referenced measures and specific skill and construct measures to assess growth or change in reading risk status relative to 30th percentile benchmarks. Consistent with prior research involving students with comparable entry-level performance, the majority of children identified as at risk at the beginning of kindergarten responded early and positively to intervention. On average, absolute performance levels at the end of kindergarten positioned students for trajectories of later reading performance that exceeded the 50th percentile on the majority of measures. Moreover, changes in risk status that occurred early were generally sustained over time. Only oral reading fluency performance failed to exceed the 30th percentile for the majority of students.

17.

Weekend feeding (“BackPack”) programs and student outcomes

Economics of Education Review, 2020

Weekend feeding (“BackPack”) programs that provide food to children have grown dramatically in recent years, yet their effects on educational outcomes have received little investigation. Our study combines administrative student data on test scores and absences in Northwest North Carolina elementary schools with primary data on program participation. School and student program eligibility criteria are used to estimate the intent-to-treat effect within a difference-in-difference-in-differences (DDD) framework. Results suggest a sizable 0.09 standard deviation improvement in reading scores, with a similar but weaker effect for math scores. These effects are strongest for the youngest and lowest-performing students.
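
The logic of a triple difference can be shown with cell means. This Python sketch assumes the three contrasts are school eligibility, student eligibility, and pre- versus post-program periods. The abstract names only the first two, so the time dimension is an assumption, and the cell means are hypothetical. In a saturated regression with all two- and three-way interactions of these binary indicators, the coefficient on the triple interaction equals this contrast of cell means.

```python
def ddd(means):
    """Triple difference from cell means keyed by
    (school_eligible, student_eligible, post), each 0 or 1."""
    def change(school, student):
        # Post-minus-pre change for one school-by-student cell.
        return means[(school, student, 1)] - means[(school, student, 0)]
    # Difference-in-differences among students in eligible schools...
    within_eligible_schools = change(1, 1) - change(1, 0)
    # ...minus the same contrast in ineligible schools.
    within_ineligible_schools = change(0, 1) - change(0, 0)
    return within_eligible_schools - within_ineligible_schools

# Hypothetical mean reading scores (standard deviation units).
cell_means = {
    (1, 1, 0): 0.00, (1, 1, 1): 0.15,
    (1, 0, 0): 0.20, (1, 0, 1): 0.25,
    (0, 1, 0): -0.10, (0, 1, 1): -0.05,
    (0, 0, 0): 0.30, (0, 0, 1): 0.32,
}
effect = ddd(cell_means)  # (0.15 - 0.05) - (0.05 - 0.02), about 0.07
```

The second difference removes trends shared by all students in a school type. The third removes any differential trend for eligible-type students that also appears in ineligible schools.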

18.

Moving Beyond the Brag Sheet: A Meta-Analysis of Biodata Measures Predicting Student Outcomes

Charlene Zhang, Nathan R. Kuncel. Educational Measurement, 2020, 39(3): 106-121

Measures of biographical data, or biodata, provide indicators of one's life history and past experiences. Biodata information is often available in various forms during academic admissions to higher education. Such information can be used, in combination with other factors, to predict students’ future academic and extracurricular accomplishments. There is a scattered body of literature investigating relationships between standardized biodata measures and a number of student criteria in college. The current study uses meta-analytic methods to summarize how various biodata measures (overall scores or scale scores) predict student accomplishments, including grades, self- and other-rated performance, persistence, and extracurricular accomplishments. The analysis covered 46 independent samples, comprising 38,478 students and yielding 74 individual predictor–criterion relationships. Results indicate that, in general, biodata measures substantially predict students’ academic and extracurricular accomplishments. Overall biodata scores correlate with grades at .39, with persistence at .25, and with point-hour ratios at .35. Students’ accomplishments in leadership, visual and performing arts, music, and science were best predicted by biodata measures developed specifically to target those outcomes. This meta-analytic study supports the predictive validity of biographical data inventories with respect to student outcomes and adds justification for the use of biodata in academic selection.
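
Pooled correlations like the .39 for grades are typically sample-size-weighted averages across studies. The Python sketch below shows a bare-bones (Hunter-Schmidt style) version. It computes the weighted mean correlation, the weighted observed variance of correlations, and the variance expected from sampling error alone. The abstract does not say whether Zhang and Kuncel corrected for unreliability or range restriction, so the sketch omits such corrections. The study values are hypothetical.

```python
def bare_bones_meta(studies):
    """studies: list of (sample_size, observed_r).

    Returns (weighted mean r, weighted observed variance of r,
    expected sampling-error variance)."""
    total_n = sum(n for n, _ in studies)
    r_bar = sum(n * r for n, r in studies) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for n, r in studies) / total_n
    mean_n = total_n / len(studies)
    var_err = (1 - r_bar ** 2) ** 2 / (mean_n - 1)
    return r_bar, var_obs, var_err

studies = [(100, 0.30), (300, 0.40), (600, 0.45)]  # hypothetical samples
r_bar, var_obs, var_err = bare_bones_meta(studies)
# Variance left over after removing expected sampling error (floored at 0).
residual_var = max(var_obs - var_err, 0.0)
```

A small residual variance suggests that much of the spread in observed correlations could reflect sampling error rather than true differences across samples.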

19.

The Psychology of Educational Measurement

Samuel Messick. Journal of Educational Measurement, 1984, 21(3): 215-237

Because school learning entails not just accretion of knowledge but the structuring and restructuring of knowledge and cognitive skills, the conception and construction of educational achievement measures must be cast in developmental terms. And because student characteristics as well as social and educational experiences influence current performance, the interpretation and implications of educational achievement measures must be relative to intrapersonal and situational contexts. These points imply a strategy of comprehensive assessment in context that focuses on the processes and structures involved in subject-matter competence as moderated in performance by personal and environmental influences. This article addresses in detail both the nature of developing competence and its measurement in terms of context-dependent task performance. Construct-irrelevant task difficulty that might jeopardize the meaning of test scores as well as construct-irrelevant influences that might jeopardize implications for action are taken into account via the comprehensive measurement of relevant contextual factors. Comprehensive assessment in context thus facilitates valid interpretations of the meaning and implications of ability and achievement scores in particular instances, thereby lightening the interpretive and ethical burdens on test users and enhancing the validity of test use.

20.

Providing transparency and credibility: the selection of international students for Australian universities. An examination of the relationship between scores in the International Student Admissions Test (ISAT), final year academic programs and an Australian university’s foundation program

Kelvin Lai, Susan Nankervis, Margot Story, Wayne Hodgson, Michael Lewenberg. Higher Education Research & Development, 2008, 27(4): 331-344

Throughout 2003–04, two groups sat the International Student Admissions Test (ISAT): five cohorts of students in their final year of school studies at various Malaysian colleges, and a group of students completing an Australian university foundation year in Malaysia. The ISAT is a multiple‐choice test of general academic abilities developed for students whose first language is not English. Both sets of scores were examined to investigate the relationship between the skills measured by the academic programs and the generic reasoning skills measured by the ISAT. The data were examined by looking at correlations and patterns among the ISAT scores, the total academic program scores, and individual subject scores. Multiple regression was also used to examine whether the ISAT could predict academic program scores. Although the ISAT and measures of achievement in the academic programs are two completely different instruments, the study showed that: (i) the scores were positively and significantly correlated; (ii) patterns of co‐variation of the ISAT and academic program scores demonstrate a positive relationship; and (iii) there is evidence that achieving a high score in the academic programs requires high reasoning skills, as measured by the ISAT. These findings indicate that the ISAT is a useful predictor of student ability for use in the university selection process for international applicants.