Similar articles
20 similar articles found (search time: 15 ms)
1.
Student growth percentiles (SGPs) express students' current observed scores as percentile ranks in the distribution of scores among students with the same prior‐year scores. A common concern about SGPs at the student level, and mean or median SGPs (MGPs) at the aggregate level, is potential bias due to test measurement error (ME). Shang, vanIwaarden, and Betebenner (SVB; this issue) develop a simulation‐extrapolation (SIMEX) approach to adjust SGPs for test ME. In this paper, we use a tractable example in which different SGP estimators, including SVB's SIMEX estimator, can be computed analytically to explain why ME is detrimental to both student‐level and aggregate‐level SGP estimation. A comparison of the alternative SGP estimators to the standard approach demonstrates the common bias‐variance tradeoff problem: estimators that decrease the bias relative to the standard SGP estimator increase variance, and vice versa. Even the most accurate estimator for individual student SGP has large errors of roughly 19 percentile points on average for realistic settings. Those estimators that reduce bias may suffice at the aggregate level, but no single estimator is optimal for meeting the dual goals of student‐level and aggregate‐level inferences.
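The definition in the first sentence of this abstract can be made concrete with a small sketch. This is only an illustration under simplifying assumptions: it groups students by an exactly equal prior-year score and uses a mid-rank percentile, whereas operational SGP systems typically estimate the conditional distribution with quantile regression. The function name `empirical_sgp` and the toy records are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def empirical_sgp(records):
    """Return a naive SGP per student.

    records: iterable of (student_id, prior_score, current_score).
    Each student's current score is ranked only against students
    who had exactly the same prior-year score (illustrative assumption).
    """
    # Collect current scores of all students sharing a prior-year score.
    peers_by_prior = defaultdict(list)
    for _, prior, current in records:
        peers_by_prior[prior].append(current)

    sgp = {}
    for student_id, prior, current in records:
        peers = peers_by_prior[prior]
        below = sum(score < current for score in peers)
        ties = sum(score == current for score in peers)  # includes the student
        # Mid-rank percentile: ties count as half below, half above.
        sgp[student_id] = 100.0 * (below + 0.5 * ties) / len(peers)
    return sgp


# Hypothetical toy data: (student, prior-year score, current score).
records = [
    ("a", 300, 310), ("b", 300, 330), ("c", 300, 350), ("d", 300, 370),
    ("e", 320, 330), ("f", 320, 350),
]
sgps = empirical_sgp(records)
print(sgps)
```

In this toy data, students "b" and "e" have the same current score but different SGPs, because each is compared only with peers who started from the same prior-year score. Measurement error in the prior score would place some students in the wrong comparison group, which is the source of the bias the abstract discusses.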

2.
《Educational Assessment》2013,18(4):255-258
Editor's Introduction. Reliability Versus Accuracy: A Critical Distinction. Test reliability coefficients traditionally have been used to judge the quality of measurement, and reliability coefficients of .90 have often been considered adequate to assure quality for standardized testing and large-scale assessment programs. However, a test reliability of .90 (or above) does not ensure that individual test scores, such as national percentile ranks, are accurate. Consider, for example, a mathematics test with a reliability of .90 and imagine a student taking that test whose true score is at the 50th percentile; that is, we know that the student's actual capability is at that level. The probability is less than one third (.309) that when the student takes the test, he or she will obtain a score within 5 percentile points of his or her true score, the 50th percentile (Rogosa 1999a, 1999b). The following informal example attempts to explain why high test reliability does not indicate good accuracy for an individual score, without the encumbrances of percentile rank scoring, complex measurement models, and other technical detail. Dedicated to Al Bundy, a man who cares as much about good measurement as he does about his own children.
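The .309 figure quoted above can be approximated with a short calculation. The sketch below assumes a classical test theory model with normally distributed true scores and errors, observed scores standardized to unit variance, and percentile ranks read off the observed-score distribution. These assumptions are ours for illustration and may differ from the model used in Rogosa (1999a, 1999b); the function name `prob_within_band` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_within_band(reliability, true_percentile=50.0, band=5.0):
    """Probability that the observed percentile rank falls within
    `band` points of the percentile at which the true score sits.

    Assumes observed = true + error, observed variance 1, and
    error variance = 1 - reliability (illustrative assumptions).
    """
    std_normal = NormalDist()
    error_sd = sqrt(1.0 - reliability)

    # Locate the true score and the band limits on the observed-score scale.
    true_score = std_normal.inv_cdf(true_percentile / 100.0)
    lower = std_normal.inv_cdf((true_percentile - band) / 100.0)
    upper = std_normal.inv_cdf((true_percentile + band) / 100.0)

    # For this student, the observed score varies around the true score.
    observed = NormalDist(mu=true_score, sigma=error_sd)
    return observed.cdf(upper) - observed.cdf(lower)


p = prob_within_band(0.90)
print(round(p, 3))  # roughly 0.309 under these assumptions
```

The band from the 45th to the 55th percentile is only about a quarter of a standard deviation wide, while the error standard deviation at a reliability of .90 is about 0.32, so most observed scores land outside the band.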

3.
4.
While states are no longer required to set up teacher evaluation systems based in significant part on student test scores, quite a few continue to use value-added models (VAMs) or student growth percentile (SGP) models for that purpose. In this study, we analyzed three years of teacher data to illustrate the performance of teachers’ median growth percentiles (MGPs). We found the consistency of MGPs over time to be comparable with existing estimates from VAMs. Additionally, we found that MGPs do not substantively agree with another measure of teacher quality, teachers’ observational scores. These findings suggest that caution should be exercised when teachers’ MGPs, as well as VAMs, are used in teacher evaluation systems to make high-stakes decisions such as merit pay, tenure, or teacher contract termination. Our findings about the correlation of MGPs with observational scores support the idea that the teacher effectiveness construct is multidimensional.

5.
Undergraduate student attrition is a major concern in higher education. It is usually explained by the impact of student attributes; however, recent developments in the student success literature point to the need to explore institutional practices that may affect a student's decision to abandon their studies. The current weight of academic quality assurance for Colombian higher education institutions (HEI), and what such measures may mean for how HEI fulfill their missions, indicates the need to consider a possible relation between such quality measures and undergraduate student attrition. Using official databases from the Colombian Ministry of Education for the year 2009, this study explores through analysis of variance the relationship between attrition and three measures of academic quality: accreditation status, professional test scores required to graduate (Saber Pro Exam), and the number of research groups at HEI. The scope of the study is the Colombian Caribbean region, and the sample includes 19 HEI. Study results demonstrate that the percentage of accredited undergraduate programs at HEI was the only measure of quality assurance, out of the three explored, that showed a statistically significant relationship with undergraduate student attrition rates.

6.
《Educational Assessment》2013,18(2):101-131
Problems in the measurement of student change in adult literacy programs were investigated through repeated testing of a group of students in Adult Basic Education and General Educational Development classes and through computer simulations. Ninety-two students were tested at three points in time with a battery of norm-referenced reading and mathematics tests as well as with tests of reading rate and decoding developed for this study. Change scores were found to vary across tests, with significant declines as well as gains. No significant differences in change scores were found for amount of instructional time or for attendance rate, and a large amount of group heterogeneity was revealed through an analysis of growth patterns. Computer simulations showed that with populations smaller than 200, aggregating grade-equivalent scores can lead to distorted mean changes when compared to aggregate means of equal-interval scale total scores. In contrast, simulations of regression to the mean caused by guessing on multiple-choice tests showed that this effect was relatively small. These results strongly support the conclusion that adult literacy programs cannot be evaluated effectively by any single measure. These findings also support the current efforts to construct multiple indicator systems for evaluating adult literacy programs, systems that attend to the multiple goals of such programs and are free of elementary-level and secondary-level conventions such as grade equivalents.

7.
Elementary teachers face increasing demands to engage children in authentic science process and argument while simultaneously preparing them with knowledge of science facts, vocabulary, and concepts. This reform is particularly challenging due to concerns that elementary teachers lack adequate science background to teach science accurately. This study examined 81 in-classroom inquiry science lessons for preservice education majors and their cooperating teachers to determine the accuracy of the science content delivered in elementary classrooms. Our results showed that 74% of experienced teachers and 50% of student teachers presented science lessons with greater than 90% accuracy. Eleven of the 81 lessons (9 preservice, 2 cooperating teachers) failed to deliver accurate science content to the class. Science content accuracy was highly correlated with the use of kit-based resources supported with professional development, a preference for teaching science, and grade level. There was no correlation between the accuracy of science content and some common measures of teacher content knowledge (i.e., number of college science courses, science grades, or scores on a general science content test). Our study concluded that when provided with high quality curricular materials and targeted professional development, elementary teachers learn needed science content and present it accurately to their students.

8.
The purpose of this study was to examine the reliability and validity of curriculum-based measures (CBM) as indicators of growth in content-area learning. Participants were 58 students in 2 seventh-grade social studies classes. CBM measures were student- and administrator-read vocabulary-matching probes. Criterion measures were performance on a knowledge test, the social studies subtest of the Iowa Test of Basic Skills (ITBS), and student grades. Both the student- and examiner-read measures reflected change in performance; however, only the student-read measure resulted in interindividual differences in growth rates. Significant relations were found between the growth rates generated by the student-read vocabulary measure and student course grades, ITBS scores, and growth on the knowledge test. These results support the validity of a vocabulary-matching measure as an indicator of student learning in the content areas. The results are discussed in terms of the use of CBM as a system for monitoring performance and evaluating interventions for students with learning disabilities in content-area classrooms.

9.
Two hundred fourteen school officials who had students participate in an academic talent search through the Center for Talent Development of Northwestern University responded to a survey regarding how they use off‐level test scores for students’ talent development in school. Data showed that talent search is generally perceived by schools as a means of providing access to outside‐of‐school academic opportunities such as summer and distance learning courses. Few schools use talent search scores to design school‐based educational programs or to determine eligibility for in‐school gifted programs. Other findings were that schools learned about talent search mainly through mailings from the talent search center; gifted coordinators primarily administered talent search in their schools, and participation was encouraged via letters to families; students were selected for talent search participation based on achievement test scores at the 95th percentile or above; and follow‐up on talent search scores typically consisted of passing out certificates at a special ceremony. Schools that were more active versus less active in talent search were not different in terms of how they conducted or used talent search off‐level test scores. More efforts are needed from local schools to recognize the important role that talent search scores can have in their local programming to enhance the impact of talent search on gifted students.

10.
Progress monitoring using curriculum-based measures administered to a student at multiple points in time is common in educational settings. Recent research has demonstrated that common approaches to identifying individuals in need of special services, such as the trend line or median techniques, can be negatively impacted by the nonlinear change in scores over time. The purpose of this study was to test and demonstrate a nonlinear regression model for adjusting the linear trend line for the presence of such nonlinearities, thereby improving the accuracy of common methods for identifying students in need of special services. Results demonstrated that use of this nonlinear model improved the accuracy of common methods for identifying students in need of special services.

11.
The process of setting and evaluating student learning objectives (SLOs) has become increasingly popular as an example where classroom assessment is intended to fulfill the dual purpose of informing instruction and holding teachers accountable. A concern is that the high‐stakes purpose may lead to distortions in the inferences about students and teachers that SLOs can support. This concern is explored in the present study by contrasting student SLO scores in a large urban school district to performance on a common objective external criterion. This external criterion is used to evaluate the extent to which student growth scores appear to be inflated. Using 2 years of data, growth comparisons are also made at the teacher level for teachers who submit SLOs and have students that take the state‐administered large‐scale assessment. Although they do show similar relationships with demographic covariates and have the same degree of stability across years, the two different measures of growth are weakly correlated.

12.
Empirical studies estimating the effect of private school competition on student outcomes commonly use the share of Catholics in the local population as an instrument for private school competition. I show that this is not a valid instrument since it is endogenous to private school competition and suggest using instead the local share of Catholics in the population in 1890 and its squared term. These instruments are very strong and are also exogenous to both student achievements and private school competition. I further show that using the current Catholic share as an instrument results in seriously flawed estimates of the effect of private school competition on math test scores and on educational attainment, to the extent that significant positive effects of private school competition on these outcome measures do not hold when the historical Catholic share in 1890 is used as an alternative instrument. The historical Catholic share in 1890 can also be applied to estimate the treatment effect of Catholic schools.

13.
14.
In this study we examined the benefits of computer programs designed to supplement regular reading instruction in an urban public school system. The programs provide systematic exercises for mastering word‐attack strategies. Our findings indicate that first graders who participated in the programs made significant reading gains over the school year. Their post‐test scores were slightly (but not significantly) greater than the post‐test scores of control children who received regular reading instruction without the programs. When analyses were restricted to low‐performing children eligible for Title I services, significantly higher post‐test scores were obtained by the treatment group compared to the control group. At post‐test, Title I children in the treatment group performed at levels similar to non‐Title I students.

15.
Studies of collegiate success and attrition are generally conducted at the all-college level. The definition of academic programs that are homogeneous in the abilities and interests of their students and the grading standards of their faculties may lead to more accurate prediction of success and more effective control of attrition.
Homogeneous curricular groups were defined via Ward's hierarchical grouping analysis applied to curricular means on high school percentile rank, four ACT subscores, first semester GPA, and 16 Kuder scores. Programs so defined differed on scientific-verbal and competitive level dimensions. Prediction of grades was more accurate within programs than colleges. Drop and transfer rates were correlated with discriminant scores.
The programs are discussed as promising units within which differential selection and placement strategies might reduce attrition.

16.
In this study, response to intervention and stability of reading performance of 41 kindergarten children identified as at risk of reading difficulty were evaluated from kindergarten through third grade. All students were assessed in the fall of each academic year to evaluate need for intervention, and students who fell below the 30th percentile on criterion measures received small-group supplemental intervention. Measures included a combination of commercial normative referenced measures and specific skill and construct measures to assess growth or change in reading risk status relative to 30th percentile benchmarks. Results indicated that consistent with the findings of prior research involving students with comparable entry-level performance, the majority of children identified as at risk in the beginning of kindergarten responded early and positively to intervention. On average, absolute performance levels at the end of kindergarten positioned students for trajectories of later reading performance that exceeded the 50th percentile on the majority of measures. Moreover, changes in risk status that occurred early were generally sustained over time. Only oral reading fluency performance failed to exceed the 30th percentile for the majority of students.

17.
Weekend feeding (“BackPack”) programs that provide food to children have grown dramatically in recent years, yet their effects on educational outcomes have been little investigated. Our study combines administrative student data on test scores and absences in Northwest North Carolina elementary schools with primary data on program participation. School and student program eligibility criteria are used to estimate the intent-to-treat effect within a difference-in-difference-in-difference (DDD) framework. Results suggest a sizable 0.09 standard deviation improvement in reading scores, with a similar but weaker effect for math scores. These effects are strongest for the youngest and lowest performing students.
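The DDD logic mentioned in this abstract can be sketched from group means. The example below assumes the three differences are taken over time (pre vs. post), student eligibility, and school program status, and it uses made-up cell means in standard deviation units. The study itself presumably estimates the effect with a regression specification and controls that are not described here, so the function `triple_difference` and all numbers are hypothetical.

```python
def triple_difference(means):
    """Compute a difference-in-difference-in-difference from cell means.

    means maps (school_has_program, student_eligible, period) to the
    mean outcome of that cell, with period in {"pre", "post"}.
    """
    def diff_in_diff(school_has_program):
        # Change over time for eligible students ...
        eligible_change = (means[(school_has_program, True, "post")]
                           - means[(school_has_program, True, "pre")])
        # ... minus the change for ineligible students in the same schools.
        ineligible_change = (means[(school_has_program, False, "post")]
                             - means[(school_has_program, False, "pre")])
        return eligible_change - ineligible_change

    # Third difference: program schools versus non-program schools.
    return diff_in_diff(True) - diff_in_diff(False)


# Hypothetical mean reading scores in standard deviation units.
cell_means = {
    (True, True, "pre"): -0.50, (True, True, "post"): -0.25,
    (True, False, "pre"): 0.25, (True, False, "post"): 0.375,
    (False, True, "pre"): -0.50, (False, True, "post"): -0.375,
    (False, False, "pre"): 0.25, (False, False, "post"): 0.375,
}
ddd = triple_difference(cell_means)
print(ddd)
```

The third difference removes any trend that affects eligible and ineligible students differently in all schools, which a plain difference-in-differences within program schools would attribute to the program.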

18.
Measures of biographical data, or biodata, provide indicators of one's life history and past experiences. Biodata information is often available in various forms during processes of academic admissions to higher education. Such information can be used, in combination with other factors, to predict students’ future academic and extra-curricular accomplishments. There is a scattered body of literature investigating relationships between standardized biodata measures and a number of student criteria in college. The current study uses meta-analysis methods to summarize findings on how various biodata measures (overall scores or scale scores) predict student accomplishments, including grades, self- and other-rated performances, persistence, and extracurricular accomplishments. Data from 46 independent samples, consisting of 38,478 students and resulting in 74 individual predictor–criterion relationships, were analyzed. Results indicate, generally, that biodata measures substantially predict students’ academic and extra-curricular accomplishments. Overall biodata scores correlate with grades at .39, persistence at .25, and point-hour ratios at .35. Students’ accomplishments in leadership, visual and performing arts, music, and science were predicted best by biodata measures developed specifically to target those outcomes. This meta-analytic study provides support for the predictive validity of biographical data inventories with respect to student outcomes and adds justification to the use of biodata in academic selection.

19.
Because school learning entails not just accretion of knowledge but the structuring and restructuring of knowledge and cognitive skills, the conception and construction of educational achievement measures must be cast in developmental terms. And because student characteristics as well as social and educational experiences influence current performance, the interpretation and implications of educational achievement measures must be relative to intrapersonal and situational contexts. These points imply a strategy of comprehensive assessment in context that focuses on the processes and structures involved in subject-matter competence as moderated in performance by personal and environmental influences. This article addresses in detail both the nature of developing competence and its measurement in terms of context-dependent task performance. Construct-irrelevant task difficulty that might jeopardize the meaning of test scores as well as construct-irrelevant influences that might jeopardize implications for action are taken into account via the comprehensive measurement of relevant contextual factors. Comprehensive assessment in context thus facilitates valid interpretations of the meaning and implications of ability and achievement scores in particular instances, thereby lightening the interpretive and ethical burdens on test users and enhancing the validity of test use.

20.
Throughout 2003–04, five cohorts of students in their final year of school studies in various Malaysian colleges and a group of students completing an Australian university foundation year in Malaysia sat the International Student Admissions Test (ISAT). The ISAT is a multiple‐choice test of general academic abilities developed for students whose first language is not English. Both sets of scores were examined to investigate the relationship between skills measured by the academic programs and the generic reasoning skills measured by the ISAT. The data were examined by looking at correlations and patterns of the ISAT scores, the total academic program scores, and individual subject scores. In addition, multiple regression was used to examine whether the ISAT could act as a predictor for academic program scores. Although the ISAT and measures of achievement in the academic programs are two completely different instruments, the study showed that: (i) the scores were positively and significantly correlated; (ii) patterns of co‐variation of the ISAT and academic program scores demonstrate a positive relationship; and (iii) there is evidence that achieving a high score in the academic programs requires high reasoning skills, as measured by the ISAT. The findings of this study indicate that the ISAT is a useful predictor of student ability for use in the university selection process for international applicants.
