Similar documents
20 similar documents found.
1.
A number of studies have focused on how students and instructors feel about digital learning technologies. This research focuses instead on the substantive difference in learning outcomes between traditional classrooms and classrooms using clickers. A randomized block experimental design involving four sections of undergraduate Operations Management classes was used to determine whether clicker systems increase student learning of both quantitative and conceptual material in Operations Management. Learning was measured as the difference between scores on an entrance examination and on the final examination. The findings provide evidence that immediate feedback, delivered through a technology such as clickers, can have a positive impact on student learning as measured by test scores.

2.
This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When used together, these procedures can provide complementary information about the extent to which item order effects impact test scores, both in overall score distributions and at specific test scores. In addition to detecting item order effects, the integrated procedures also suggest the equating function that most adequately adjusts the scores to mitigate the effects. To demonstrate, the statistical equivalence of alternate versions of two large-volume Advanced Placement exams was assessed.

3.
Reliability of Scores From Teacher-Made Tests (total citations: 1; self-citations: 0; citations by others: 1)
Reliability is the property of a set of test scores that indicates the amount of measurement error associated with the scores. Teachers need to know about reliability so that they can use test scores to make appropriate decisions about their students. The level of consistency of a set of scores can be estimated by using methods of internal analysis to compute a reliability coefficient. This coefficient, which can range from 0.0 to +1.0, usually has values around 0.50 for teacher-made tests and around 0.90 for commercially prepared standardized tests. Its magnitude can be affected by such factors as test length, item difficulty and discrimination, time limits, and certain characteristics of the group: the extent of their testwiseness, their level of motivation, and their homogeneity in the ability measured by the test.
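The abstract does not name a specific coefficient. One widely used internal-consistency estimate is Cronbach's alpha (equivalent to KR-20 when items are scored 0/1). The sketch below computes it for a small, hypothetical set of item scores; the function name and the data are illustrative only and are not taken from the article.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of item scores (one row per examinee).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used for both terms; the n / (n - 1) correction
    would cancel in the ratio, so sample variances give the same result.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses: 5 examinees x 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 3))  # prints 0.8
```

With these invented responses the coefficient is 0.8. As the abstract notes, longer tests and items with better discrimination tend to push the value up.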

4.
The use of assessment results to inform school accountability relies on the assumption that the test design appropriately represents the content and cognitive emphasis reflected in the state's standards. Since the passage of the Every Student Succeeds Act and the certification of accountability assessments through federal peer review, the content validity arguments supporting accountability have relied almost exclusively on the alignment of statewide assessments to state standards. The assumption is that if alignment does not hold, the scores will not support valid inferences about test takers' performance. Although alignment results are commonly used as evidence of test appropriateness, Polikoff (this issue) argues that, given the importance of alignment in policy decisions, research on alignment is surprisingly limited. Few studies have addressed the adequacy of alignment methodologies and results as support for the intended inferences (i.e., proficiency on state standards). This paper uses an example of test taker performance (and common performance indicators) to investigate the extent to which the degree of alignment affects inferences about performance (i.e., classification into performance levels, estimates of student ability, and student rank order).

5.
This article focuses on the relation between student population characteristics and average test scores per school in the final grade of primary education from a dynamic perspective. Aggregated data from over 5,000 Dutch primary schools covering a 6-year period were used to study the relation between changes in school populations and shifts in mean test scores. Path analysis findings indicate that changes in student populations bring about immediate changes in school averages. However, the impact of these changes is strongly mitigated by the effects of past school results. This reveals long-lasting effects of student population characteristics via past performance and explains, to a considerable extent, the relative stability of raw output measures over time.

6.
Several studies provide preliminary evidence that computer use is positively related to academic performance; however, no clear relationship has yet been established. Using a national database, we analyzed how tenth-grade students' school behavior (as evaluated by their English and math teachers) and standardized test scores (in math and reading) are related to computer use for schoolwork and for purposes other than schoolwork. Controlling for socioeconomic status (SES), home computer access, parental involvement, and students' academic expectations, students who used a computer for one hour per day showed more positive school behaviors and higher reading and math test scores. This article concludes with implications for future research to better understand the impact of computer use on adolescent academic development.

7.
The Moral Competence Test (MCT) was designed over 30 years ago to provide a resource for educators interested in conducting cross-cultural studies of moral development and education. Since its origin, it has been translated into at least 30 languages and used in hundreds of studies. However, few studies provide evidence to support the use of the test in the US. The test's designer identified three criteria for evaluating the construct validity of the test and its primary scores: whether correlations of stage scores reflect a simplex structure, whether ratings follow the theoretical order of stages, and whether the test differentiates preferences and structures of reasoning. We use these criteria, together with evidence of criterion and content validity, to assess the validity of the MCT. We present results from two US samples (n = 772). Analyses based on the test author's criteria support the semantic validity of the test; however, evidence of criterion validity raises questions about the C-score as a measure of moral competence. After controlling for stage preferences, the C-score was negatively related to democratic attitudes and positively related to dogmatism.

8.
This paper uses measurements of learning inequality to explore whether learning interventions aimed at improving means also reduce inequality, and if so, under what conditions. There is abundant evidence that learning levels are generally low in low- and middle-income countries (LMIC), but less is known about how learning achievement is distributed within these contexts, and especially about how these distributions change as mean levels increase. We use child-level data on foundational literacy outcomes to explore quantitatively whether and how measuring learning inequality, with metrics borrowed from the economics and inequality literature, can help us understand the impact of learning interventions. The paper deepens recent work in several ways. First, it extends the analysis to six LMIC, showing which measures are computable and coherent across contexts and baseline levels; this extension can add valuable information to program evaluation without being redundant with other metrics. Second, we show how greatly the breakdown of inequality in foundational skills into between- and within-school (and between- and within-grade) components varies by context and language. Third, we present initial empirical evidence that, at least in the contexts of the foundational interventions analyzed, improving average performance can also reduce inequality, across all levels of socioeconomic status (SES). The data show that at baseline, the groups with the highest internal inequality tend to be those with the lowest SES and the lowest reading scores, as inequality among the poor themselves is higher than among their wealthier counterparts. Regardless of which SES groups benefit more in terms of a change in mean reading levels, there is still a considerable reduction in inequality by baseline achievement as means increase.
These results have policy implications for the targeting of interventions: much can be achieved by simultaneously improving averages and increasing equality. This seems particularly true when initial learning levels are as low as they currently are in the developing world.
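The abstract does not list which inequality metrics were used. The Gini coefficient is one standard measure from the economics literature; the sketch below assumes non-negative reading scores (for example, hypothetical words-correct-per-minute values) and uses invented baseline and endline data to show how a rise in the mean can coincide with a fall in inequality, the pattern the paper reports.

```python
def gini(scores):
    """Gini coefficient of non-negative scores (0 = perfect equality).

    Uses the sorted-rank form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n. This equals the mean
    absolute difference over all ordered pairs divided by twice the mean.
    """
    xs = sorted(scores)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive score")
    if xs[0] < 0:
        raise ValueError("scores must be non-negative")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical reading scores for the same five children before and after an intervention.
baseline = [0, 0, 5, 10, 25]
endline = [5, 8, 12, 18, 30]
for label, data in (("baseline", baseline), ("endline", endline)):
    print(label, "mean", sum(data) / len(data), "gini", round(gini(data), 3))
```

In this toy example the mean rises from 8.0 to 14.6 while the Gini coefficient falls from 0.6 to about 0.33. Floor effects (many zero scores at baseline) are what make the initial inequality so high.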

9.
Students with the most significant cognitive disabilities (SCD) are the 1% of the total student population who have a disability or multiple disabilities that significantly impact intellectual functioning and adaptive behaviors and who require individualized instruction and substantial supports. Historically, these students have received little instruction in science, and the science assessments they have participated in have not included age‐appropriate science content. Guided by a theory of action for a new assessment system, an eight‐state consortium developed multidimensional alternate content standards and alternate assessments in science for students in three grade bands (3–5, 6–8, 9–12) that are linked to the Next Generation Science Standards (NGSS Lead States, 2013) and A Framework for K‐12 Science Education (Framework; National Research Council, 2012). The great variability within the population of students with SCD necessitates variability in the assessment content, which creates inherent challenges in establishing technical quality. To address this issue, a primary feature of this assessment system is the use of hypothetical cognitive models to provide a structure for variability in assessed content. System features and subsequent validity studies were guided by a theory of action that explains how the proposed claims about score interpretation and use depend on specific assumptions about the assessment, as well as precursors to the assessment. This paper describes evidence for the main claim that test scores represent what students know and can do. We present validity evidence for the assumptions about the assessment and its precursors, related to this main claim. The assessment was administered to over 21,000 students in eight states in 2015–2016. We present selected evidence from system components, procedural evidence, and validity studies. We evaluate the validity argument and demonstrate how it supports the claim about score interpretation and use.

10.
In this study we use data from the Early Childhood Longitudinal Study third- and fifth-grade samples to investigate teacher judgments of student achievement, the extent to which they offer a picture of student mathematics achievement similar to that given by standardized test scores, and whether classroom assessment practices moderate the relationship between the two measures. Results indicate that teacher ratings correlate strongly with standardized test scores; however, this relationship varies considerably across teachers, and this variation is associated with certain classroom assessment practices. Furthermore, the evidence suggests that teachers evaluate student performance not in absolute terms but relative to other students in the school, and that they may adjust their grading for some students, perhaps on the basis of perceived differences in need and/or ability.

11.
Policies that require the use of information about student achievement to evaluate teacher performance are becoming increasingly common across the United States, but there is some question as to how or whether to use student test-based teacher evaluations when student assessments change. We bring empirical evidence to bear on this issue. Specifically, we examine how estimates of teacher value-added are influenced by assessment changes across 12 test transitions in two subjects and five states. In all of the math transitions we study, value-added measures from test change years and stable regime years are broadly similar in terms of their statistical properties and informational content. This is also true for some of the reading transitions; we do find, however, some cases in which an assessment change in reading meaningfully alters value-added measures. Our study directly informs contemporary policy debates about how to evaluate teachers when new assessments are introduced and provides a general analytic framework for examining employee evaluation policies in the face of changing evaluation metrics.

12.
Predictors for Mathematics Achievement? Evidence From a Longitudinal Study (total citations: 1; self-citations: 0; citations by others: 1)
Numerical processing has been extensively studied by examining the performance on basic number processing tasks, such as number priming, number comparison, and number line estimation. These tasks assess the innate “number sense,” which is assumed to be the breeding ground for later mathematics development. Indeed, several studies have associated children's performance in these tasks with individual differences in mathematical achievement. To date, however, most of these studies have cross‐sectional designs. Moreover, the few longitudinal studies either use complex tasks (e.g., story problems) or investigate only one of these basic number processing tasks at a time. In this study, we examine the association between the performance of children on several basic number processing tasks and their individual math achievement scores on a curriculum‐based test measured 1 year later. Regression analyses showed that most of the variance in children's math achievement was predicted by nonsymbolic number line estimation performance (i.e., estimating large quantities of dots) and, to a lesser extent, the speed of comparing symbolic numbers. This knowledge about the predictive value of the performance of 5‐ to 7‐year‐olds on these markers of number processing can help with the early identification of at‐risk children. In addition, this information can guide appropriate educational interventions.

13.
While states are no longer required to set up teacher evaluation systems based in significant part on student test scores, quite a few continue to use value-added models (VAMs) or student growth percentile (SGP) models for that purpose. In this study, we analyzed three years of teacher data to illustrate the performance of teachers' median growth percentiles (MGPs). We found the consistency of MGPs over time to be comparable with existing estimates for VAMs. Additionally, we found that MGPs do not substantively agree with another measure of teacher quality: teachers' observational scores. These findings suggest that caution should be exercised when teachers' MGPs, as well as VAMs, are used in teacher evaluation systems to make high-stakes decisions such as merit pay, tenure, or contract termination. Our findings about the correlation of MGPs with observational scores support the idea that teacher effectiveness is a multidimensional construct.

14.
Education and poverty in rural China (total citations: 3; self-citations: 0; citations by others: 3)
We analyze household and school survey data from poor counties in six Chinese provinces to examine the effects of poverty, intra-household decision-making, and school quality on educational investments (enrollment decisions) and learning outcomes (test scores and grade promotion). Unlike previous studies, we use direct measures of credit limits and women's empowerment. Drawing a distinction between the effects of wealth (measured by expenditures per capita) and credit constraints, we find that the former improves learning while the latter reduces educational investments. We find evidence of gender bias: academically weak girls are more likely to drop out in primary school, while most boys continue on to junior secondary school. Women's empowerment reduces the likelihood of dropping out but does not affect other outcomes. Finally, our measures of school quality have some effect on the duration of primary school enrollment but not on learning.

15.
This paper investigates the relationship between personality traits in adolescence and performance in high school using a large and recent cohort study. In particular, we investigate the impact of locus of control, self-esteem, and work ethic at age 15 on test scores at age 16 and on subject choices and subsequent performance at ages 17–18. Individuals with an external locus of control or with low self-esteem appear less likely to score well on tests at age 16 and to pursue further studies at 17–18, especially in Mathematics or Science. We use matching methods to control for a rich set of adolescent and family characteristics, and we find that personality traits do affect study choices and test performance, particularly in Mathematics and Science. We explore the robustness of our results using the methodology proposed by Altonji, Elder, and Taber (2005), which involves making assumptions about the correlation between the unobservables that determine test scores and subject choices and the unobservables that influence personality.

16.
The conventional focus of validity in educational measurement has been on intended interpretations and uses of test scores. Empirical studies of test use by teachers, administrators and policy-makers show that actual interpretations and uses of test scores in context are invariably shaped by local users’ questions, which frequently require attention to multiple sources of evidence about students’ learning and the factors that shape it, and depend on local capacity to use such information well. This requires a more complex theory of validity that can shift focus as needed from the intended interpretations and uses of test scores that guide test developers to local capacity to support the actual interpretations, decisions and actions that routinely serve local users’ purposes. I draw on the growing empirical literature on data use to illustrate the need for an expanded theory of validity, point to theoretical resources that might guide such an expansion, and suggest a research agenda towards these ends.

17.
We conducted generalizability studies to examine the extent to which ratings of language arts performance assignments, administered in a large, diverse, urban district to students in second through ninth grades, result in reliable and precise estimates of true student performance. The results highlight three important points when considering the use of performance assessments in large-scale settings: (a) Rater training may significantly impact reliability; (b) simple rater agreement indices do not provide enough information to assess the reliability of inferences about true student achievement; and (c) assessments adequate for relative judgments of student performance do not necessarily provide sufficient precision for absolute criterion-referenced decisions.
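Point (c) follows from how generalizability theory separates relative and absolute error. The sketch below assumes the simplest fully crossed persons × raters (p × r) design with one rating per cell; this is a simplification, since the study's actual design may include further facets such as tasks. It estimates variance components from ANOVA mean squares and returns both the generalizability coefficient G (for relative decisions) and the dependability index Phi (for absolute decisions). The optional `n_raters_decision` argument projects both coefficients to a different number of raters, as in a decision (D) study. The function name and ratings are hypothetical.

```python
def g_study_p_by_r(ratings, n_raters_decision=None):
    """One-facet p x r G study; ratings[p][r] is rater r's score for student p."""
    n_p, n_r = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in ratings]
    rater_means = [sum(ratings[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual (p x r interaction + error).
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_pr = ss_tot - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are truncated at zero.
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)
    var_pr = ms_pr

    k = n_raters_decision or n_r
    rel_err = var_pr / k              # relative error: only the p x r residual
    abs_err = var_r / k + var_pr / k  # absolute error: adds rater leniency/severity
    return {
        "var_p": var_p,
        "var_r": var_r,
        "var_pr": var_pr,
        "G": var_p / (var_p + rel_err),
        "Phi": var_p / (var_p + abs_err),
    }


# Hypothetical ratings: 4 students x 3 raters who differ in leniency.
ratings = [
    [2, 3, 4],
    [3, 4, 5],
    [1, 2, 4],
    [4, 5, 5],
]
result = g_study_p_by_r(ratings)
print({name: round(value, 3) for name, value in result.items()})
```

In the invented data the raters' mean ratings are 2.5, 3.5, and 4.5, so the rater variance component is large: G is about 0.93 while Phi is 0.72. Rater main effects shift every student equally, so they leave rank order (and G) untouched, but they enter the absolute error that matters for criterion-referenced cut-score decisions.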

18.
Executive functioning (EF) is a strong predictor of children's and adolescents' academic performance. Although research indicates that EF can increase during childhood and adolescence, few studies have tracked the effect of EF on academic performance throughout the middle school grades. EF was measured at the end of Grades 6–9 through 21 teachers' and 22 teacher assistants' assessments of 322 adolescents from disadvantaged backgrounds who attended an urban, chartered middle/high school. EF was assessed with the Behavior Rating Inventory of Executive Function (BRIEF). BRIEF global executive composite (GEC) scores predicted both current and future English/language arts, mathematics, science, social studies, and Spanish annual grade point averages (GPAs). The effect of BRIEF GEC scores often overshadowed the effects of gender, poverty, and having an individual education plan; the other, non–BRIEF-related effects retained slightly more impact in teacher assistant–derived data than in teacher-derived data. The strong relationships between BRIEF GEC scores and these GPAs also remained constant over these 4 years: there was little evidence that EF changed over the measured grades or that the relationship between EF and grades itself regularly changed. The findings indicate that EF scores during the early middle grades can predict academic performance in subsequent secondary-school grades well. Although methodological constraints may have impeded the ability of other factors (e.g., poverty) to be significantly related to GPAs, the effects of EF were strong and robust enough for us to recommend its use to guide long-term academic interventions.

19.
Evaluating the multiple characteristics of alignment has taken a prominent role in educational assessment and accountability systems, given the attention it received in the No Child Left Behind legislation (NCLB). Contributing to this rise in popularity, alignment methodologies that examine relationships among curriculum, academic content standards, instruction, and assessments were proposed as strategies for evaluating evidence about the intended uses and interpretations of test scores. In this article, we propose a framework for evaluating alignment studies based on concepts similar to those recommended for standard setting (Kane). This framework provides guidance to practitioners on how to identify sources of validity evidence for an alignment study and how to judge the strength of evidence that may affect the interpretation of the results.

20.
This paper reports a statistical analysis of the determinants of average student performance on standardized examinations, and also of the determinants of the extent to which students fail such examinations. Unlike most other cross-sectional studies of performance among school districts within one state, this study uses the quality of teachers, as measured by standardized test scores, as a determinant of performance. Perhaps the most striking empirical result is that a 1% increase in teacher quality, as measured by standardized test scores, is accompanied by a 5% decline in the rate at which students fail standardized competency examinations. The corresponding effect of teacher quality on average (mean) achievement is, by contrast, quite modest: 0.5%–0.8% per 1% improvement in teacher quality.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417