Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Student growth percentiles (SGPs) express students' current observed scores as percentile ranks in the distribution of scores among students with the same prior-year scores. A common concern about SGPs at the student level, and mean or median SGPs (MGPs) at the aggregate level, is potential bias due to test measurement error (ME). Shang, vanIwaarden, and Betebenner (SVB; this issue) develop a simulation-extrapolation (SIMEX) approach to adjust SGPs for test ME. In this paper, we use a tractable example in which different SGP estimators, including SVB's SIMEX estimator, can be computed analytically to explain why ME is detrimental to both student-level and aggregate-level SGP estimation. A comparison of the alternative SGP estimators to the standard approach demonstrates the common bias-variance tradeoff problem: estimators that decrease the bias relative to the standard SGP estimator increase variance, and vice versa. Even the most accurate estimator for individual student SGP has large errors of roughly 19 percentile points on average for realistic settings. Those estimators that reduce bias may suffice at the aggregate level, but no single estimator is optimal for meeting the dual goals of student- and aggregate-level inferences.
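
The definition in the first sentence of the abstract above can be sketched in a few lines. This is only an illustration: it groups students by an exactly matching prior score and uses a mid-rank percentile, both of which are simplifying assumptions rather than the estimators studied in the paper, and the names `percentile_rank` and `simple_sgp` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, peers):
    """Mid-rank percentile (0-100) of `score` within the list `peers`."""
    ordered = sorted(peers)
    below = bisect_left(ordered, score)           # peers strictly below
    ties = bisect_right(ordered, score) - below   # peers with the same score
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def simple_sgp(records, prior, current):
    """Toy SGP: rank `current` among students whose prior score equals `prior`.

    `records` is a list of (prior_score, current_score) pairs.
    Assumption: exact matching on the prior score stands in for a real
    conditional-distribution estimate.
    """
    peers = [cur for pri, cur in records if pri == prior]
    if not peers:
        raise ValueError("no students share this prior score")
    return percentile_rank(current, peers)


records = [(300, 310), (300, 320), (300, 330), (300, 340), (310, 350)]
print(simple_sgp(records, prior=300, current=330))  # 62.5
```

A mean or median of such values over a classroom would give the aggregate MGP the abstract refers to.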

2.
In this study, we examine the impact of covariate measurement error (ME) on the estimation of quantile regression and student growth percentiles (SGPs), and find that SGPs tend to be overestimated among students with higher prior achievement and underestimated among those with lower prior achievement, a problem we describe as ME endogeneity in this article. We then assess the effect of covariate ME correction on SGP estimation at two levels: the individual (student) and the aggregate (classroom). Our ME correction approach is limited to the simulation-extrapolation method known as SIMEX. For both the individual and aggregate SGP, we find SIMEX effective in bias reduction. Further, because SIMEX is especially effective in reducing SGP bias for students with very high or very low prior achievement, it significantly weakens the ME endogeneity. SIMEX is also effective in reducing the MSE of aggregate SGP, provided that the students are sorted to some extent on their latent prior achievement. Our empirical study confirms the pattern of the simulation results: SIMEX mainly affects the mean SGP of classes in the highest and lowest quintiles of the prior score distribution, and significantly lowers the correlation between class SGP and prior achievement.
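
To give a feel for the simulation-extrapolation idea named in the abstract above, the sketch below applies it to the slope of an ordinary least-squares line rather than to quantile regression or SGPs, which is a deliberate substitution. It assumes the ME standard deviation `me_sd` is known, adds extra noise at levels λ = 0, 1, 2, and extrapolates an exact quadratic through those three points back to λ = -1. The function names and the three-point grid are hypothetical choices for illustration.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simex_slope(w, y, me_sd, n_reps=50, seed=0):
    """SIMEX-style correction of the slope of y on the error-prone covariate w.

    Simulation step: add noise with variance lam * me_sd**2 to w and average
    the refitted slope over n_reps replicates, for lam in (0, 1, 2).
    Extrapolation step: a quadratic through three equally spaced points
    satisfies f(-1) = 3*f(0) - 3*f(1) + f(2).
    """
    rng = random.Random(seed)
    slopes = []
    for lam in (0.0, 1.0, 2.0):
        extra_sd = (lam ** 0.5) * me_sd
        reps = 1 if lam == 0.0 else n_reps
        fits = [
            ols_slope([wi + rng.gauss(0.0, extra_sd) for wi in w], y)
            for _ in range(reps)
        ]
        slopes.append(fmean(fits))
    f0, f1, f2 = slopes
    return 3 * f0 - 3 * f1 + f2
```

In this linear setting the quadratic extrapolation removes most, but not all, of the attenuation in the naive slope; how well the same idea carries over to SGPs is what the abstract reports.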

3.
The search for new, authentic science assessments of what students know and can do is well under way. This has unearthed measures of students' hands-on performance in carrying out science investigations, and has been expanded to discover more or less direct measures of students' knowledge structures. One potential finding is concept mapping, the focus of this review. A concept map is a graph consisting of nodes representing concepts and labeled lines denoting the relation between a pair of nodes. A student's concept map is interpreted as representing important aspects of the organization of concepts in his or her memory (cognitive structure). In this article we characterize a concept map used as an assessment tool as: (a) a task that elicits evidence bearing on a student's knowledge structure in a domain, (b) a format for the student's response, and (c) a scoring system by which the student's concept map can be evaluated accurately and consistently. Based on this definition, multiple concept-mapping techniques were found from the myriad of task, response format, and scoring system variations identified in the literature. Moreover, little attention has been paid to the reliability and validity of these variations. The review led us to arrive at the following conclusions: (a) an integrative working cognitive theory is needed to begin to limit this variation in concept-mapping techniques for assessment purposes; (b) before concept maps are used for assessment and before map scores are reported to teachers, students, the public, and policy makers, research needs to provide reliability and validity information on the effect of different mapping techniques; and (c) research on students' facility in using concept maps, on training techniques, and on the effect on teaching is needed if concept map assessments are to be used in classrooms and in large-scale accountability systems. © 1996 John Wiley & Sons, Inc.

4.
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and classification consistency and accuracy under three item response theory (IRT) frameworks: unidimensional IRT (UIRT), simple structure multidimensional IRT (SS-MIRT), and bifactor multidimensional IRT (BF-MIRT) models. Illustrative examples are presented using data from three mixed-format exams with various levels of format effects. In general, the two MIRT models produced similar results, while the UIRT model resulted in consistently lower estimates of reliability and classification consistency/accuracy indices compared to the MIRT models.

5.
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation methods explored in the current study include augmentation based on classical test theory and multidimensional item response theory (MIRT). The study shows that there is no estimation method that is optimal according to both criteria. Augmented subscores show the most improvement in reliability compared to observed subscores but are the least distinct.

6.

This study explored the effects of Roundhouse diagram construction on a previously low-performing middle school science student's struggles to understand abstract science concepts and principles. It is based on a metacognition-based visual learning model proposed by Wandersee in 1994. Ward and Wandersee introduced the Roundhouse diagram strategy and showed how it could be applied in science education. This article aims at elucidating the process by which Roundhouse diagramming helps learners bootstrap their current understandings to reach the intended meaningful understanding of complex science topics. The main findings of this study are that (a) it is crucial that relevant prior knowledge and dysfunctional alternative conceptions not be ignored during new learning if low-performing science students are to understand science well; (b) as the student's mastery of the Roundhouse diagram construction improved, so did science achievement; and (c) the student's apt choice of concept-related visual icons aided progress toward meaningful understanding of complex science concepts.

7.
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity estimation consistent with this philosophy are suggested. Also, to improve the decision-making accuracy of criterion-referenced tests, a Bayesian procedure for estimating true mastery scores has been proposed. This Bayesian procedure uses information about other members of a student's group (collateral information), but the resulting estimation is still criterion referenced rather than norm referenced in that the student is compared to a standard rather than to other students. In theory, the Bayesian procedure increases the “effective length” of the test by improving the reliability, the validity, and more importantly, the decision-making accuracy of the criterion-referenced test scores.

8.
Science classes should support students' development of scientific argumentation. While previous studies have analyzed argumentative texts, they have overlooked the ways in which other types of representations, including images, affect the production of such texts. In addition, studies into the use of visual images in science education have offered mostly qualitative analyses. To fill these gaps in the research, this study used techniques of automated image processing to extract relevant information from student-generated visual artifacts. Specifically, it used a series of image-processing algorithms to automatically extract and quantify features of images created by students to serve as evidence in support of scientific arguments. Using various statistical analyses, we identified the relationships between the extracted features and the students' performance levels in constructing scientific arguments. The results revealed that the presence of water in a student's image correlated significantly with that student's claim and explanation scores and that the amount of water present in a student's image correlated significantly with that student's claim score, but not with their explanation score. These results indicate that automatic image processing can successfully identify image features that affect students' performance in scientific argumentation. Using this analysis as an example, we discuss implications for incorporating automated image processing into further research into scientific argumentation and the development of automated feedback.

9.
This paper provides tables of critical values for determining statistically significant discrepancies between Wechsler Verbal/Performance IQ and WIAT subtest and composite scores based on a predicted-achievement method. It is recommended that these tables be used when a statistically significant and diagnostically meaningful Verbal IQ-Performance IQ discrepancy exists, rendering either of these IQs a better estimate of a student's ability than the Full Scale IQ. Issues regarding the use of discrepancy formulas in the assessment and diagnosis of learning disabilities are discussed, and basic considerations for using the critical values tables are provided.

10.
In educational assessment, overall scores obtained by simply averaging a number of domain scores are sometimes reported. However, simply averaging the domain scores ignores the fact that different domains have different score points, that scores from those domains are related, and that at different score points the relationship between overall score and domain score may be different. To report reliable and valid overall scores and domain scores, I investigated the performance of four methods using both real and simulated data: (a) the unidimensional IRT model; (b) the higher-order IRT model, which simultaneously estimates the overall ability and domain abilities; (c) the multidimensional IRT (MIRT) model, which estimates domain abilities and uses the maximum information method to obtain the overall ability; and (d) the bifactor general model. My findings suggest that the MIRT model not only provides reliable domain scores, but also produces reliable overall scores. The overall score from the MIRT maximum information method has the smallest standard error of measurement. In addition, unlike the other models, there is no linear relationship assumed between overall score and domain scores. Recommendations for sizes of correlations between domains and the number of items needed for reporting purposes are provided.

11.
12.
Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: teachers whose students have low prior achievement receive MGPs that underestimate true teacher performance, and teachers whose students have high prior achievement receive MGPs that overestimate it. We evaluate alternative MGP estimators, showing that aggregates of SGP that correct for ME contain only errors independent of prior achievement. The alternatives are thus fairer because they are not biased by prior mean achievement, and they have smaller overall variance and larger marginal reliability than the standard MGP approach. In addition, the mean estimators always outperform their median counterparts.

13.
Two prior studies showed that giving teachers more information about a student's illness led them to make better attributions about that student's classroom problems and better classroom accommodations. In this study, 235 teachers appraised academic competence and judged whether to seek help or make a referral for a hypothetical student with type 1 diabetes mellitus (T1DM). Teachers received one of five levels comprising increasing disease disclosure and classroom-relevant information about T1DM. Contrary to prior studies, teachers in this study who were given a student's T1DM diagnosis and details about T1DM's classroom risks failed to make better judgments about the student's academic skill levels or to award more accurate grades. Instead, teachers seemed swayed by this student's apparently careless and inconsistent schoolwork, which was presumably disease related. Likewise, better-informed teachers were no better at selecting accommodations. However, once it was disclosed that the hypothetical student had T1DM, most teachers seemed knowledgeable about the most appropriate potential Individuals With Disabilities Education Improvement Act category for service delivery. Regarding practice issues, school psychologists were rarely selected as a first choice for consultation, and the more information teachers were provided with about T1DM and the student's disease status, the less likely they were to select a school psychologist as a consultant.

14.
In this paper, a method for analyzing data from student evaluations of teaching is presented. The first step of the process requires development of a regression model for teacher's summary rating as a function of student's expected grade. Then, two-sigma control charts for individual evaluation scores (section averages) and residuals from the regression model are used to identify both excellent and poor outcomes. The performance of an individual whose scores are out of control on both charts cannot be explained by expected grade and therefore is worthy of note.
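
The two-step procedure in the abstract above might look roughly like the sketch below. It assumes a simple linear regression of rating on expected grade and takes the control limits to be the sample mean plus or minus two sample standard deviations; the abstract does not say how sigma is estimated, so that choice, like the function names, is an assumption.

```python
from statistics import fmean, stdev


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def outside_two_sigma(values):
    """Flag each value lying more than two sample SDs from the mean."""
    center, sd = fmean(values), stdev(values)
    return [abs(v - center) > 2 * sd for v in values]


def notable_sections(expected_grade, rating):
    """Indices of sections out of control on both the score and residual charts."""
    slope, intercept = fit_line(expected_grade, rating)
    residuals = [r - (intercept + slope * g)
                 for g, r in zip(expected_grade, rating)]
    flags = zip(outside_two_sigma(rating), outside_two_sigma(residuals))
    return [i for i, (score_out, resid_out) in enumerate(flags)
            if score_out and resid_out]
```

A section flagged on the score chart alone could still be explained by its expected grade, which is why the sketch requires both flags before reporting it.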

15.
This study investigates how scaffolding type and learners’ epistemological beliefs influence ill-structured problem solving. The independent variables in this study include the type of scaffolding (task-supported, self-monitoring) and the student's epistemological belief level (more advanced, less advanced). The dependent variables include three components of problem-solving skill (problem representation, solution development, monitoring and evaluation). The two-way multivariate analysis of variance results reveal that students in the self-monitoring scaffolding group earned higher scores on problem representation and solution development than those in the task-supported scaffolding group. Students with more advanced epistemological beliefs also earned higher scores on solution development and monitoring and evaluation than did those with less advanced epistemological beliefs. In addition, a significant interaction was found between scaffolding type and epistemological belief level. These findings suggest that students can benefit from self-monitoring scaffolding in web-based problem solving and that different types of scaffolding should be provided according to the student's epistemological belief level.

16.
The authors explored the credibility of using informal reading inventories and writing samples for 138 students (K–4) to evaluate the effectiveness of a summer literacy program. Running Records (a measure of a child's reading level) and teacher experience during daily reading instruction were used to estimate the reliability of the more formal Developmental Reading Assessment scores. Training of scorers was used to increase the reliability of writing scores; a second scoring was used to estimate the reliability of the scores. The results suggested that with minimal modifications to administration and scoring procedures, scores from both reading inventories and writing samples can be a dependable source of data for teachers, administrators, and policy makers. This result is significant because it suggests that formative literacy assessments can be reliably used instead of standardized multiple-choice tests to make more credible summative decisions without taking time away from instruction, and can truly match curriculum, instruction, and assessment.

17.
18.
Educational Assessment, 2013, 18(4): 255-258
Editor's Introduction. Reliability Versus Accuracy: A Critical Distinction. Test reliability coefficients traditionally have been used to judge the quality of measurement, and reliability coefficients of .90 have often been considered adequate to assure the quality of standardized testing and large-scale assessment programs. However, a test reliability of .90 (or above) does not ensure that individual test scores, such as national percentile ranks, are accurate. Consider, for example, a mathematics test with a reliability of .90 and imagine a student taking that test whose true score is at the 50th percentile; that is, we know that the student's actual capability is at that level. The probability is less than one third (.309) that when the student takes the test, he or she will obtain a score within 5 percentile points of his or her true score, the 50th percentile (Rogosa 1999a, 1999b). The following informal example attempts to explain why high test reliability does not indicate good accuracy for an individual score, without the encumbrances of percentile rank scoring, complex measurement models, and other technical detail. Dedicated to Al Bundy, a man who cares as much about good measurement as he does about his own children.
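
One way to arrive at a figure close to the .309 quoted above is sketched below. It assumes a normal classical-test-theory model in which observed scores have unit variance, true scores have variance equal to the reliability, and percentile ranks are read off the observed-score distribution. This is an assumed reconstruction for illustration, not necessarily the derivation in the cited Rogosa reports, and `prob_within` is a hypothetical name.

```python
from statistics import NormalDist


def prob_within(reliability, true_pct=50.0, band=5.0):
    """P(observed percentile rank falls within `band` points of `true_pct`).

    Assumptions: observed = true + error, all normal; Var(observed) = 1,
    Var(true) = reliability, Var(error) = 1 - reliability.
    Requires 0 < true_pct - band and true_pct + band < 100.
    """
    norm = NormalDist()
    # Location of the student's true score on the observed-score scale.
    true_score = reliability ** 0.5 * norm.inv_cdf(true_pct / 100)
    observed = NormalDist(mu=true_score, sigma=(1 - reliability) ** 0.5)
    # Observed scores that map to the edges of the percentile band.
    low = norm.inv_cdf((true_pct - band) / 100)
    high = norm.inv_cdf((true_pct + band) / 100)
    return observed.cdf(high) - observed.cdf(low)


print(round(prob_within(0.90), 3))  # 0.309
```

Under these assumptions the error standard deviation is about 0.32 observed-score SDs, while the 45th to 55th percentile band is only about 0.25 SDs wide, which is why the probability stays below one third even at a reliability of .90.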

19.
20.
English language learners (ELLs) are the fastest growing subgroup in American schools. These students, by a provision in the reauthorization of the Elementary and Secondary Education Act, are to be supported in their quest for language proficiency through the creation of systems that more effectively measure ELLs’ progress across years. In the past, ELLs’ progress has been based on students’ prior scores measuring the same construct. To disentangle effectiveness from achievement, the reporting has generally targeted mean-group activity. In contrast, student growth percentiles (SGPs) provide a comparison of students’ growth with others who have the same achievement score history. By examining the construct measured by an English language proficiency test as manifested in student scores in Speaking, Listening, Reading and Writing, this article outlines the use of SGPs in providing information on how much each student needs to grow, which will allow educators to more effectively apply differential formative instructional strategies.
