Similar literature
1.
Although a few studies report sizable score gains for examinees who repeat performance‐based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single‐take examinees and 4,030 repeat examinees who completed a 6‐hour clinical skills assessment required for physician licensure. Each examinee was rated in four skill domains: data gathering, communication‐interpersonal skills, spoken English proficiency, and documentation proficiency. Conditional standard errors of measurement computed for single‐take and multiple‐take examinees indicated that ratings were of comparable precision for the two groups within each of the four skill domains; however, conditional errors were larger for low‐scoring examinees regardless of retest status. In addition, multiple‐take examinees exhibited less score consistency across the skill domains on their first attempt, but their scores became more consistent on their second attempt. Further, the median correlation between scores on the four clinical skill domains and three external measures was .15 for multiple‐take examinees on their first attempt but increased to .27 for their second attempt, a value comparable to the median correlation of .26 for single‐take examinees. The findings support the validity of inferences based on scores from the second attempt.

2.
Teachers' conceptions of assessment can be understood in terms of their agreement or disagreement with four purposes to which assessment may be put, specifically, (a) improvement of teaching and learning, (b) school accountability, (c) student accountability, or (d) treating assessment as irrelevant. A 50‐item Teachers' Conceptions of Assessment (COA‐III) questionnaire was completed by New Zealand primary school teachers and managers (n = 525). The COA‐III, based on the four main purpose‐defined conceptions of assessment, was analysed with structural equation modelling and showed a close fit of the data to a hierarchical, multi‐dimensional model (χ² = 3217.68; df = 1162; RMSEA = .058; TLI = .967). On average, participants agreed with the improvement conceptions and the school accountability conception, while rejecting the view that assessment was irrelevant. However, respondents disagreed that assessment was for student accountability. Improvement, school, and student accountability conceptions were positively correlated. The irrelevance conception was inversely related to the improvement conception and not related to the system accountability conception. Surprisingly, no statistically significant differences were found in mean scale scores for each conception regardless of teacher (age, gender, role, assessment training, or assessment practices) or school (size, location, or socio‐economic status) variables. Implications for the use of the COA‐III for policy implementation and teacher professional development are discussed.

3.
Recently there has been a great deal of interest among researchers and professional educators in at-risk, academically low-attaining students, especially low socioeconomic status students at U.S. inner-city schools. A major factor that has been hypothesized in the research literature as being associated with poor academic attainment is the lack of critical and timely instructional feedback or formative evaluation. Using a sample of 130 inner-city senior high school students, the perceived quality and quantity of formative evaluation received by these students at their elementary and secondary school levels were assessed. In addition, each student was given a mathematics (pre-algebra) assessment using both a one- and a two-dimensional format (recognition plus confidence) to determine present levels of mathematics attainment. Finally, data were collected from the cumulative grade-level folders of a subset of these students, especially norm-referenced data (NRT) in mathematics, to examine their relationship to scores on the Scholastic Aptitude Test-Quantitative portion. The study finds that, in addition to extremely poor mathematics attainment and poor formative evaluation practices, there is little association between SAT (quantitative) scores and the grade-level (mathematics) NRT scores. These findings suggest that parents cannot depend on traditional norm-referenced measures to indicate actual mathematics attainment as these students are progressing through the schools. These findings also challenge urban school administrative personnel to reassess the use of NRT measures to monitor student progress and to develop more comprehensive and systematic formative evaluation procedures and practices for individual students as they progress through each grade level.

4.
The process of setting and evaluating student learning objectives (SLOs) has become increasingly popular as an example where classroom assessment is intended to fulfill the dual purpose of informing instruction and holding teachers accountable. A concern is that the high‐stakes purpose may lead to distortions in the inferences about students and teachers that SLOs can support. This concern is explored in the present study by contrasting student SLO scores in a large urban school district with performance on a common objective external criterion. This external criterion is used to evaluate the extent to which student growth scores appear to be inflated. Using 2 years of data, growth comparisons are also made at the teacher level for teachers who submit SLOs and have students who take the state‐administered large‐scale assessment. Although they do show similar relationships with demographic covariates and have the same degree of stability across years, the two different measures of growth are weakly correlated.

5.
The present study investigated the relationship between the Revised Peabody Picture Vocabulary Test (PPVT-R) and the WISC-R for a naturally occurring sample of rural children referred for assessment (N = 53). The results indicated that the PPVT-R was highly correlated with WISC-R scale and subtest scores. Examination of a sub-sample of developmentally handicapped students revealed substantial reduction in correlational relationships as a function of reduced sample size and restricted range of general ability. While the PPVT-R was found to underestimate all three WISC-R scale scores, the discrepancy between the PPVT-R standard scores and the WISC-R Performance Scale score was the only statistically significant underestimation. Results are discussed in terms of prior research findings and implications for interpretation.

6.
In a study using the literacy module from CoPS Baseline, a computerised assessment system, 153 children were assessed at an average age of 4 years 10 months, and progress in reading was followed up 12 months later. The results indicated that the computerised baseline assessment module produced a satisfactory distribution of scores across the intended age range, and the shorter adaptive form of the baseline test correlated highly (r = 0.81) with the full form. Baseline scores gave a good overall prediction of reading development over the first year of schooling (r = 0.74), regardless of the child’s age. Correlations between the 8 skill/concept areas that comprise the baseline assessment and reading ability 12 months later were consistent with other findings reported in the literature. It was concluded that the objectivity that characterises computerised assessment could provide a more consistent and dependable approach to baseline assessment.

7.
Applied Measurement in Education, 2013, 26(3): 185-207
With increasing interest in educational accountability, test results are now expected to meet a diverse set of informational needs. But a norm-referenced test (NRT) cannot be expected to meet the simultaneous demands for both norm-referenced and curriculum-specific information. One possible solution, which is the focus of this article, is to customize the NRT. Customized tests may appear in any form. They may (a) add a few curriculum-specific items to the end of the NRT, (b) substitute locally constructed items for a few NRT items, (c) substitute a curriculum-specific test (CST) for the NRT, or (d) use equating methods to obtain predicted NRT scores from the CST scores. In this article, we describe the four main approaches to customized testing, address the validity of the uses and interpretations of customized test scores obtained from the four main approaches, and offer recommendations regarding the use of customized tests and the need for further research. Results indicate that customized testing can yield both valid normative and curriculum-specific information when special conditions exist. However, there are also many threats to the validity of normative interpretations. Cautious application of customized testing is needed in order to avoid misleading inferences about student achievement.

8.
This study focused on school effectiveness in terms of changes in the distribution of achievement across socio‐economic status (SES) for a cohort of 9700 students as they progressed from grade 1 to grade 3. The achievement/SES link was operationalized as the standardized mean difference in achievement (EFFSIZE) between samples of high and low SES students for 165 schools. EFFSIZE measures in reading and mathematics were analyzed in a hierarchical linear model which allowed an assessment of the impact of 13 school characteristics on initial (grade 1) status and, more importantly, on trends over time. The results indicate that, on average, low SES students are at an initial disadvantage relative to their high SES peers in both subjects and that the difference widens across the first three grades. True school variance in slopes was found in mathematics but not reading, a result consistent with previous research. The school characteristics, which included six indicators based on the effective schools literature, were found to be ineffective predictors of these growth parameters.

9.
In the United Kingdom, the majority of national assessments involve human raters. The processes by which raters determine the scores to award are central to the assessment process and affect the extent to which valid inferences can be made from assessment outcomes. Thus, understanding rater cognition has become a growing area of research in the United Kingdom. This study investigated rater cognition in the context of the assessment of school‐based project work for high‐stakes purposes. Thirteen teachers across three subjects were asked to “think aloud” whilst scoring example projects. Teachers also completed an internal standardization exercise. Nine professional raters across the same three subjects standardized a set of project scores whilst thinking aloud. The behaviors and features attended to were coded. The data provided insights into aspects of rater cognition such as reading strategies, emotional and social influences, evaluations of features of student work (which aligned with scoring criteria), and how overall judgments are reached. The findings can be related to existing theories of judgment. Based on the evidence collected, the cognition of teacher raters did not appear to be substantially different from that of professional raters.

10.
Teachers' pedagogical content knowledge (PCK) is highly important for effective design and implementation of school teaching. Thus, the current status, development and efficacy of this knowledge, its relationships with teaching quality parameters, and its impact on students' learning processes and success, require rigorous examination. Thoroughly validated, objective and reliable test instruments that are highly sensitive to changes in variables of proven knowledge-related relevance in teacher education are also required. Previous attempts to design such instruments for assessing science teachers' PCK have largely focused on mathematical content. Therefore, here we present an instrument (the pedagogical content knowledge in biology inventory, PCK-IBI), based on conceptualizations of teachers' professional competence, for assessing secondary school pre-service biology teachers' PCK. In a series of three evaluations and refinements, it was tested with samples of N = 274 and N = 432 German pre-service biology teachers, as well as one sample of n = 65 German pre-service and n = 35 German in-service biology teachers. Item analysis, scale analysis and empirically obtained indicators of validity suggest that the final 34-item version of the PCK-IBI is unidimensional, provides objective test scores and enables reliable and valid registration of pre-service biology teachers' PCK. Thus, hypotheses regarding specific aspects of the model on which the PCK-IBI's construction is based are empirically supported. The results of our study provide empirical support for the instrument's potential utility.

11.
This paper considers cognitive differences between twins and other higher multiple births when they start school and their progress in reading and maths during the first year at school. The data came from the Performance Indicators in Primary Schools (PIPS) project, which has developed an on‐entry assessment designed to provide a solid base against which the relative progress (value added) of pupils can be measured. Some of the findings were surprising, in that they appeared to be in conflict with some earlier work, but there was also some agreement. Additionally, teachers were asked to assess pupils on the Attention Deficit Hyperactivity Disorder (ADHD) scale at the end of their first year in school, and this information was used to check an earlier finding that twins were more prone to score highly on this scale.

12.
This study evaluates four growth prediction models, namely projection, student growth percentile, trajectory, and transition table, that are commonly used to forecast (and give schools credit for) middle school students' future proficiency. Analyses focused on vertically scaled summative mathematics assessments, and two performance standards conditions (high rigor and low rigor) were examined. Results suggest that, when “status plus growth” is the accountability metric a state uses to reward or sanction schools, growth prediction models offer value above and beyond status‐only accountability systems in most, but not all, circumstances. Predictive growth models offer little value beyond status‐only systems if the future target proficiency cut score is rigorous. Conversely, certain models (e.g., projection) provide substantial additional value when the future target cut score is relatively low. In general, growth prediction models' predictive value is limited by a lack of power to detect students who are truly on‐track. Limitations and policy implications are discussed, including the utility of growth projection models in assessment and accountability systems organized around ambitious college‐readiness goals.

13.
Portfolio assessment, that is, the evaluation of performance by means of a cumulative collection of student work, has figured prominently in recent US debate about education reform. Proponents hope not only to broaden measurement of performance, but also to use portfolio assessment to encourage improved instruction. Although portfolio assessment has sparked considerable attention and enthusiasm, it has been incorporated into only a few of the nearly ubiquitous large‐scale external assessment programmes in the US. This paper evaluates the quality of the performance data produced by several large‐scale portfolio efforts. Evaluations of reliability, which have focused primarily on the consistency of scoring, have yielded highly variable results. While high levels of consistency have been reached in some cases, scoring has been quite inconsistent in others, to the point of severely limiting the utility of scores.

Information about other aspects of validity is more limited and generally discouraging. For example, scores from portfolio assessments often do not show anticipated relationships with other achievement data, and teachers report practices in the implementation of portfolio assessment that are appropriate for instructional purposes but threaten the validity of inferences from portfolio scores. While other studies show positive effects of portfolio programmes (see Stecher, this issue), these findings suggest that portfolio assessment at its current state of development is problematic for many of the uses to which large‐scale external assessments are now put in the US.


14.
Insincere respondents can have an adverse impact on the validity of substantive inferences arising from self-administered questionnaires (SAQs). The current study introduces a new method for identifying potentially invalid respondents from their atypical response patterns. The two-step procedure involves generating a response inconsistency (RI) score for each participant and scale on the SAQ and subjecting the resulting scores to latent profile analysis to identify classes of atypical RI respondent profiles. The procedure can be implemented post–data collection and is illustrated through a survey of school climate that was administered to N = 52,102 high school students. Results of this screening procedure revealed high levels of specificity and expected levels of concordance when contrasted with the results of traditionally used methods of screening items and response time. Contrasts between valid and invalid respondents revealed similar patterns across the three screening procedures when compared across external measures of academics and risk behaviors.

15.
Students conceive of assessment in at least four major ways (i.e., assessment makes students accountable; assessment is irrelevant because it is bad or unfair; assessment improves the quality of learning; and assessment is enjoyable). A study in New Zealand of 3469 secondary school students’ conceptions of assessment used a self‐report inventory and scores from a standardised curriculum‐based assessment of reading comprehension. Four inter‐correlated conceptions based on 11 items were found with good psychometric properties. A path‐model linking the four correlated conceptions with student achievement in reading, while taking into account student ethnicity, student sex, and student year, had good psychometric properties. The conception that assessment makes students accountable loaded positively on achievement while the three other conceptions (i.e., assessment makes schools accountable, assessment is enjoyable, and assessment is ignored) had negative loadings on achievement. These findings are consistent with self‐regulation and formative assessment theories, such that students who conceive of assessment as a means of taking responsibility for their learning (i.e., assessment makes me accountable) will demonstrate increased educational outcomes.

16.
This study examined the concurrent validity of the composite and area scores of the Stanford-Binet Intelligence Scale: Fourth Edition (SBIV) and the Mental Processing Composite and global scale scores of the Kaufman Assessment Battery for Children (K-ABC). The tenability of interpreting the SBIV using the fluid/crystallized model, as suggested by the authors, was also considered. The subjects were 30 Black, learning-disabled elementary school students. Results of a t test indicated that the Mental Processing Composite score of the K-ABC was significantly higher than the SBIV Composite score. Moderate to high correlations were obtained when SBIV composite and area scores were compared to K-ABC composite and scale scores, reflecting a positive relationship between the two tests. The measures of fluid abilities (K-ABC Composite score; SBIV Abstract/Visual Reasoning) were highly correlated. The results of a multiple regression analysis indicated a moderate degree of correlation among the measures of crystallized ability (K-ABC Achievement; SBIV Verbal Reasoning and Quantitative Reasoning). The findings of this study demonstrated adequate concurrent validity for the SBIV. In addition, the results provided limited support for describing test results utilizing the fluid/crystallized interpretation model. Further research is suggested in order to examine other validity issues, such as classification of special education students and the SBIV's relationship to other similar instruments.

17.
This study investigates school effects on primary school students’ language and mathematics achievement trajectories in Chile, a context of particular interest given its large between-school variability in educational outcomes. The sample features an accelerated longitudinal design (3 time points, 4 cohorts) together spanning Grades 3 to 8 (n = 19,704 students in 156 schools). The magnitudes of school effects on students’ growth trajectories were found to be sizeable (generally larger than school effects in Western industrialised countries) and moderately consistent across school subjects. School composition effects on student achievement status were found for both school subjects. However, there was no evidence of composition effects on student achievement growth. The study provides new evidence on the size and nature of school effects in a developing country context based on state-of-the-art methods (i.e., accelerated longitudinal and growth curve models).

18.
A rationale is provided for hypothesizing that a counterpart of the social desirability variable influences environmental ratings based on student perceptions, and a test is made of the hypothesis. The High School Characteristics Index was administered to 2819 high school seniors from 11 high schools. Social desirability scale values for the 300 items and 30 scale scores of the HSCI were obtained from 85 students in Education, and these values were correlated with the endorsement percentages and average scale scores for the students in each of the 11 high schools. Results indicated an appreciable "desirability halo" effect for some student bodies, with wide differences among student bodies with respect to the strength and direction of that effect. The results are interpreted as a serious challenge to the validity and discriminative capability of environmental assessment techniques based on student perceptions.

19.
The National Science Education Standards emphasise teaching unifying concepts and processes such as basic functions of living organisms, the living environment, and scale. Scale influences science processes and phenomena across the domains. One of the big ideas of scale is that of surface area to volume. This study explored whether or not there is a correlation between proportional reasoning ability and a student's ability to understand surface area to volume relationships. Students' knowledge of surface area to volume relationships was assessed before and after a one‐week instructional intervention involving investigations about surface area to volume as a limiting factor in biological and physical systems. Results showed that proportional reasoning scores of middle school students were correlated with pre‐test and post‐test assessment scores, and a paired‐sample t‐test found significant differences from pre‐test to post‐test for the surface area to volume assessment. Relationships between proportional reasoning, visualisation abilities and success in solving surface to volume problems are discussed. The implications of the results of this study for learning concepts such as magnitudes of things, limits to size, and properties of systems that change depending on volume and surface are explored.

20.
Effectively presenting complex material is a crucial component of instructional design within simulation-based training (SBT) environments. One approach to facilitate the acquisition of higher-order knowledge is to embed instructional strategies within the systems themselves. Currently, however, there are few established guidelines to inform developers how best to implement such strategies. In response, this study aims to explore the presentation of one such strategy, feedback, during SBT of a complex decision-making task. Specifically, this study extends past research on the modality principle of multimedia learning by comparing the use of spoken- versus printed-text real-time feedback in an SBT environment. During two primarily visual training scenarios, participants received spoken-text (Spoken Group), printed-text (Printed Group), or no feedback (Control Group) based on their performance. Results indicated that the Spoken Group demonstrated greater decision-making performance during training and assessment compared to the Printed Group. These findings are consistent with those of past research and suggest that the modality principle can be extended to the presentation of real-time feedback during SBT of higher-order cognitive skills.
