Similar literature
 20 similar documents found (search time: 31 ms)
1.
Effective intervention delivery requires ongoing assessment to determine whether students are learning at the desired rate. Intervention programs with embedded assessment procedures (i.e., assessment that occurs naturally during the process of delivering intervention) can potentially enhance instructional decisions. However, there is almost no psychometric research on this type of assessment procedure. This study was designed to examine the psychometric characteristics of three types of progress measures that are embedded within a commonly used reading intervention program. Results indicated that generalized gains across different oral reading fluency passages predict concurrent gains on common and comprehensive tests of reading fluency, and that immediate instructional gains measured during instruction were significantly different from zero and thus sensitive to intervention effects. Overall findings suggest that at least some embedded assessment procedures demonstrate predictive validity and that these types of procedures have the potential to assist educators with data‐driven instructional decisions about students’ responsiveness to intervention.

2.
Previous research has demonstrated that cognitive test validities are generalizable and predictive of academic performance across situations. However, even after accounting for statistical artifacts (e.g., sampling error, range restriction, criterion reliability), substantial variability often remains around estimates of cognitive test–performance relationships, suggesting the presence of additional moderators. In the present study, we examine the sources of institutional variation in Scholastic Aptitude Test (SAT) validity across a sample of 110 institutions. Institutional characteristics moderated the size of SAT validities, such that more selective schools and schools that emphasize traditional assessment techniques (i.e., school records, standardized tests) showed higher SAT validities while schools that were larger and where students demonstrated more financial need, schools that emphasized the use of alternative assessment techniques (i.e., essays, letters of recommendation, extracurricular activities), and schools that enrolled higher percentages of historically disadvantaged minority students generally exhibited lower SAT validities. Future directions in the understanding of situational influences on SAT–grade point average validities are discussed.

3.
Previous research has established that SAT scores and high school grade point average (HSGPA) differ in their predictive power and in the size of mean differences across racial/ethnic groups. However, the SAT is scaled nationally across all test takers while HSGPA is scaled locally within a school. In this study, the researchers propose that this difference in how SAT scores and HSGPA are scaled partially explains differences in validity and subgroup differences. Using a large data set consisting of 170,390 students, each of whom matriculated at one of 114 separate colleges, the researchers find that awarding SAT scores by ranking SAT within a high school generally results in a substantial reduction in the size of subgroup mean differences for this predictor. However, validity for predicting first‐year GPA is also reduced by a small amount. Conversely, placing HSGPA onto a nationally normed metric through the use of multiple regression procedures results in a moderate increase in the size of subgroup mean differences, while also producing a small increase in validity. Taken together, these findings suggest that differences in predictor scaling can partially explain differences in the size of subgroup mean differences between HSGPA and SAT scores and have implications for predictive power.

4.
5.
For a number of reasons, the learning test approach (also known under the terms "dynamic assessment" or "assessment of learning potential") can be considered a promising alternative and a complement to conventional procedures of intelligence testing. But learning test procedures have also been criticized for their lack of psychometric standardization, their time-consuming application, and their only modest increase in predictive validity compared to conventional intelligence tests with regard to normal populations. The author traces the history of the learning test concept and critically discusses its main theoretical problems and practical implications. He further presents an overview of the different learning test procedures that have been developed in his research group since the early seventies. One of the main goals of this research is to combine the learning test approach with psychometric standardization, i.e., to maximize individualization of the assessment and pursue the goal of a more qualitative evaluation without losing the necessary objectivity or the possibility to quantify results in order to make interindividual comparisons. Other major concerns are to enhance the content validity of learning tests by using better analyzed items and to invest more in research regarding construct validity.

6.
Analysis of children's spoken narratives represents a potentially informative approach to language assessment within early childhood settings. Yet, narrative assessment is not readily amenable to at-scale use given the time needed to collect, transcribe, and analyze a child's narrative sample and the lack of consensus regarding what aspects of narrative expression ought to be examined (e.g., language form, language content). The purpose of this study was to describe a direct assessment of children's language abilities within a narrative context, the Narrative Assessment Protocol (NAP), which examines five aspects of language: sentence structure, phrase structure, modifiers, nouns, and verbs. In this study, we present findings regarding internal consistency, test–retest reliability, construct validity, and the concurrent and predictive validity of the NAP. NAP scores from 262 3–5-year-old children participating in preschool programs were assessed for these purposes. Findings indicated that the NAP exhibits reasonable psychometric properties across the areas addressed, including significant concurrent and predictive relations with a norm-referenced measure of general language ability. Although more research is needed, preliminary findings indicate that the NAP provides professionals with a valid and informative assessment approach for examining children's language skills within a narrative context; such information may be useful for establishing and monitoring children's language growth within preschool programs or language interventions.

7.
Many research questions require the assessment of reading in large samples of children. We compared two nonconventional procedures for early reading assessment: test administration by telephone, and teacher assessment. A total of 5,544 children participating in a longitudinal twin study were assessed by telephone using the Test of Word Reading Efficiency (TOWRE), and by their teachers using the UK National Curriculum criteria. A correlation of 0.69 was obtained between the TOWRE and Teacher Assessment for Reading. There was also good agreement between the two procedures for identification of children at the lower extreme. Both appear to be practical and valid for research studies.

8.
Early childhood screening has been a widespread yet controversial practice. Serious concerns have been voiced in the literature about the technical limitations and the inappropriate uses of frequently used screens. Because developmental screening is a requirement set forth by Head Start’s performance standards, there is a need for studies to provide accuracy estimates for the Head Start population on commonly used screens. In response, this study examined sex and age differences in performance as well as reliability and validity indices for a sample of 256 Head Start children who were screened with the Brigance K&1 Screen. Children’s performance on the screen varied by age and sex. While the overall consistency of the test was high, there was considerable variability across subscales. Construct validation of the screen, based on correlations with the K-ABC cognitive battery, yielded moderate coefficients. The screen’s predictive validity was established using correlational and classification analyses. At the end of Head Start, moderate to moderately high validity coefficients were obtained when the Brigance was correlated with teachers’ ratings and with subtests of the K-ABC achievement battery. In addition, the Brigance correlated moderately with the PPVT-R and with several Woodcock-Johnson subtests at the beginning of kindergarten. Classification analyses established that the Brigance had less than optimal accuracy in predicting early school achievement and poor success in predicting assignment to special education at the end of kindergarten.

9.
This article reviews ten predictive validity studies of the Swedish Scholastic Assessment Test (SweSAT). A primary result is that the predictive validity of the SweSAT seems to be highly dependent upon the study programme being examined; that is, the predictive validity is better for some programmes than for others. When compared with the upper‐secondary school grade point average, the predictive validity of the SweSAT seems to be fairly good, but there are major differences between study programmes in this case as well. However, it is suggested that the validity of the results is to some extent threatened by methodological issues. A general conclusion is, therefore, that there is room for improving the test itself, as well as the way that predictive validity studies are carried out.

10.
An achievement test score can be viewed as a joint function of skill and will, of knowledge and motivation. However, when interpreting and using test scores, the ‘will’ part is not always acknowledged and scores are mostly interpreted and used as pure measures of student knowledge. This paper argues that students’ motivation to do their best on the assessment – their test‐taking motivation – is important to consider from an assessment validity perspective. This is true not least in assessment contexts where the assessment outcome has no consequences for the test‐taker. The paper further argues that the quality of assessment of test‐taking motivation also needs attention. Theoretical and methodological issues related to the assessment of test‐taking motivation are presented from a validity perspective, and findings from empirical studies on the relation between test stakes, test‐taking motivation and test performance are presented.

11.
This study analyzed the relationship between benchmark scores from two curriculum‐based measurement probes in mathematics (M‐CBM) and student performance on a state‐mandated high‐stakes test. Participants were 298 students enrolled in grades 7 and 8 in a rural southeastern school. Specifically, we calculated the criterion‐related and predictive validity of benchmark scores from CBM probes measuring math computation and math reasoning skills. Results of this study suggest that math reasoning probes have strong concurrent and predictive validity. The study also provides evidence that calculation skills, while important, do not have strong predictive strength at the secondary level when a state math assessment is the criterion. When reading comprehension skill is taken into account, math reasoning scores explained the greatest amount of variance in the criterion measure. Computation scores explained less than 5% of the variance in the high‐stakes test, suggesting that computation probes may have limitations as a universal screening measure for secondary students.

12.
This case-study investigates the predictive validity and reliability of Key Stage 2 test results, and teacher assessments, for target-setting and value-added assumptions at Key Stage 3. (In England Key Stage 2 tests are taken in the core subjects of English, Mathematics and Science at the age of 11. Key Stage 3 tests are taken in the same subjects at the age of 14. Teacher assessments are also completed for these subjects at both key stages.) The study employed the type of linear regression analysis recommended in several government reports, to correlate Key Stage 2 test results, and teacher assessments, in core subjects, with Key Stage 3 test results, and teacher assessments, in both core and non-core subjects. Following government recommendations that the use of any other form of testing - such as the National Foundation for Educational Research (NFER) Cognitive Abilities Test (CAT) - was now no longer necessary to provide baseline data for value-added calculations, or to set targets, correlations were also investigated between results on the CAT, and test results and teacher assessments at Key Stage 3, for both core and non-core subjects, to see whether this recommendation was well founded. The results of the case-study suggest that Key Stage 2 data, both in the form of test results and teacher assessments, have little or no predictive validity, or reliability, for test results or teacher assessments at Key Stage 3. Indeed, the predictive validity for non-core subjects at Key Stage 3 was so low as to be negligible. However, the CAT average score correlated more highly with both teacher assessments and test results at Key Stage 3 in core subjects, although this relationship was not reflected in non-core subjects. These findings suggest that the predictive validity and reliability of Key Stage 2 data are seriously open to question as baseline data for either value-added, or target-setting procedures, at Key Stage 3. It should be pointed out, however, that these findings are provisional, since they are based on data from two intake years, but preliminary analysis of data from a further three intake years appears to indicate that the concerns identified are well founded.

13.
The QUASAR Cognitive Assessment Instrument (QCAI) is designed to measure program outcomes and growth in mathematics. It consists of a relatively large set of open-ended tasks that assess mathematical problem solving, reasoning, and communication at the middle-school grade levels. This study provides some evidence for the generalizability and validity of the assessment. The results from the generalizability studies indicate that the error due to raters is minimal, whereas there is considerable differential student performance across tasks. The dependability of grade level scores for absolute decision making is encouraging; when the number of students is equal to 350, the coefficients are between .80 and .97 depending on the form and grade level. As expected, there tended to be a higher relationship between the QCAI scores and both the problem solving and conceptual subtest scores from a mathematics achievement multiple-choice test than between the QCAI scores and the mathematics computation subtest scores.

14.
Although there is considerable evidence that the Law School Admission Test (LSAT) and the undergraduate grade-point average (UGPA) have a useful degree of predictive validity, there is also a large variation in the magnitude of the coefficients across schools. Understanding this variation has important implications for the use and interpretation of results of a validity study conducted at an individual school. A meta-analysis of the validity results and data on applicants to 154 law schools was conducted in an effort to better understand this observed variation. The standard deviation (SD) on the LSAT and the correlation between the LSAT and UGPA for accepted students at each law school accounted for 58.5% of the between-school variance in the multiple correlations of these two predictors with first-year average grade in law school. Sampling error accounted for an additional 12% of the variance. Hence, only a small fraction of the between-school variability in validities remains to be explained by other statistical artifacts or situational specificity factors. Mean validities and 90% credibility values for four adjustment procedures are reported, as are the mean observed validities for different combinations of predictors.
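As a rough check on the variance accounting in the abstract above, the reported percentages can be tallied directly. The sketch below assumes the two shares are simply additive (the abstract describes sampling error as accounting for an "additional" 12%); the function name and dictionary keys are illustrative and do not come from the study.

```python
def residual_variance_share(explained_shares):
    """Return the percentage of variance left over after the given shares.

    Assumes the shares are percentages of the same total and are additive.
    """
    total = sum(explained_shares)
    if not 0 <= total <= 100:
        raise ValueError("explained shares must sum to between 0 and 100")
    return 100.0 - total


# Percentages as reported in the abstract; the key names are illustrative.
reported_shares = {
    "lsat_sd_and_lsat_ugpa_correlation": 58.5,
    "sampling_error": 12.0,
}

remaining = residual_variance_share(reported_shares.values())
print(f"Between-school variance left unexplained: {remaining:.1f}%")
```

Under that additivity assumption, roughly 30% of the between-school variance is left for other artifacts or situational factors.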

15.
This study investigated the reliability, validity, and utility of the following three measures of letter-formation quality: (a) a holistic rating system, in which examiners rated letters on a five-point Likert-type scale; (b) a holistic rating system with model letters, in which examiners used model letters that exemplified specific criterion scores to rate letters; and (c) a correct/incorrect procedure, in which examiners used transparent overlays and standard verbal criteria to score letters. Intrarater and interrater reliability coefficients revealed that the two holistic scoring procedures were unreliable, whereas scores obtained by examiners who used the correct/incorrect procedure were consistent over time and across examiners. Although all three of the target measures were sensitive to differences between individual letters, only the scores from the two holistic procedures were associated with other indices of handwriting performance. Furthermore, for each of the target measures, variability in scores was, for the most part, not attributable to the level of experience or sex of the respondents. Findings are discussed with respect to criteria for validating an assessment instrument.

16.
Growth in the use of testing to determine student eligibility for community college courses has prompted debate and litigation regarding the equity, access, and legal implications of these practices. In California, this has resulted in state regulations requiring that community colleges provide predictive validity evidence of test-score-based inferences and course prerequisites. In addition, companion measures that supplement placement test scores must be used for placement purposes. However, for both theoretical and technical reasons the predictive validity coefficients between placement test scores and final grades or retention in a course generally demonstrate a weak relationship. The study discussed in this article examined the predictive validity of placement test scores with course grade and retention in English and mathematics courses. The investigation produced a model to explain variance in course outcomes using test scores, student background data, and instructor differences in grading practices. The model produced suggests that student dispositional characteristics explain a high proportion of variance in the dependent variables. Including instructor grading practices in the model adds significantly to the explanatory power and suggests that grading variations make accurate placement more problematic. This investigation underscores the importance of academic standards as something imposed on students by an institution and not something determined by the entering abilities of students.

17.
Recommendations from multiple professional organizations (e.g., American Psychological Association, Council for Exceptional Children, National Association of School Psychologists) suggest that collection of data on social validity in practice and research is necessary. The purpose of this study was to systematically review the inclusion of acceptability measurement, which has been one of the most common ways to measure social validity, within the intervention literature published across five school psychology journals between 2005 and 2017. Findings suggested that just over one third of intervention studies included acceptability assessment. Intervention studies that were delivered individually, targeted behavior skills, and included treatment integrity data were significantly more likely to include acceptability assessment. When acceptability was measured, it was typically evaluated once, following treatment completion, using self-report tools completed by teachers. Nearly half of the studies employed one of seven published tools and the remaining half used researcher-created measures. The published tools were adapted in a variety of ways and inconsistently reported either item or total scores, making it difficult to summarize these data according to intervention target or delivery format. Implications of findings are described.

18.
Decision-making in interdisciplinary treatment teams   Total citations: 1, self-citations: 0, citations by others: 1
Interdisciplinary teams for the treatment of child abuse and neglect are becoming more common. Studies have shown that decisions made by groups who have had the opportunity to discuss their perspectives are more accurate than judgments made by individuals. Why this may be true is not clear. The purpose of the present study was to discover the procedures an interdisciplinary treatment team uses in making decisions. A single interdisciplinary incest treatment team was observed over a 15-month period. Open-ended interviews with team members also were conducted. Findings show that the interdisciplinary treatment team made its decisions using procedures analogous to procedures used in social research to establish reliability and validity. The decision-making process of the team was characterized by multiple observations of family members by multiple observers in multiple settings over time. This decision-making process is similar to processes used by many other treatment teams. The findings of the present research, then, are likely to be generalizable to other teams whose decision-making processes are similar.

19.
Teaching practices are pivotal for student learning. Due to pedagogical traditions and national cultures, the structure of teaching practices may differ across countries. This study investigates the structure of teaching practices across 12 countries grouped into four major linguistic/cultural clusters. First, factor analysis is applied to investigate whether the theoretical distinction between teacher-directed and student-centred practices is generalizable across countries. Then, network analysis is used to explore how individual classroom assessment practices relate to either teacher-directed or student-centred practices. Main findings include that: (1) teacher-directed and student-centred practices are two distinct factors across countries; (2) the overall structure and connectivity of teaching practices differ across countries, with smaller differences within linguistic/cultural clusters; and (3) assessment practices that aim to structure and guide learning relate strongly to teacher-directed practices, whereas assessment practices that aim to individualize instruction relate more to student-centred practices. We discuss the global patterning and implications.

20.
