Similar Literature (20 results)
1.
What can be done at the local school level to study the validity of measures of critical thinking? What information other than test scores should be analyzed? What validity can a test have for diagnosis, monitoring achievement, or curriculum reform?

2.
What explanations have been provided for spurious test score gains? Are states and districts narrowing the curriculum and teaching the test? What effect does teaching the test have on the norms themselves? What alternatives must be sought to protect the integrity of instruction and the validity of scores?

3.

Objective

Analysis of the validity and implementation of a child maltreatment actuarial risk assessment model, the California Family Risk Assessment (CFRA).

Questions addressed

(1) Is there evidence of the validity of the CFRA under field operating conditions? (2) Do actuarial risk assessment results influence child welfare workers’ service delivery decisions? (3) How frequently are CFRA risk scores overridden by child welfare workers? (4) Is there any difference in the predictive validity of CFRA risk assessments and clinical risk assessments by child welfare workers?

Method

The study analyzes 7,685 child abuse/neglect reports originating in 5 California counties and followed prospectively for 2 years to identify further substantiated child abuse/neglect. Measures of model calibration and discrimination were used to assess CFRA validity and to compare its accuracy with that of clinical predictions made by child welfare workers. The study also analyzed how often child welfare workers used the CFRA's override feature and how heavily they relied on CFRA risk scores when making service decisions.
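The abstract does not specify the exact statistics used; as a hedged illustration, the two families of measures it names can be computed as below, with discrimination summarized by the area under the ROC curve and calibration by comparing mean predicted risk with the observed rate of further maltreatment within risk deciles. All data and variable names are simulated placeholders, not the study's.

    # Sketch: discrimination (AUC) and calibration of actuarial risk scores
    # against a binary outcome (further substantiated abuse/neglect).
    # All data below are simulated placeholders.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    risk_scores = rng.uniform(0, 1, 1000)                           # model risk estimates
    outcomes = (rng.uniform(0, 1, 1000) < risk_scores).astype(int)  # observed recurrence

    # Discrimination: an AUC of 0.5 is chance-level; 1.0 is perfect separation.
    print(f"AUC = {roc_auc_score(outcomes, risk_scores):.2f}")

    # Calibration: within each risk decile, mean predicted risk should
    # track the observed event rate.
    edges = np.quantile(risk_scores, np.linspace(0, 1, 11))
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (risk_scores >= lo) & (risk_scores <= hi)
        print(f"predicted {risk_scores[in_bin].mean():.2f}  "
              f"observed {outcomes[in_bin].mean():.2f}")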

Results

Imperfect but better-than-chance predictive validity was found for the CFRA on a range of measures in a large temporal validation sample (n = 6,543). For the 114 cases where both CFRA risk assessments and child welfare worker clinical risk assessments were available, the CFRA exhibited imperfect but better-than-chance predictive validity, while the child welfare workers' risk assessments were found to be invalid. Child welfare workers overrode CFRA risk assessments in only 114 (1.5%) of 7,685 cases and provided in-home services in significantly larger proportions of higher-risk than lower-risk cases, consistent with heavy reliance on the CFRA.

Conclusions/practice implications

Until research identifies actuarial models exhibiting superior predictive validity when applied in everyday practice, the CFRA is, and will remain, a valuable tool for assessing risk in order to make in-home service-provision decisions.

4.
Using generalizability (G-) theory, this study examined the accuracy and validity of the writing scores assigned to secondary school ESL students in the provincial English examinations in Canada. The major research question was: Are there any differences between the accuracy and construct validity of the analytic scores assigned to ESL students and to native English-speaking (NE) students on the provincial English writing examination across three years? A series of G-studies and decision (D-) studies were conducted for each of the three years to examine accuracy and validity. Results showed that differences in score accuracy did exist between ESL and NE students when initial (pre-adjudication) scores were used. The observed G-coefficients for ESL students were significantly lower than those for NE students in all three years, indicating less accuracy and more error in the writing scores assigned to ESL students. Further, the writing scores assigned to ESL students showed significantly lower convergent validity in one year, and lower discriminant validity in all three years, than those assigned to NE students. These findings raise the question of potential bias in the assessment of ESL students' writing when initial scores are used.
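For readers outside the G-theory literature, the G-coefficient compared across groups here is, in its standard form, the ratio of universe-score (person) variance to itself plus relative error variance. For a simple persons-crossed-with-raters design with n_r raters (the study's actual design is not given in the abstract), it reads:

    E\rho^{2} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{pr,e}/n_{r}}

where \sigma^{2}_{p} is the person variance component and \sigma^{2}_{pr,e} the confounded person-by-rater interaction and residual component; lower coefficients for ESL students thus mean a larger share of their observed-score variance is error.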

5.
Extensive research has examined the validity and fairness of standardized tests in academic admissions. However, because of their underrepresentation in higher education, American Indians have received much less attention in this research. In the present study, we examined for American Indian students (1) group differences on SAT scores, (2) the predictive and incremental validity of the SAT over high school grades, (3) the effect of socioeconomic status on SAT validity, (4) differential prediction in the use of SAT scores, and (5) potential omitted variables that could explain differential prediction for American Indian students. Results provided evidence of the predictive and incremental validity of SAT scores, and that validity was largely independent of socioeconomic status. Overprediction was found when using SAT scores to predict college performance; it was reduced when high school grades were included as an additional predictor. This study provides substantial evidence of the validity and fairness of SAT scores for American Indians.
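The over- and underprediction analysis described above is conventionally run by fitting a common regression of the college criterion on the predictors and inspecting mean residuals by group: a negative mean residual for a group indicates overprediction. A minimal sketch under that convention, with all data simulated and all variable names hypothetical:

    # Sketch: differential prediction via mean residuals by group.
    # A negative group mean residual means the common equation
    # overpredicts that group's college performance.
    # Data below are simulated placeholders.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    sat = rng.normal(1050, 150, n)
    hs_gpa = rng.normal(3.0, 0.5, n)
    group = rng.integers(0, 2, n)        # 0 = reference group, 1 = focal group
    college_gpa = (0.5 + 0.001 * sat + 0.4 * hs_gpa
                   - 0.15 * group + rng.normal(0, 0.4, n))

    X = np.column_stack([np.ones(n), sat, hs_gpa])   # common equation: no group term
    beta, *_ = np.linalg.lstsq(X, college_gpa, rcond=None)
    residuals = college_gpa - X @ beta

    for g in (0, 1):
        print(f"group {g}: mean residual = {residuals[group == g].mean():+.3f}")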

6.
If a test is intended to impact teaching and learning, how can we make a case for its validity? Currently, it is argued, the case for the validity of a test considers only the test maker's view of what the scores mean and whether they are useful for teachers and learners. Although the test maker's perspective is a necessary one, it is insufficient to validate a test, so an expanded framework for validating tests is needed. The expansion proposes using teachers'/professionals' and students' perspectives as necessary information in validating what test scores mean and whether they are useful to teachers and learners.

7.
The attributes of self-direction in learning are becoming increasingly important as the need for lifelong learning grows. Educators are challenged to assist in the development of self-directed learning skills and to encourage learners to use self-direction more freely in their learning activities. Unfortunately, there are few validated procedures for identifying self-directed learners. Guglielmino's Self-Directed Learning Readiness Scale (SDLRS) is one of the few instruments identified in the literature for measuring self-direction in learning. Even though the scale has been widely used, additional validation is needed. This study used a multitrait-multimethod procedure to determine the validity of the SDLRS. The sample included 136 college students from two colleges: 63 Black students, 70 White students, and 3 students of non-US nationalities. Thirty-seven specific hypotheses were tested, and findings concerning selected hypotheses are discussed. Three general conclusions concerning the validity of the SDLRS are as follows: (1) the findings support the validity of the SDLRS; (2) significant differences were noted in faculty ratings according to racial composition and student scores on the SDLRS; and (3) significant associations exist between SDLRS scores and variables such as age, educational level, and agreement response set (ARS).

8.
Growth in the use of testing to determine student eligibility for community college courses has prompted debate and litigation over the equity, access, and legal implications of these practices. In California, this has resulted in state regulations requiring that community colleges provide predictive validity evidence for test-score-based inferences and course prerequisites. In addition, companion measures that supplement placement test scores must be used for placement purposes. However, for both theoretical and technical reasons, the predictive validity coefficients between placement test scores and final grades or retention in a course generally demonstrate a weak relationship. The study discussed in this article examined the predictive validity of placement test scores with respect to course grade and retention in English and mathematics courses. The investigation produced a model explaining variance in course outcomes using test scores, student background data, and instructor differences in grading practices. The model suggests that student dispositional characteristics explain a high proportion of the variance in the dependent variables. Including instructor grading practices in the model adds significantly to its explanatory power and suggests that grading variations make accurate placement more problematic. This investigation underscores that academic standards are something imposed on students by an institution, not something determined by the entering abilities of students.

9.
In 2018, 26 states administered a college admissions test to all public school juniors. Nearly half of those states proposed to use those scores as their academic achievement indicators for federal accountability under the Every Student Succeeds Act (ESSA); many others are planning to use those scores for other accountability purposes. Accountability encompasses a number of different uses and subsumes a variety of claims. States proposing to use summative tests for accountability need to develop a validity argument, which entails delineating each specific use of test scores associated with accountability, identifying appropriate evidence, and offering a rebuttal to counterclaims. The aim of this article is to support states in developing a validity argument for the use of college admission test scores for accountability by identifying claims that are applicable across states and summarizing existing evidence as it relates to each claim. As outlined by the Standards for Educational and Psychological Testing, multiple sources of evidence are used to address each claim. A series of threats to the validity argument, including weaker alignment with content standards and the potential narrowing of teaching, are reviewed. Finally, the article contrasts validity evidence, primarily from research on the ACT, with regulatory requirements from ESSA. The Standards and guidance addressing the use of a “nationally recognized high school academic assessment” (Elementary and Secondary Education Act (ESEA), Negotiated Rulemaking Committee; Department of Education) are the primary sources for the organization of validity evidence.

10.
This study describes the development and validation of a science and engineering (S/E) career interest survey (CIS). This 56-question survey was developed to measure the overall S/E career interests of 7th- through 9th-grade students. In the CIS, an S/E career is characterized as one that requires the completion of at least a four-year college program with a major in science, science education, or engineering. The CIS is divided into four major parts. In Part I (30 questions), students select from occupational activities, while in Part II (20 questions) they select from various occupations. Part III (5 questions) and Part IV together make up the CIS internal verification scale. The CIS test-retest reliability coefficients for one week and eight months were 0.96 (n = 57, grades 7–9) and 0.78 (n = 1937, grade 8), respectively. The KR-21 estimate for the CIS was 0.92. Criterion-related validity coefficients were calculated in two ways: (a) CIS scores were correlated with the Kuder GIS science subscale (r = 0.75, n = 45, grades 7–9), and (b) CIS scores were correlated with the CIS internal verification scale (r = 0.59, n = 127, grades 7–9). Evidence to support the construct validity of the CIS was collected by two methods: (a) for students in grades 7–9 (n = 45), the CIS score correlated 0.75 with the scientific subscale and −0.42 with the artistic subscale of the Kuder GIS; (b) the second method compared the scores of known groups. Test results for students in grades 7–9 (n = 127; n = 1937) showed a statistically significant difference between the scores of boys and girls on S/E career interest. The readability of the CIS was at the seventh-grade level.
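The KR-21 coefficient reported above has a closed form needing only the number of items k, the mean total score M, and the total-score variance s^2 (it assumes items of roughly equal difficulty):

    \mathrm{KR\text{-}21} = \frac{k}{k-1}\left(1 - \frac{M\,(k - M)}{k\,s^{2}}\right)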

11.
Applied Measurement in Education, 2013, 26(2): 163-183
When low-stakes assessments are administered, the degree to which examinees give their best effort is often unclear, complicating the validity and interpretation of the resulting test scores. This study introduces a new method, based on item response time, for measuring examinee test-taking effort on computer-based test items. This measure, termed response time effort (RTE), is based on the hypothesis that when administered an item, unmotivated examinees will answer too quickly (i.e., before they have time to read and fully consider the item). Psychometric characteristics of RTE scores were empirically investigated and supportive evidence for score reliability and validity was found. Potential applications of RTE scores and their implications are discussed.
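Based on the description above (the article's exact formulation is not reproduced here), an RTE-style index flags each response as effortful when its response time meets a per-item threshold and averages the flags for each examinee. A minimal sketch with hypothetical thresholds and simulated data:

    # Sketch: a response-time-effort (RTE)-style index.
    # An examinee's score is the proportion of items answered at or
    # above that item's time threshold (i.e., not rapid-guessed).
    # Thresholds and response times are simulated placeholders.
    import numpy as np

    rng = np.random.default_rng(2)
    response_times = rng.exponential(20.0, size=(100, 40))  # 100 examinees x 40 items, seconds
    thresholds = np.full(40, 5.0)                           # per-item rapid-guess cutoffs

    solution_behavior = response_times >= thresholds        # True = effortful response
    rte = solution_behavior.mean(axis=1)                    # one RTE score per examinee
    print(f"mean RTE = {rte.mean():.2f}, min = {rte.min():.2f}")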

12.
This article reviews the intended uses of college- and career-readiness assessments with the goal of articulating an appropriate validity argument to support such uses. These assessments differ fundamentally from today's state assessments employed for state accountability. Current assessments are used to determine whether students have mastered the knowledge and skills articulated in state standards; content standards, performance levels, and student impact often differ across states. College- and career-readiness assessments will be used to determine whether students are prepared to succeed in postsecondary education: do students have a high probability of academic success in college or career-training programs? As with admissions, placement, and selection tests, the primary interpretations made from test scores concern future performance, so statistical evidence relating test scores to performance in postsecondary education will become an important form of evidence. A validation argument should first define the construct (college and career readiness) and then define appropriate criterion measures. This article reviews alternative definitions and measures of college and career readiness and contrasts traditional standard-setting methods with empirically based approaches to support a validation argument.

13.
Cindy L. James, Assessing Writing, 2006, 11(3): 167-178
How do scores from writing samples generated by computerized essay scorers compare to those generated by “untrained” human scorers, and what combination of scores, if any, is more accurate at placing students in composition courses? This study answered this two-part question by evaluating the correspondence between writing sample scores generated by the IntelliMetric™ automated scoring system and scores generated by University Preparation English faculty, and by examining the predictive validity of both the automated and the human scores. The results revealed significant correlations between the faculty scores and the IntelliMetric™ scores on the ACCUPLACER OnLine WritePlacer Plus test. Moreover, logistic regression models that utilized both the IntelliMetric™ scores and the average faculty scores were more accurate at placing students (77% overall correct placement rate) than models incorporating only the average faculty score or only the IntelliMetric™ scores.
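The placement comparison described above can be pictured as competing logistic regression models for a binary placement-success outcome, scored by their correct-classification rate. A hedged sketch, with all scores and outcomes simulated rather than taken from the study:

    # Sketch: comparing placement models with logistic regression.
    # Outcome: 1 = the student succeeded in the course placed into.
    # All scores and outcomes are simulated placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n = 400
    faculty_score = rng.normal(4.0, 1.0, n)                # average human rating
    machine_score = faculty_score + rng.normal(0, 0.7, n)  # automated essay score
    success = (faculty_score + machine_score
               + rng.normal(0, 1.5, n) > 8).astype(int)

    X_human = faculty_score.reshape(-1, 1)
    X_both = np.column_stack([faculty_score, machine_score])

    human_only = LogisticRegression().fit(X_human, success)
    combined = LogisticRegression().fit(X_both, success)

    print(f"human only: {human_only.score(X_human, success):.1%} correctly classified")
    print(f"combined:   {combined.score(X_both, success):.1%} correctly classified")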

14.
Cloze items are widely used in foreign-language teaching and testing because they are objective and convenient to write, administer, score, and analyze. However, most cloze tests currently on the market have low validity, chiefly because the language points they test sit at too low a level. Drawing on Li Xiaoju's theory of cloze test-point levels, a cloze test was designed and piloted with students at a university; the analysis focused on answer accuracy rates and the reasons for lost marks, and it provides empirical support for improving the test-point validity of cloze items by raising the level of the points tested. Instruction should therefore emphasize developing learners' abilities on higher-level test points, thereby improving English learners' overall proficiency.

15.
A misconception exists that validity may refer only to the interpretation of test scores and not to the uses of those scores. The development and evolution of validity theory illustrate test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test scores remains essential today. However, test scores are not interpreted and then ignored; rather, their interpretations lead to actions. Thus, a modern definition of validity needs to describe the validation of test score interpretations as a necessary, but insufficient, step en route to validating the uses of test scores for their intended purposes. To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests. The current definition of validity stipulated in the 2014 version of the Standards for Educational and Psychological Testing properly describes validity in terms of both interpretations and uses, and provides a sufficient starting point for validation.

16.
The purpose of this study was to examine the reliability and validity of the School Anxiety Inventory (SAI) using a sample of 646 Slovenian adolescents (48% boys) ranging in age from 12 to 19 years. Single-group confirmatory factor analyses replicated the correlated four-factor structure of scores on the SAI for anxiety-provoking school situations (Anxiety about School Failure and Punishment, Anxiety about Aggression, Anxiety about Social Evaluation, and Anxiety about Academic Evaluation) and the three-factor structure of the anxiety response systems (Physiological Anxiety, Cognitive Anxiety, and Behavioral Anxiety). Equality of factor structures was tested using multigroup confirmatory factor analyses, and measurement invariance for the four- and three-factor models was obtained across gender and school-level samples. The scores of the instrument showed high internal reliability and adequate test–retest reliability. The concurrent validity of the SAI scores was also examined through their relationships with scores on the Social Anxiety Scale for Adolescents (SASA) and the Questionnaire about Interpersonal Difficulties for Adolescents (QIDA). Correlations of the SAI scores with scores on the SASA and the QIDA were of low to moderate effect size.

17.
The relationships between ratings on the Idaho Alternate Assessment (IAA) for 116 students with significant disabilities and corresponding ratings for the same students on two norm-referenced teacher rating scales were examined to gain evidence about the validity of the resulting IAA scores. To contextualize these findings, another group of 54 students who had disabilities but were not officially eligible for the alternate assessment was also assessed. Evidence to support the validity of inferences from IAA scores was mixed, yet promising. Specifically, the relationships among the reading, language arts, and mathematics achievement level ratings on the IAA and the concurrent scores on the ACES-Academic Skills scales for the eligible students varied across grade clusters but in general were moderate. These findings provide evidence that the IAA scales measure skills indicative of the state's content standards, a point further reinforced by moderate to high correlations between the IAA and the Idaho State Achievement Test (ISAT) for the non-eligible students. Additional evidence concerning valid use of the IAA was provided by logistic regression results showing that the scores do an excellent job of differentiating students who were eligible from those not eligible to participate in an alternate assessment. The collective evidence for the validity of IAA scores suggests it is a promising assessment for NCLB accountability for students with significant disabilities, and the methods of establishing this evidence have the potential to advance validation efforts for other states' alternate assessments.

18.
Differential weighting of response alternatives and confidence testing have been proposed as ways to assess partial knowledge on multiple-choice tests. A total of 211 students in an educational measurement course took their midterm examination under one of three procedures. Results from the students administered the test under conventional directions provided a baseline for comparing, in terms of reliability and validity, the results from students who took the test under differential weighting of response alternatives or under confidence testing instructions. Reliability was estimated by the split-half technique; validity was estimated by correlating midterm test scores with scores on a final examination. This investigation provides some support for the contention that validity can be improved using more sophisticated testing techniques. Suggestions for conducting more definitive studies are offered.
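The split-half estimate mentioned above is conventionally stepped up to full-test length with the Spearman-Brown formula, where r_{hh} is the correlation between scores on the two half-tests:

    r_{\text{full}} = \frac{2\,r_{hh}}{1 + r_{hh}}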

19.
The discriminant and concurrent validity of the Gordon Diagnostic System (GDS) was investigated in 29 youngsters categorized into “normals” or “ADHDs” based on teacher ratings. The results failed to demonstrate the discriminant validity of any GDS score regardless of the behavior rating used. The Vigilance Correct and Vigilance Omission scores were significantly correlated with ADHD Rating Scale scores completed by teachers. The sample size in the study demands cautious interpretation of these results; however, the authors suggest the continued use of multiple behavior ratings by teachers as the “gold standard” for the classification of youngsters with a suspected Attention-deficit Hyperactivity Disorder.

20.
What happens when philosophical and theoretical propositions meet the harsh realities of the nation's largest school district? How does consequential validity play out in the Big Apple?
