Similar articles
20 similar articles found (search time: 78 ms)
1.
In this study, we focused on increasing the reliability of ability-achievement difference scores using the Kaufman Assessment Battery for Children (KABC) as an example. Ability-achievement difference scores are often used as indicators of learning disabilities, but when they are derived from traditional equally weighted ability and achievement scores, they have suboptimal psychometric properties because of the high correlations between the scores. As an alternative to equally weighted difference scores, we examined an orthogonal reliable component analysis (RCA) solution and an oblique principal component analysis (PCA) solution for the standardization sample of the KABC (among 5- to 12-year-olds). The components were easily identifiable as the simultaneous processing, sequential processing, and achievement constructs assessed by the KABC. As judged via the score intercorrelations, all three types of scores had adequate convergent validity, while the orthogonal RCA scores had superior discriminant validity, followed by the oblique PCA scores. Differences between the orthogonal RCA scores were more reliable than differences between the oblique PCA scores, which were in turn more reliable than differences between the traditional equally weighted scores. The increased reliability with which the KABC differences are assessed with the orthogonal RCA method has important practical implications, including narrower confidence intervals around difference scores used in individual administrations of the KABC.

2.
Previous research has established that SAT scores and high school grade point average (HSGPA) differ in their predictive power and in the size of mean differences across racial/ethnic groups. However, the SAT is scaled nationally across all test takers while HSGPA is scaled locally within a school. In this study, the researchers propose that this difference in how SAT scores and HSGPA are scaled partially explains differences in validity and subgroup differences. Using a large data set consisting of 170,390 students each of whom matriculated at one of 114 separate colleges, the researchers find that awarding SAT scores by ranking SAT within a high school generally results in substantial reduction in the size of subgroup mean differences for this predictor. However, validity for predicting first‐year GPA is also reduced by a small amount. Conversely, placing HSGPA onto a nationally normed metric through the use of multiple regression procedures results in a moderate increase in the size of subgroup mean differences, while also producing a small increase in validity. Taken together, these findings suggest that differences in predictor scaling can partially explain differences in the size of subgroup mean differences between HSGPA and SAT scores and have implications for predictive power.
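
The within-school rescaling described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the ranking was carried out, so the mid-rank percentile convention, the function name `within_school_percentile`, and the sample records are all illustrative assumptions, not the researchers' procedure.

```python
from collections import defaultdict


def within_school_percentile(records):
    """Convert scores to percentile ranks computed separately within each school.

    records: list of (school_id, score) tuples (hypothetical input format).
    Returns a list of percentile ranks (0-100) in the same order as the input,
    using the mid-rank convention: ties count as half below, half above.
    """
    # Group all scores by school so each score is compared only to local peers.
    by_school = defaultdict(list)
    for school, score in records:
        by_school[school].append(score)

    ranks = []
    for school, score in records:
        peers = by_school[school]
        below = sum(s < score for s in peers)
        equal = sum(s == score for s in peers)  # includes the score itself
        ranks.append(100.0 * (below + 0.5 * equal) / len(peers))
    return ranks


# Hypothetical data: two schools with different score levels.
records = [("A", 1200), ("A", 1000), ("A", 1400), ("B", 900), ("B", 1100)]
ranks = within_school_percentile(records)
```

In this toy data, the top scorer in school B (1100) receives a higher local rank than a student in school A with a higher raw score (1200), which is the kind of re-ordering that local scaling introduces.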

3.
This study investigated two procedures for estimating the population standard deviation of nonnormed tests. Two normed tests, both of whose population standard deviations were known, were administered to 272 students in grades 3–6. One of the normed tests was treated as a criterion-referenced test; the two variance estimation procedures were applied to the scores from this test. Substantial differences were found between both estimated statistics and the actual standard deviation. The first procedure systematically overestimated the standard deviation, whereas the second systematically underestimated it. These results are discussed in terms of using such procedures for program evaluation.

4.
This study was an investigation of the relation between the reliability of difference scores, considered as a parameter characterizing a population of examinees, and the reliability estimates obtained from random samples from the population. The parameters in familiar equations for the reliability of difference scores were redefined in such a way that determinants of reliability in both populations and samples become more transparent. Computer simulation was used to find sample values and to plot frequency distributions of various correlations and variance ratios relevant to the reliability of differences. The shape of frequency distributions resulting from the simulations and the means and standard deviations of these distributions reveal the extent to which reliability estimates based on sample data can be expected to meaningfully represent population reliability.

5.
The relation between test reliability and statistical power has been a controversial issue, perhaps due in part to a 1975 publication in the Psychological Bulletin by Overall and Woodward, “Unreliability of Difference Scores: A Paradox for the Measurement of Change”, in which they demonstrated that a Student t test based on pretest-posttest differences can attain its greatest power when the difference score reliability is zero. In the present article, the authors attempt to explain this paradox by demonstrating in several ways that power is not a mathematical function of reliability unless either true score variance or error score variance is constant.

6.
Few adequately normed drawing tests are available for current practice. Two subtests of the McCarthy Scales, Draw-A-Design and Draw-A-Child, are the best normed of all drawing tests for children aged 2½ to 8½ years; however, no age-corrected deviation scaled scores are available for interpretation, only raw scores and age equivalents. This paper presents scaled scores for use in interpretation of these two drawing tests.

7.
From the contexts of current social, educational and health policy, there appears to be an increasingly inevitable “mobilisation” of resources in medicine and health as the use of mobile technology devices and applications becomes widespread and culturally “normed” in workplaces. Over the past 8 years, students from the University of Leeds Medical School have been loaned mobile devices and smartphones and been given access to mobile‐based resources to assist them with learning and assessments as part of clinical activity in placement settings. Our experiences lead us to suggest that educators should be focusing less on whether mobile learning should be implemented and more on developing mobile learning in curricula that is comprehensive, sustainable, meaningful and compulsory, in order to prepare students for accessing and using such resources in their working lives.

8.
Reliability coefficients of linear combinations of observed scores have anomalous properties which have led to persistent difficulties in the investigation of difference scores and gain scores in test theory. Interpretation of these test scores is further complicated by effects of correlated errors of measurement which are likely to appear in difference scores and gain scores in practice. In this paper the discrepancies between classical results and correct results obtained from more general formulas, which allow for correlated errors, are examined systematically. These discrepancies depend strongly on the reliability coefficients of the respective tests and are smallest when the influence of the variables related by the formulas is least. A vector representation of difference scores reveals that these anomalies arise from simple geometric relations among observed scores, true scores, and error scores inherent in the test-theory model. In this context, doubts as to the usefulness of difference scores and gain scores in testing practice expressed by previous authors appear to be justified.

9.
Gender differences in mathematical performance have received considerable scrutiny in the fields of sociology, economics and psychology. We analyse a large data-set of high school graduates who took a standardised mathematical test in Russia in 2011 (n = 738,456) and find no substantial difference in mean test scores across boys and girls. However, boys have a greater variance of scores and are more numerous at the top of the distribution. We apply quantile regression to model the association between school characteristics and gender differences in test scores throughout the distribution of test scores. Male advantage in test scores, particularly at the top of the distribution, is concentrated in cities and in the schools with an advanced curriculum. In other high schools, especially in the countryside, gender differences in all parts of the distribution are small. We suggest several mechanisms based on selection and school effects that account for our findings.

10.
Lawrence’s Self‐Esteem Questionnaire (LAWSEQ) was administered to 120 Year 1 pupils in six schools in Belfast, Northern Ireland. A principal components analysis indicated that the scale items were unidimensional and that the reliability of the scores, as estimated by Cronbach’s alpha, was satisfactory (α = .73). There were no differences between boys and girls on either total scores or the individual items comprising the LAWSEQ. A follow‐up study, involving 71 of the children in Year 3, confirmed these findings but the stability of the scores between the two occasions (as indicated by Pearson’s r) was extremely low.
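
For readers unfamiliar with the reliability estimate cited in this abstract, the following is a minimal sketch of the standard Cronbach's alpha formula, α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy item responses are illustrative assumptions; they are not LAWSEQ data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of scale items.

    items: list of k lists, one per item, each holding one score per
    respondent in the same respondent order (hypothetical input format).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent: sum across items.
    totals = [sum(responses) for responses in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data: two perfectly consistent items, then two unrelated items.
alpha_consistent = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
alpha_unrelated = cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]])
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when they are unrelated, which is why it is read as an index of internal consistency rather than of stability over time.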

11.
Background

Since the early 1980s, there has been a growing interest in the potentiality of computers as facilitators of students' learning. The importance of using technology effectively as a learning tool has been emphasized by many researchers. However, finding good software that encourages pupils to explore and express mathematical ideas is becoming a crucial issue.

Purpose

This paper investigates the effect of spreadsheet and dynamic geometry software on the mathematics achievement and mathematics self-efficacy of 7th-grade students. The study further examines the gender differences with respect to computer self-efficacy, mathematics self-efficacy and mathematics achievement. The relationship among these three constructs is also investigated.

Sample

The study consisted of 64 7th-grade students from three different classes, comprising all the 7th-graders in a school located in an upper-middle-class area in Ankara, Turkey. Study participants were aged from 12 to 13. In total, the number of female and male students was equal. In this study, purposive sampling was used since the school where the study took place was well equipped in terms of computer laboratories and technological devices.

Design and methods

The evaluation used an experimental design in which two software programs, Excel and Autograph, were used separately in the experimental groups, and a control group received traditional instruction without any technological tools such as a computer or calculator. The study was carried out during the spring semester of the 2001/02 academic year, and the three instructional methods (Autograph-based, spreadsheet-based and traditional instruction) were randomly assigned to the three classes. The Mathematics achievement test was used to assess the students' performance in mathematics. In order to determine the students' self-efficacy expectations with respect to mathematics and computers, a Mathematics self-efficacy scale and a Computer self-efficacy scale were developed, respectively. Analysis of covariance, bivariate correlations and t-tests were used to analyse outcome data.

Results

Results revealed that the Autograph group and Traditional group had significantly greater mean scores than the Excel group with respect to mathematics achievement. The Autograph group had significantly greater mean scores than the Traditional group, while no significant mean difference was found between the Autograph and Excel groups and between the Excel and Traditional groups with respect to mathematics self-efficacy. No significant mean difference was found between boys and girls with respect to mathematics achievement and mathematics self-efficacy. On the other hand, boys had significantly greater mean scores than girls with respect to computer self-efficacy. In addition, significant correlations were found among efficacy scores and achievement.

Conclusions

The evidence suggests that students showed great enthusiasm for Autograph. Students in the Autograph group had the highest scores compared to other groups regarding mathematics achievement and mathematics self-efficacy. In addition, boys reported significantly higher scores with respect to computer self-efficacy where, during the Autograph-based instruction and spreadsheet-based instruction, boys were more willing to solve activities using computers compared to girls. On the other hand, treatments seemed not to have any effect on gender regarding mathematics self-efficacy and mathematics achievement.

12.
The definition of what it means to take a test online continues to evolve with the inclusion of a broader range of item types and a wide array of devices used by students to access test content. To assure the validity and reliability of test scores for all students, device comparability research should be conducted to evaluate the impact of testing device on student test performance. The current study looked at the comparability of test scores across tablets and computers for high school students in three commonly assessed content areas and for a variety of different item types. Results indicate no statistically significant differences across device type for any content area or item type. Student survey results suggest that students may have a preference for taking tests on devices with which they have more experience, but that even limited exposure to tablets in this study increased positive responses for testing on tablets.

13.
This study compared the motor activity technique of learning, using physical education activities, with traditional ways of developing science concepts with fifth grade slow learning children. Two groups of ten children each were equated on the basis of pretest scores. Both groups were taught by the same classroom teacher. One group was taught through motor activity learning and the other by traditional procedures. Both groups were retested after a two-week teaching period, and again after a three-month extended interval. The difference in the posttest scores favored the motor activity learning group, p < .01 (t = 4.33, df = 9). The difference in the extended interval test also favored the same group, p < .001 (t = 6.37, df = 9). Using the differences in test scores as criteria for learning, the children in the motor activity learning group learned and retained significantly more than those in the traditional group.

14.
Many prominent intelligence tests (e.g., Wechsler Intelligence Scale for Children, Fifth Edition [WISC-V] and Reynolds Intellectual Abilities Scale, Second Edition [RIAS-2]) offer methods for computing subtest- and composite-level difference scores. This study uses data provided in the technical manuals of the WISC-V and RIAS-2 to calculate reliability coefficients for difference scores. Subtest-level difference score reliabilities range from 0.59 to 0.99 for the RIAS-2 and from 0.53 to 0.87 for the WISC-V. Composite-level difference score reliabilities generally range from 0.23 to 0.95 for the RIAS-2 and from 0.36 to 0.87 for the WISC-V. Emphasis is placed on comparisons recommended by test publishers, and a discussion of minimum requirements for interpretation of difference scores is provided.
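
As a rough illustration of how a difference score reliability can be computed from values reported in a test manual, the sketch below uses the classical test theory formula, which assumes uncorrelated measurement errors. The abstract does not state which formula the study applied, so this choice, the function name `difference_score_reliability`, and the input values are assumptions for illustration only; they are not WISC-V or RIAS-2 figures.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Classical reliability of the difference D = X - Y.

    r_xx, r_yy: reliabilities of the two scores.
    r_xy: correlation between the two observed scores.
    sd_x, sd_y: observed standard deviations (equal by default, as for
    scores reported on a common standard-score metric).
    Assumes the measurement errors of X and Y are uncorrelated.
    """
    covariance = r_xy * sd_x * sd_y
    # True-score variance of the difference.
    true_variance = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * covariance
    # Observed variance of the difference.
    observed_variance = sd_x**2 + sd_y**2 - 2 * covariance
    if observed_variance <= 0:
        raise ValueError("observed difference variance must be positive")
    return true_variance / observed_variance


# Hypothetical inputs: two scores, each with reliability .90.
moderately_correlated = difference_score_reliability(0.90, 0.90, 0.50)
highly_correlated = difference_score_reliability(0.90, 0.90, 0.80)
```

With equal standard deviations the expression reduces to (½(r_xx + r_yy) − r_xy) / (1 − r_xy), so the difference between two reliable scores becomes less reliable as the scores become more highly correlated.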

15.
Historically, Angoff‐based methods were used to establish cut scores on the National Assessment of Educational Progress (NAEP). In 2005, the National Assessment Governing Board oversaw multiple studies aimed at evaluating the reliability and validity of Bookmark‐based methods via a comparison to Angoff‐based methods. As the Board considered adoption of Bookmark‐based methods, it considered several criteria, including reliability of the cut scores, validity of the cut scores as evidenced by comparability of results to those from Angoff, and procedural validity as evidenced by panelist understanding of the method tasks and instructions and confidence in the results. As a result of their review, a Bookmark‐based method was adopted for NAEP, and has been used since that time. This article goes beyond the Governing Board's initial evaluations to conduct a systematic review of 27 studies in NAEP research conducted over 15 years. This research is used to evaluate Bookmark‐based methods on key criteria originally considered by the Governing Board. Findings suggest that Bookmark‐based methods have comparable reliability, resulting cut scores, and panelist evaluations to Angoff. Given that Bookmark‐based methods are shorter in duration and less costly, Bookmark‐based methods may be preferable to Angoff for NAEP standard setting.

16.
Educable mentally retarded children and normal children of average intelligence were compared in performance on the Children’s Manifest Anxiety Scale. The purpose of the study was to determine the following: (1) a measure of long-term test-retest reliability, (2) suitability of the scale with a younger chronological age group, and (3) comparative data on differences in anxiety scores between normal and retarded children. Controls for procedural modifications, residential and educational status, sex differences, chronological age range, and range of IQ were employed. Test-retest correlations indicated that the scale was reliable for normal Ss but not for the retarded Ss. CMAS effects based on age and IQ did exist. Older retardates received higher anxiety scores than younger retardates on Test 2, while the Test 1 difference was not significant. Retarded children obtained higher anxiety scores than normal children on Test 1. It was concluded that reliability over a 10-month period is poor for retarded Ss. Moreover, the instrument is of doubtful utility with younger retarded Ss.

17.
The paper provides (1) a teacher-administered rating instrument for inattention without confounding the rating with hyperactivity and conduct disorder, and (2) evidence that the ratings correlate with the scores obtained from cognitive tests of attention. In Study I, the first objective was to investigate the construct validity and the inter-rater reliability of the Attention Checklist (ACL) by factor analysing the teacher ratings of 110 Grade 4 children, obtained by using the ACL. The second objective was to investigate the predictive validity of the ACL by examining the relationship between the scores obtained for the participants from teachers' ratings using the ACL and the scores obtained by participants in the lab-type attention tests. The results of factor analysis showed that a single factor labelled ‘inattention’ underlies the 12 items in the ACL. Examining the differences in performance on attention tests, the ‘low attention’ children as rated by the teachers on the ACL scored lower than the ‘high attention’ children on the objective tests of attention. These findings were replicated in Study II, which was conducted to test further the construct validity and predictive validity of the ACL. This time, only those two tests (Auditory Attention and Visual Attention) that had shown relatively poor discrimination between the high and low attention groups in Study I were, again, administered to another cohort of 97 Grade 4 children, as it was our intention to further challenge the reliability of the ACL. Overall, the results of both studies suggest that comprehensive assessment of attention skills should include both ACL and objective measures of selective attention.

18.
One hundred sixty American and 397 Korean fourth‐ and fifth‐graders were administered the Student Social Attribution Scale (SSAS), designed to assess students' explanations for social successes and failures. A Korean version of the SSAS was developed for the study. The internal consistency reliabilities of the American and Korean instruments were determined (rs ranged from .56 to .86 for the Korean instrument and .62 to .88 for the American instrument). The means from both the American and Korean SSAS versions on the 8 scales and global scores (e.g., internal, external) were compared. Based on the literature, Korean children should have had higher scores for effort attributions in failure situations than the American children and Americans should have shown higher scores for ability attributions in successful situations. In fact, Korean children did show significantly higher (p < .005) Failure Effort scores and American children showed significantly higher (p < .005) Success Ability scores. Findings indicate that Korean children are potentially more willing to accept responsibility for social failure than American students. © 2002 John Wiley & Sons, Inc.

19.
Achievement and cognitive tests are used extensively in the diagnosis and educational placement of children with reading disabilities (RD). Moreover, research on scholastic interventions often requires repeat testing and information on practice effects. Little is known, however, about the test-retest and other psychometric properties of many commonly used measures within the beginning reader population, nor are these nationally normed or experimental measures comparatively evaluated. This study examined the test-retest reliability, practice effects, and relations among a number of nationally normed measures of word identification and spelling and experimental measures of achievement and reading-related cognitive processing tests in young children with significant RD. Reliability was adequate for most tests, although lower than might be ideal on a few measures when there was a lengthy test-retest interval or with the reduced behavioral variability that can be seen in groups of beginning readers. Practice effects were minimal. There were strong relations between nationally normed measures of decoding and spelling and their experimental counterparts and with most measures of reading-related cognitive processes. The implications for the use of such tests in treatment studies that focus on beginning readers are discussed.

20.
The Pervasive Developmental Disorders Rating Scale (PDDRS; Eaves, 1993) is a screening instrument used in the assessment of autistic disorder. In this study, the reliability of test scores for the PDDRS was examined with three samples. The first sample consisted of 456 participants ranging in age from 1 to 12 years old and the second sample consisted of 111 participants in the 13 to 24 year‐old range. Additionally, the test‐retest reliability of scores for the PDDRS was examined with a sample of 40 participants. The results indicated that coefficient alpha for the PDDRS Total Score was adequate for screening purposes (r = .89) for both age groups. The results of the test‐retest study also suggested that PDDRS had adequate test‐retest reliability (r = .92) for the PDDRS Total Score. © 2002 Wiley Periodicals, Inc. Psychol Schs 39: 605–611, 2002.

