Similar Articles
20 similar articles retrieved.
1.
Robert Loo, Educational Psychology, 1997, 17(1-2): 95-100
Learning styles are purported to be relatively stable characteristics, with some change or development expected. Some studies using Kolb's Learning Style Inventory (LSI) have reported significant positive test‐retest correlations of LSI scores or nonsignificant repeated‐measures ANOVAs and concluded that learning styles are stable. This study examined stability and change on Kolb's revised Learning Style Inventory (LSI‐1985) using 152 participants tested at two points in time separated by about 10 weeks. A variety of statistics were used to evaluate stability and change in LSI‐1985 scores for the four subscales, the two dimensions, and the four learning styles. The use of test‐retest correlations, differences between means, and other methods emphasising group effects was criticised. It was recommended that researchers also analyse and report the stability and change of style categories directly, not just score changes. These comments also apply to other learning style measures such as the Learning Style Questionnaire.
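The category-level analysis recommended here can be checked directly alongside the usual score correlation. A minimal sketch on simulated data (the sample size matches the abstract, but all scores, the four style labels, and the 70% agreement rate are invented, not Loo's results):

```python
# Sketch: score-level vs. category-level stability on hypothetical
# two-wave learning-style data. All data are simulated.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
time1 = rng.normal(30, 5, 152)                 # an LSI subscale at time 1
time2 = time1 + rng.normal(0, 3, 152)          # retest about 10 weeks later

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f} (p = {p:.3f})")

def cohen_kappa(cat1, cat2):
    """Chance-corrected agreement between two categorical vectors."""
    cats = sorted(set(cat1) | set(cat2))
    n = len(cat1)
    table = np.zeros((len(cats), len(cats)))
    for a, b in zip(cat1, cat2):
        table[cats.index(a), cats.index(b)] += 1
    po = np.trace(table) / n                    # observed agreement
    pe = (table.sum(1) @ table.sum(0)) / n**2   # chance agreement
    return (po - pe) / (1 - pe)

styles = ["diverger", "assimilator", "converger", "accommodator"]
styles_t1 = rng.choice(styles, 152)
# Simulate 70% of participants keeping the same style category at retest.
styles_t2 = np.where(rng.random(152) < 0.7, styles_t1, rng.choice(styles, 152))
print(f"style-category kappa = {cohen_kappa(list(styles_t1), list(styles_t2)):.2f}")
```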

2.
For a group of 70 educable mentally handicapped students retested over a 3-year period, WISC-III scores were significantly lower at retest on VIQ and the Information and Vocabulary subtests. Trait stability (based on an average 3-year retest interval) in intelligence measurement, as represented by the WISC-III FSIQ, VIQ, and PIQ, was considerably lower, as expected, than that reported for the standardization sample (based on a 23-day retest interval) in the WISC-III manual. Significant individual variation was observed, with FSIQ differences between the first and second testing spanning a 22-point range and VIQ and PIQ showing even larger ranges. Implications regarding profile analysis and program planning for EMH students are presented. © 1998 John Wiley & Sons, Inc.

3.
If the factor structure of a test does not hold over time (i.e., is not invariant), then longitudinal comparisons of standing on the test are not meaningful. In the case of the Wechsler Intelligence Scale for Children‐Third Edition (WISC‐III), it is crucial that it exhibit longitudinal factorial invariance because it is widely used in high‐stakes special education eligibility decisions. Accordingly, the present study analyzed the longitudinal factor structure of the WISC‐III for both configural and metric invariance with a group of 177 students with disabilities tested, on average, 2.8 years apart. Equivalent factor loadings, factor variances, and factor covariances across the retest interval provided evidence of configural and metric invariance. It was concluded that the WISC‐III measures the same constructs with equal fidelity across time, which allows score differences to be interpreted unequivocally as changes in the underlying latent constructs rather than as variations in the measurement operation itself. © 2001 John Wiley & Sons, Inc.

4.
Although a few studies report sizable score gains for examinees who repeat performance‐based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single‐take examinees and 4,030 repeat examinees who completed a 6‐hour clinical skills assessment required for physician licensure. Each examinee was rated in four skill domains: data gathering, communication‐interpersonal skills, spoken English proficiency, and documentation proficiency. Conditional standard errors of measurement computed for single‐take and multiple‐take examinees indicated that ratings were of comparable precision for the two groups within each of the four skill domains; however, conditional errors were larger for low‐scoring examinees regardless of retest status. In addition, multiple‐take examinees exhibited less score consistency across the skill domains on their first attempt, but their scores became more consistent on the second. Further, the median correlation between scores on the four clinical skill domains and three external measures was .15 for multiple‐take examinees on their first attempt but increased to .27 on their second, a value comparable to the median correlation of .26 for single‐take examinees. The findings support the validity of inferences based on scores from the second attempt.

5.
The Narcissistic Personality Questionnaire for Children (NPQC) is a brief self‐report scale for measuring narcissism in children. In Study 1, a factor analysis of 370 children's NPQC scores revealed four factors, labeled superiority, exploitativeness, self‐absorption, and leadership. Study 2 established the convergent and discriminant validity of the NPQC: NPQC scores were positively correlated with need for power/dominance, self‐esteem, aggression, and need for achievement, and unrelated to life satisfaction, as expected. Further support for the validity of the NPQC was obtained when findings were consistent with attachment theory's interpretation of narcissistic children's self‐perceptions. Study 3 investigated the temporal stability of scores. Results from Studies 1 and 3 show the NPQC to be an internally consistent measure (Cronbach's alpha = .81) with adequate test–retest reliability (r = .81). Implications for the education of aggressive and narcissistic children are discussed.
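Internal consistency of the kind reported here (Cronbach's alpha) is computed directly from an item-response matrix. A minimal sketch on simulated data (the 10-item structure and the single common factor are assumptions, not the NPQC's actual composition):

```python
# Sketch: Cronbach's alpha for a k-item scale. Data are simulated
# stand-ins, not NPQC responses.
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total score
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(size=(370, 1))                          # common factor
responses = trait + rng.normal(scale=1.0, size=(370, 10))  # 10 items
print(f"alpha = {cronbach_alpha(responses):.2f}")
```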

6.
Large‐scale assessment results for schools, school boards/districts, and entire provinces or states are commonly reported as the percentage of students achieving a standard, that is, the percentage of students scoring above the cut score that defines the standard on the assessment scale. Recent research has shown that this method of reporting is sensitive to small changes in the cut score, especially when comparing results across years or between groups. This study builds on that work, investigating the effects of reporting group size on the stability of results. In Part 1 of this study, Grade 6 students' results on Ontario's 2008 and 2009 Junior Assessments of Reading, Writing and Mathematics were compared, by school, for different sizes of schools. In Part 2, samples of students' results on the 2009 assessment were randomly drawn and compared, for 10 group sizes, to estimate the variability in results due to sampling error. The results showed that the percentage of students above a cut score (PAC) was unstable for small schools and small randomly drawn groups.
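The instability of PAC for small groups is easy to reproduce by simulation: draw repeated samples of each size and watch the spread of PAC estimates. A minimal sketch with an invented score scale and cut score:

```python
# Sketch: sampling variability of the percentage above a cut score (PAC)
# as a function of group size. Scale and cut score are illustrative.
import numpy as np

rng = np.random.default_rng(2)
population = rng.normal(500, 100, 200_000)   # assessment scale scores
cut = 520                                    # hypothetical standard

for n in (10, 30, 100, 1000):
    pacs = [np.mean(rng.choice(population, n) > cut) * 100
            for _ in range(2000)]
    print(f"n={n:5d}: mean PAC = {np.mean(pacs):5.1f}%, SD = {np.std(pacs):4.1f}")
```

The standard deviation of PAC across replications shrinks roughly with the square root of group size, which is why school-level results for small schools fluctuate so visibly from year to year.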

7.
The purpose of the present investigation was to measure performance changes on the K-TEA: Brief Form over a 3-year retest interval. Fifty-two learning-disabled students were retested with the K-TEA: Brief Form following 3 years of special education services. Results showed an average decrease of 3.33 points in the Battery Composite over the 3-year retest interval. Mean retest differences for the Battery Composite, Math, Reading, and Spelling were nonsignificant. Retest trait stability coefficients ranged from .69 (Mathematics) to .88 (Spelling), suggesting that the K-TEA: Brief Form is a reliable instrument for screening changes in achievement over an extended period. An examination of individual difference scores, however, revealed significant variability within each achievement area. © 1996 John Wiley & Sons, Inc.

8.
These two studies examined the stability (test-retest) reliability of the Woodcock-Johnson-Revised (WJ-R; Woodcock & Johnson, 1989) and the Kaufman Test of Educational Achievement (KTEA; Kaufman & Kaufman, 1985) over an approximately 2-week retest interval for elementary-age students. Results indicated that, across grade levels, the Broad Reading Cluster of the WJ-R remained stable. Most correlations for the mathematics and written language clusters, as well as for the reading, mathematics, and written language subtests, were less than .90. Correlations for all KTEA composites and subtests exceeded .90. These data illustrate the need for more specific test-retest reliability information in test manuals, to enable examiners to select the most reliable measures.

9.
As access to and reliance on technology continue to increase, so does the use of computerized testing for admissions, licensure/certification, and accountability exams. Nonetheless, full computer‐based test (CBT) implementation can be difficult due to limited resources. As a result, some testing programs offer both CBT and paper‐based test (PBT) administration formats. In such situations, evidence must be gathered that scores obtained from the different formats are comparable. In this study, we illustrate how contemporary statistical methods can be used to provide evidence regarding the comparability of CBT and PBT scores at the total test score and item levels. Specifically, we examined the invariance of test structure and item functioning across test administration modes and across subgroups of students defined by SES and sex. Multiple replications of both confirmatory factor analysis and Rasch differential item functioning (DIF) analyses were used to assess invariance at the factorial and item levels. Results revealed a unidimensional construct with moderate statistical support for strong factorial‐level invariance across SES subgroups, and moderate support for invariance across sex. Issues involved in applying these analyses to future evaluations of the comparability of scores from different versions of a test are discussed.
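The study itself used Rasch-based DIF; as a lighter-weight stand-in, a Mantel-Haenszel check of one item across administration modes illustrates the same idea of testing whether an item functions equivalently for matched examinees. A sketch on simulated data (not the study's procedure or results):

```python
# Sketch: Mantel-Haenszel DIF for one item across CBT and PBT groups,
# stratifying on total score. A stand-in for the Rasch DIF used in the
# study; all data are simulated, with no true DIF built in.
import numpy as np

rng = np.random.default_rng(4)
n = 4000
mode = rng.integers(0, 2, n)              # 0 = PBT (reference), 1 = CBT (focal)
total = rng.integers(0, 41, n)            # matching strata: total score 0-40
p_correct = 1 / (1 + np.exp(-(total - 20) / 5))
item = (rng.random(n) < p_correct).astype(int)

num = den = 0.0
for s in np.unique(total):
    m = total == s
    A = np.sum((mode[m] == 0) & (item[m] == 1))  # reference correct
    B = np.sum((mode[m] == 0) & (item[m] == 0))  # reference incorrect
    C = np.sum((mode[m] == 1) & (item[m] == 1))  # focal correct
    D = np.sum((mode[m] == 1) & (item[m] == 0))  # focal incorrect
    N = A + B + C + D
    num += A * D / N
    den += B * C / N

alpha_mh = num / den                       # common odds ratio across strata
delta_mh = -2.35 * np.log(alpha_mh)        # ETS delta scale; |delta| < 1 ~ negligible
print(f"MH odds ratio = {alpha_mh:.2f}, MH delta = {delta_mh:.2f}")
```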

10.
This study investigated normative precision in 14 preschool tests representing four domains: cognitive, language, adaptive behavior, and early academic skills. The purpose was to explore the consequences of using tests with more‐ vs. less‐precise age norms to identify disabilities in preschool children. As expected, on tests with more precise norms, the standard scores associated with the same raw score shifted gradually across age groups. Tests with less precise norms, on the other hand, showed more dramatic standard score shifts across age groups. Examination of the degree of shift in each test indicated that many preschool tests have norm tables that are potentially problematic for diagnosing disabilities, particularly for children near norm-group cut‐off ages. On high-stakes tests, an optimal norm-table span is one to three months. This standard can be achieved by using interpolation and/or by increasing the size of norming samples at the preschool level. © 1999 John Wiley & Sons, Inc.
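Interpolation of the kind suggested can be sketched directly: anchor the standard score for a given raw score at each age block's midpoint, then interpolate for a child's exact age so scores do not jump at block boundaries. The norm values below are invented for illustration:

```python
# Sketch: smoothing a norm table by linear interpolation between
# age-block midpoints. Norm values are hypothetical.
import numpy as np

# Standard score for the same raw score, by 6-month norm block midpoint.
age_midpoints_months = np.array([51, 57, 63, 69])
standard_scores      = np.array([108, 102, 96, 91])

def interpolated_ss(age_months):
    return np.interp(age_months, age_midpoints_months, standard_scores)

# A child aged 59 months gets a value between the 57- and 63-month norms
# instead of jumping the full 6 points at the block boundary.
print(interpolated_ss(59))   # 100.0
```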

11.
The Pervasive Developmental Disorders Rating Scale (PDDRS; Eaves, 1993) is a screening instrument used in the assessment of autistic disorder. In this study, the reliability of PDDRS test scores was examined with three samples. The first sample consisted of 456 participants ranging in age from 1 to 12 years and the second of 111 participants aged 13 to 24 years. Additionally, the test‐retest reliability of PDDRS scores was examined with a sample of 40 participants. The results indicated that coefficient alpha for the PDDRS Total Score was adequate for screening purposes (r = .89) for both age groups. The results of the test‐retest study also suggested that the PDDRS Total Score had adequate test‐retest reliability (r = .92). © 2002 Wiley Periodicals, Inc. Psychol Schs 39: 605–611, 2002.

12.
States participating in the Growth Model Pilot Program reference individual student growth against “proficiency” cut scores that conform with the original No Child Left Behind Act (NCLB). Although achievement results from conventional NCLB models are also cut‐score dependent, the functional relationships between cut‐score location and growth results are more complex and are not currently well described. We apply cut‐score scenarios to longitudinal data to demonstrate the dependence of state‐ and school‐level growth results on cut‐score choice. This dependence is examined along three dimensions: 1) rigor, as states set cut scores largely at their discretion, 2) across‐grade articulation, as the rigor of proficiency standards may vary across grades, and 3) the time horizon chosen for growth to proficiency. Results show that the selection of plausible alternative cut scores within a growth model can change the percentage of students “on track to proficiency” by more than 20 percentage points and reverse accountability decisions for more than 40% of schools. We contribute a framework for predicting these dependencies, and we argue that the cut‐score dependence of large‐scale growth statistics must be made transparent, particularly for comparisons of growth results across states.
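The dependence of "on track to proficiency" percentages on cut-score choice can be illustrated with a toy projection model. A minimal sketch assuming simple linear growth and invented score distributions (actual pilot-state growth models are more elaborate):

```python
# Sketch: sensitivity of "on track" percentages to the cut score, using a
# simple linear projection of each student's growth. All numbers are
# simulated; the pilot states' actual growth models differ.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
score_now = rng.normal(480, 60, n)
annual_growth = rng.normal(15, 10, n)
years_to_horizon = 3                      # chosen time horizon

projected = score_now + annual_growth * years_to_horizon

for cut in (490, 500, 510):               # plausible alternative cut scores
    on_track = np.mean(projected >= cut) * 100
    print(f"cut = {cut}: {on_track:4.1f}% on track")
```

Even with this crude model, shifting the cut score by a plausible amount moves the on-track percentage by several points, and the effect compounds when the horizon or across-grade rigor also varies.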

13.
A short version of the Approaches to Studying Inventory (ASI), commended as a 'quick and easy' means of assessing student learning, was administered to two groups of students at the University of the South Pacific. Measures of its internal consistency and test‐retest reliability were comparable with those obtained in European research, but were not wholly satisfactory. Moreover, its factor structure was qualitatively different in this context, being constituted by different forms of motivation for studying in higher education. It is concluded that approaches to studying are culture‐specific and, in particular, that one should be cautious about using this version of the ASI in systems of higher education in non‐Western countries.

14.
For a sample of 49 regular-class children, test-retest stability was investigated for the domains and subdomains of the AAMD Adaptive Behavior Scale, Public School Version. Satisfactory indices of reliability were obtained for all Part One domains except Vocational Activity (r = .43), and for all Part One subdomains except two, Care of Clothing and Dressing and Undressing. Part Two, however, evidenced good reliability coefficients for only three of the 11 domains studied. Moreover, there was a tendency for Part Two retest ratings to be lower (less deviant) than the original ratings. These findings suggest that Part One scores are stable and therefore potentially useful, but that Part Two domains may lack discriminative power and stability when used with regular-class children.

15.
Universal screening for behavioral and emotional difficulties is integral to the identification of students needing early intervention and prevention efforts. However, unanswered questions regarding the stability of screening scores impede the ability to determine optimal strategies for subsequent screening. This study examined the 2‐year stability of behavioral and emotional risk screening scores and investigated whether change could be predicted based on student characteristics or initial risk scores. As part of a district‐wide screening effort, 863 middle and high school students completed the Behavioral and Emotional Screening System at two time points. Stability coefficients were moderate, with the majority of students remaining in a similar risk category across time. Gender, race/ethnicity, socioeconomic status, grade, school transition, and special education status were not predictive of movement across time. Initial risk score was predictive of movement from normal to at‐risk categorization, with the internalizing domain being the most predictive of change.

16.
This article describes a method for identifying test items as disability neutral for children with vision and motor disabilities. Graduate students rated 130 items of the Preschool Language Scale, with inter‐rater correlation coefficients of 0.58 for ratings of items as disability neutral for children with vision disability and 0.77 for children with motor disability. These ratings were used to create three item sets considered disability neutral for children with vision disability, motor disability, or both. Two methods for scoring the item sets were identified: scoring each set as a partially administered developmental test, or computing standard scores based upon pro‐rated raw score totals. The pro‐rated raw score method generated standard scores that were significantly inflated and therefore less useful for assessment purposes than the ratio quotient method. This research provides a test accommodation technique for assessing children with multiple disabilities.
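The pro-rated raw score method can be sketched in a few lines; all item counts and scores below are hypothetical, not values from the study:

```python
# Sketch: pro-rated raw score scoring - scale the raw total on the
# administered (disability-neutral) item set up to the full test length,
# then enter the result into the norm table. Numbers are illustrative.
full_test_items = 130
administered_items = 96          # hypothetical disability-neutral set
raw_on_administered = 60

# Pro-rate the raw total as if all 130 items had been given.
prorated_raw = raw_on_administered * full_test_items / administered_items
print(round(prorated_raw))       # 81, the value looked up in the norms

# The inflation risk noted above: items removed as not disability neutral
# are often the harder ones, so pro-rating assumes the same success rate
# on them and can overstate the resulting standard score.
```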

17.
The goal of this study was to investigate the usefulness of person‐fit analysis in validating student score inferences in a cognitive diagnostic assessment. A two‐stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, a person‐fit statistic, the hierarchy consistency index (HCI; Cui, 2007; Cui & Leighton, 2009), was used to identify misfitting student item‐score vectors. In the second stage, students' verbal reports were collected to provide additional information about their response processes and to reveal the actual causes of misfit. This two‐stage procedure helped to identify misfits between item‐score vectors and the cognitive model used in the design and analysis of the diagnostic test, and to discover the reasons for misfit, so that students' problem‐solving strategies were better understood and their performances interpreted in a more meaningful way.
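The core logic of a hierarchy consistency check can be sketched as follows: if an examinee answers an item correctly, items requiring a subset of its attributes should also be answered correctly, and each violation counts against fit. This is a simplified reading in the spirit of the HCI, not Cui and Leighton's (2009) exact estimator; the Q-matrix and responses are invented:

```python
# Sketch: a hierarchy consistency check for one examinee. Simplified
# illustration of the idea behind the HCI; not the published estimator.
import numpy as np

# Q-matrix: rows = items, columns = attributes (1 = attribute required).
Q = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [0, 1, 0]])
responses = np.array([1, 1, 0, 0])   # one examinee's item scores

def hierarchy_consistency(Q, x):
    misfits = comparisons = 0
    for j in range(len(x)):
        if x[j] != 1:                # only correctly answered items anchor checks
            continue
        for k in range(len(x)):
            # item k requires a subset of item j's attributes
            if k != j and np.all(Q[k] <= Q[j]):
                comparisons += 1
                misfits += int(x[k] == 0)
    # 1 = perfectly consistent with the hierarchy, -1 = maximally inconsistent
    return 1 - 2 * misfits / comparisons if comparisons else np.nan

print(hierarchy_consistency(Q, responses))   # 0.0 for this vector
```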

18.
All Year 2 children in six randomly selected primary schools within one Local Education Authority (LEA) comprised the sample to which the Lawseq self‐esteem questionnaire was administered. Four years later, when they were in Year 6, they completed the Lawseq again. A two‐way analysis of variance with Sex and Occasion as factors was carried out on the 12 individual items of the instrument and on the total. There were no significant differences between occasions or sexes on the overall score, but there were significant differences between occasions on seven of the 12 items and between sexes on two items. On only one item was there a significant interaction between sex and occasion. The mean total fell over the 4 years, and the means for both occasions were considerably below the mean of 19.00 obtained when Lawrence standardised the test in 1981. Discussion centred on possible reasons for this, such as the appropriateness of the instrument for the age groups under study, the stability of administration, and changes within society and school.

19.
Research Findings: We evaluated the score stability of the Mathematical Quality of Instruction (MQI), an observational measure of mathematics instruction. Three raters each independently scored 100 video-recorded lessons taught by 20 kindergarten teachers in the spring. Using generalizability theory, we decomposed the variance in MQI scores into its potential sources (teachers, lessons, raters, and their interactions). For the 13-item (3-domain) Ambitious Mathematics Instruction scale and the Whole Lesson scale, about one third of the score variance was attributable to the main construct of interest (teachers' instructional strategies). The MQI's Errors and Imprecision scale was not relevant at the kindergarten level; virtually no errors or ambiguities were observed across the 100 mathematics lessons. In a series of decision studies, we examined improvements in reliability for combinations of up to 6 raters and 8 lessons. Only the Richness of Mathematics domain scores and the Whole Lesson scores achieved acceptable reliabilities. Practice or Policy: The findings have important implications for the use of observation measures to document teachers' mathematics practices in the early years of school.
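A decision (D) study of the kind described projects reliability for alternative rater and lesson counts from the G-study variance components. A minimal sketch for a crossed teachers x lessons x raters design, with invented variance components (a real D study would take them from the G-study estimates):

```python
# Sketch: D study for a teachers x lessons x raters design, projecting
# the relative G coefficient for different facet sample sizes.
def g_coefficient(var_t, var_tl, var_tr, var_res, n_lessons, n_raters):
    """Relative G coefficient for a crossed teacher x lesson x rater design."""
    rel_error = (var_tl / n_lessons
                 + var_tr / n_raters
                 + var_res / (n_lessons * n_raters))
    return var_t / (var_t + rel_error)

# Hypothetical variance components: teacher, teacher x lesson,
# teacher x rater, and residual.
vt, vtl, vtr, vres = 0.30, 0.35, 0.05, 0.30

for n_lessons in (2, 4, 8):
    for n_raters in (1, 3, 6):
        g = g_coefficient(vt, vtl, vtr, vres, n_lessons, n_raters)
        print(f"lessons={n_lessons}, raters={n_raters}: G = {g:.2f}")
```

With components like these, adding lessons buys more reliability than adding raters, which mirrors the finding that only some MQI scores reached acceptable reliability even with several raters.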

20.
An essential question when computing test–retest and alternate-forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computerized adaptive tests to explore how the number of days between tests affects alternate-forms reliability coefficients. Results suggest that the highest alternate-forms reliability coefficients were obtained when the second test was administered at least 2 to 3 weeks after the first. Even though reliability coefficients after this point were often similar, the results suggested a potential tradeoff in waiting longer to retest, as student ability tended to grow with time. These findings indicate that, if keeping student ability similar is a concern, the best time to retest is shortly after 3 weeks have passed since the first test. Additional analyses suggested that alternate-forms reliability coefficients were lower when tests were shorter, and that narrowing the ability distribution of examinees on the first test also affected the estimates. Results did not appear to be greatly affected by differences in first-test average ability, student demographics, or whether the student took the test under standard or extended time. It is suggested that for math and reading tests like the ones analyzed in this article, the optimal retest interval is shortly after 3 weeks from the first test.
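Estimating how reliability varies with the retest interval amounts to binning test pairs by days between tests and correlating the two scores within each bin, while also tracking mean growth. A sketch on simulated data (the column names, growth rate, and error model are all assumptions, not the article's data):

```python
# Sketch: alternate-forms reliability by retest-interval bucket. Simulated
# data mimic the abstract's pattern: error shrinks with longer intervals
# (less carryover) while ability grows slightly with time.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 20_000
days = rng.integers(1, 91, n)
ability = rng.normal(0, 1, n)
score1 = ability + rng.normal(0, 0.5, n)
score2 = (ability
          + 0.004 * days                                  # growth over time
          + rng.normal(0, np.clip(0.8 - days / 150, 0.5, 0.8)))

df = pd.DataFrame({"days": days, "s1": score1, "s2": score2})
df["bucket"] = pd.cut(df["days"], bins=[0, 7, 14, 21, 28, 60, 90])

for bucket, g in df.groupby("bucket", observed=True):
    r = g["s1"].corr(g["s2"])
    gain = (g["s2"] - g["s1"]).mean()
    print(f"{bucket}: r = {r:.3f}, mean gain = {gain:.3f}")
```

The per-bucket output shows the tradeoff directly: correlations plateau after the 2-to-3-week buckets while the mean gain keeps rising, pointing to retesting shortly after 3 weeks.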
