Similar literature
Found 20 similar documents (search time: 93 ms)
1.
Abstract

The instructional procedure of separating the similar sounds e and i from each other during instruction was evaluated in two experiments. In Experiment I, 42 first graders were assigned either to a group in which e and i were introduced together or to a group in which they were separated by four other letters. Children in the similar-separated group made more correct responses to the two target letters during training, but the posttest scores were low and did not differ for the two groups. By measuring trials to criterion, Experiment II investigated the efficiency of separating similar sounds and of cumulatively introducing each sound. In the cumulative introduction procedure, children were brought to criterion on each group of sounds before a new sound was introduced. Thirty-five pre-schoolers completed training in one of three groups: similar-separated with cumulative introduction, similar-together with cumulative introduction, and similar-separated with simultaneous introduction. Children from the similar-separated group with cumulatively introduced sounds reached criterion significantly more quickly than the similar-together group and the simultaneous group. Posttest scores for all the children were substantially higher than in Experiment I and were significantly higher for the two cumulative introduction groups than for the simultaneous group.

2.
3.
Martin. Assessing Writing, 2009, 14(2): 88-115
The demand for valid and reliable methods of assessing second and foreign language writing has grown in significance in recent years. One such method is the timed writing test, which has a central place in many testing contexts internationally. The reliability of this test method is heavily influenced by the scoring procedures, including the rating scale to be used and the success with which raters can apply the scale. Reliability is crucial because important decisions and inferences about test takers are often made on the basis of test scores. Determining the reliability of the scoring procedure frequently involves examining the consistency with which raters assign scores. This article presents an analysis of the rating of two sets of timed tests written by intermediate level learners of German as a foreign language (n = 47) by two independent raters who used a newly developed detailed scoring rubric containing several categories. The article discusses how the rubric was developed to reflect a particular construct of writing proficiency. Implications for the reliability of the scoring procedure are explored, and considerations for more extensive cross-language research are discussed.

4.
ABSTRACT

The authors address the reliability of scores obtained on the summative performance assessments during the pilot year of their research. In contrast to classical test theory, generalizability theory is discussed as offering advantages for estimating the reliability of scores on summative performance assessments. Generalizability theory was used as the framework because of the flexibility this approach provides for examining sources of inconsistency within a complex assessment. The two major sources of inconsistency in scores considered in this study were raters and agencies (teachers' ratings vs. researchers' ratings). Overall, results showed that the inconsistency in scores attributable to raters and agencies was relatively small. Suggestions for improving consistency in subsequent years of the research are provided.

5.
The purpose of this study was to examine the validity and reliability of Curriculum-Based Measures in writing for English learners. Participants were 36 high school English learners with moderate to high levels of English language proficiency. Predictor variables were type of writing prompt (picture, narrative, and expository), time (3, 5, and 7 min), and scoring procedure (words written, words spelled correctly, correct word sequences, correct minus incorrect word sequences). Criterion variables were teacher ratings of writing performance and student performance on the Test of Written Language-III, the writing subtest of the Test of Emerging Academic English, and the Minnesota state writing test. Results supported the validity and reliability of a 5 to 7-min writing sample written in response to a narrative or picture prompt and scored for percent of correct word sequences, correct minus incorrect word sequences, or words written plus correct minus incorrect word sequences.

6.
Heterogeneity of Peer-rejected Girls   Cited by: 1 (self-citations: 0; citations by others: 1)
Heterogeneity within a sample of 46 peer-rejected 8–10-year-old girls was investigated using cluster analysis procedures. Rejected girls were identified using rating sociometrics, and peer and teacher behavior rating measures were obtained. Two large clusters emerged from the analysis, with one of these being more deviant than the other. The more deviant group was characterized by withdrawal, anxiety, and low academic functioning. In contrast to findings previously reported for boys, aggression scores did not differentiate the two clusters. Thus, it does not appear that the use of a combination of aggression and rejection criteria identifies the most deviant group of girls.

7.
Abstract

Zero-, 4-, and 8-second delays in knowledge of results were used in a computer-assisted instruction task learned by sixty men and sixty women. The task was a tutorial constructed-response program dealing with binary numbers, presented via an electric typewriter. Criteria used were (a) time taken to complete the program (corrected for delay times), (b) number of correct responses during learning, (c) number of correct responses made on an achievement test on the program, and (d) scores on a test of expressed attitude toward computer-assisted instruction (CAI). Women completed the task more quickly, and showed poorer attitudes toward CAI when the delay interval was 8 seconds. Other performance criteria were unaffected.

8.
9.
Responses to a 40-item test were simulated for 150 examinees under free-response and multiple-choice formats. The simulation was replicated three times for each of 30 variations reflecting format and the extent to which examinees were (a) misinformed, (b) successful in guessing free-response answers, and (c) able to recognize with assurance correct multiple-choice options that they could not produce under free-response testing. Internal consistency reliability (KR20) estimates were consistently higher for the free-response score sets, even when the free-response item difficulty indices were augmented to yield mean scores comparable to those from multiple-choice testing. In addition, all test score sets were correlated with four randomly generated sets of unit-normal measures, whose intercorrelations ranged from moderate to strong. These measures served as criteria because one of them had been used as the basic ability measure in the simulation of the test score sets. Again, the free-response score sets yielded superior results even when tests of equal difficulty were compared. The guessing and recognition factors had little or no effect on reliability estimates or correlations with the criteria. The extent of misinformation affected only multiple-choice score KR20's (the more misinformation, the higher the KR20's). Although free-response tests were found to be generally superior, the extent of their advantage over multiple-choice was judged sufficiently small that other considerations might justifiably dictate format choice.
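
As a rough illustration of the KR20 coefficient named in this abstract, the sketch below computes it from a matrix of dichotomous (0/1) item scores. It is not the simulation procedure used in the study. The function name `kr20` and the use of the population variance of total scores are assumptions made for this example; some texts use the sample variance instead.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Assumption for this sketch: the variance of total scores is the
    population variance (divide by N).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR20 needs at least two items")

    # Total score of each examinee and the variance of those totals.
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("KR20 is undefined when all total scores are equal")

    # Sum of p * q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


if __name__ == "__main__":
    scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(kr20(scores))
```

For the four hypothetical examinees in the usage lines, the item variances sum to 0.625 and the total-score variance is 1.25, so the coefficient is 1.5 × (1 − 0.5) = 0.75.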

10.
Two conventional scores and a weighted score on a group test of general intelligence were compared for reliability and predictive validity. One conventional score consisted of the number of correct answers an examinee gave in responding to 69 multiple-choice questions; the other was the formula score obtained by subtracting from the number of correct answers a fraction of the number of wrong answers. A weighted score was obtained by assigning weights to all the response alternatives of all the questions and adding the weights associated with the responses, both correct and incorrect, made by the examinee. The weights were derived from degree-of-correctness judgments of the set of response alternatives to each question. Reliability was estimated using a split-half procedure; predictive validity was estimated from the correlation between test scores and mean school achievement. Both conventional scores were found to be significantly less reliable but significantly more valid than the weighted scores. (The formula scores were neither significantly less reliable nor significantly more valid than number-correct scores.)
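
The formula score described in this abstract can be sketched as follows. The abstract does not state which fraction of wrong answers was subtracted; the example assumes the conventional correction for guessing, 1/(k − 1) for k response options per question. The function name `formula_score` and the option count used below are hypothetical.

```python
def formula_score(n_right, n_wrong, n_options):
    """Number-correct score corrected for guessing.

    Assumption for this sketch: the subtracted fraction is 1 / (k - 1),
    where k is the number of response options per question. Omitted
    questions count as neither right nor wrong.
    """
    if n_options < 2:
        raise ValueError("a multiple-choice question needs at least two options")
    return n_right - n_wrong / (n_options - 1)


if __name__ == "__main__":
    # Hypothetical examinee: 50 right, 12 wrong, 7 omitted on a 69-item
    # test with 5 options per question (the option count is assumed).
    print(formula_score(50, 12, 5))
```

With these assumed values the penalty is 12 / 4 = 3, so the formula score is 47.0 against a number-correct score of 50.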

11.
Objectives: The purpose of the study was to compare the relative contributions of Rowe and Kahn’s definition of successful aging (SA), resilience, and the holistic wellness paradigm for predicting happiness, life satisfaction, and self-rated physical health in late life.

Method: A cross-sectional research design was used to survey 200 residents across 12 senior housing sites. Criteria with strong psychometric properties representing the three constructs were operationalized using hierarchical regression within the context of relevant control variables to compare the relative strengths of the three paradigms for predicting measures of quality of life.

Results: In this study, 8.5% of the sample met modified criteria for SA and were used as a comparison group with those who did not meet the criteria. Overall, holistic wellness and resilience predicted happiness, life satisfaction, and physical health better than SA alone. When predicting happiness and life satisfaction, race and holistic wellness were significant predictors. Age and holistic wellness were the best predictors of self-rated physical health.

Conclusion: The criteria underlying SA poorly predicted happiness, life satisfaction, and self-rated physical health compared to the resilience and holistic wellness models. The results suggest that definitions of aging well are complex and require greater nuance. The findings have important implications for clinicians seeking translatable theoretical models that are amenable to practice with older adults, especially for those living in independent senior housing communities.


12.
This study examined the effects of a behavioural correspondence training procedure on the rate of writing of four 13-year-old boys in a class for low achieving students in a city high school. A comprehensive range of collateral measures of writing was employed in addition to the target measure of writing rate. These collateral measures were assessed through analytic and holistic scoring procedures. Transfer of control of the correspondence training procedure to the class teacher and maintenance of writing gains were also examined. Results show that correspondence training effectively improved and maintained the rate and quality of written expression of all four boys.

13.
Standard errors of measurement of scale scores by score level (conditional standard errors of measurement) can be valuable to users of test results. In addition, the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985) recommends that conditional standard errors be reported by test developers. Although a variety of procedures are available for estimating conditional standard errors of measurement for raw scores, few procedures exist for estimating conditional standard errors of measurement for scale scores from a single test administration. In this article, a procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores. This method is illustrated using a strong true score model. Practical applications of this methodology are given. These applications include a procedure for constructing score scales that equalize standard errors of measurement along the score scale. Also included are examples of the effects of various nonlinear raw-to-scale score transformations on scale score reliability and conditional standard errors of measurement. These illustrations examine the effects on scale score reliability and conditional standard errors of measurement of (a) the different types of raw-to-scale score transformations (e.g., normalizing scores), (b) the number of scale score points used, and (c) the transformation used to equate alternate forms of a test. All the illustrations use data from the ACT Assessment testing program.

14.
This study describes the initial validation of an innovative social-behavioral observational assessment tool that is designed to be used on a repeated basis to assess growth and development of social competence over time to: (a) identify the social functioning of all students, (b) assist in planning support for students at risk, and (c) evaluate the effectiveness of individual and system-wide interventions. Eighteen first-grade students were monitored over an 8-week period using the Initiation-Response Assessment (IRA) Code. The School Social Behavior Scales, a published teacher rating scale, was included as a criterion measure. Estimates of reliability and criterion-related validity were calculated for the IRA. The measure's sensitivity to growth over time and between-group variability were also assessed using hierarchical linear modeling procedures. Results indicate that scores on this measure are stable, and tap constructs similar to those assessed via teacher rating. © 2008 Wiley Periodicals, Inc.

15.
16.
The present article reviews the procedures that have been developed for measuring the reliability of human observers' judgments when making direct observations of behavior. Measures such as the percentage of agreement, Cohen's kappa, and phi have been used to measure observer agreement; however, these coefficients have serious limitations. In addition to specifying the deficiencies that exist with these excessively used reliability measures, the present article discusses recently developed univariate and multivariate agreement measures that are based on quasi-equiprobability and quasi-independence models and estimates. Improvements in precision are provided by such models and estimates since they (1) yield a probability based coefficient of agreement with a directly interpretable meaning, (2) correct for the proportion of “chance” agreement, and (3) allow the partitioning of the agreement and disagreement estimates within the models.
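
To make the contrast between raw percentage of agreement and a chance-corrected coefficient concrete, the sketch below computes both the proportion of agreement and Cohen's kappa for two observers. It illustrates only the two traditional coefficients mentioned in the abstract, not the quasi-equiprobability or quasi-independence models the article goes on to discuss. The function names and the sample codes are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which the two observers agree."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every category either rater used.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)

    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    observer_1 = ["on", "on", "off", "off"]
    observer_2 = ["on", "off", "off", "off"]
    print(percent_agreement(observer_1, observer_2))
    print(cohens_kappa(observer_1, observer_2))
```

In the usage lines the observers agree on 3 of 4 intervals (0.75), but chance agreement from their marginals is 0.5, so kappa is (0.75 − 0.5) / (1 − 0.5) = 0.5.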

17.
Abstract

The study was designed to assess the strengths and weaknesses of the nursing education preparation of associate degree nursing graduates as reflected in their job performance. The predictive relationships of measures of scholastic success such as G.P.A. and State Board Examination Scores with graduate job performance were also investigated. A rating scale of 62 items was designed to measure the following dimensions of nursing performance: (a) planning for nursing care, (b) implementing nursing care, (c) interpersonal relationships and communication, (d) leadership and group procedures, (e) evaluating and reporting nursing care, (f) professional involvement, and (g) other. Sources for the rating scale included curriculum objectives and a field survey of performance criteria. Graduates were rated by a nurse and a physician who function in close supervision of their job. Graduates completed a similar rating scale in which they were asked to rate the adequacy of their educational preparation for various job requirements. Ratings were obtained from a sample of 153 graduates of the associate degree nursing program at Delta College, University Center, Michigan. Results indicated a stated need for additional clinical experience requiring total involvement of nursing students, advanced course work in pharmacology, anatomy, physiology, and nutrition, and planned leadership preparation. Findings demonstrated no significant relationship between rated job performance and either the various indices of G.P.A. or State Board Examination Scores. It is projected that rated job performance is influenced by a number of personality variables. Physicians perceive the performance of nurses from different perspectives than do supervising nurses.

18.
Parafoveal word processing was examined during Korean reading. Twenty-four native speakers of Korean read sentences in two conditions while their eye movements were being monitored. The boundary paradigm (Rayner, 1975) was used to create a mismatch between characters displayed before and after an eye movement contingent display change. In the first condition, the critical previews were correct case markers in terms of syntactic category (e.g., object marker for an object noun) but with a phonologically incorrect form (e.g., using 를 instead of 을 when the preceding noun ends with a consonant). In the second condition, incorrect case markers in terms of syntactic category were used, creating a semantic mismatch between preview and target. Results include a small but significant parafovea-on-fovea effect on the preceding fixation, combined with a large effect on late measures of target word reading when a syntactically incorrect preview was presented. These results indicate that skilled Korean readers are quite sensitive to high-level linguistic information available in the parafovea.

19.
20.
This narrative synthesis reviews the psychometric properties of commercially and publicly available retell instruments used to assess the reading comprehension of students in grades K–12. Eleven instruments met selection criteria and were systematically coded for data related to the administration procedures, scoring procedures, and technical adequacy of the retell component. High variability was evident in the prompting conditions and the use of quantitative and qualitative scoring mechanisms. Because no two instruments shared the same features, their retell scores are likely not equitable. None of the measures provided sufficient information to substantiate their reliability and validity. Many were lacking data on critical psychometric aspects, such as passage equivalency and construct validity, and nearly all had insufficient or ill-defined norming samples.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP No. 09084417