Similar literature
20 similar documents found.
1.
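郑燕祥's analysis of educational effectiveness points out the multiplicity and complexity of school effectiveness, but its account of the differences in school effectiveness is incomplete. The school effectiveness improvement model supplements 郑燕祥's view of educational effectiveness: it identifies static differences in school effectiveness and dynamic differences in school effectiveness improvement, and it offers some guidance for improving school effectiveness.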

2.
The safety of America's schools is a major issue. Yet, the magnitude of the problem cannot be accurately assessed because some of the data concerning incidents and disciplinary actions come from reporting systems that are seriously flawed. In this article we examine how data from student self‐report surveys and other sources can be used to assess the weaknesses in current school incident‐reporting systems and improve the validity of surveillance data on school violence. Particular attention is paid to assessing the validity of data from Gun‐Free Schools Act (GFSA) reports on the number of guns in schools in light of nationally representative student survey data. We also discuss the difficulties of obtaining accurate surveillance data and suggest changes in surveillance systems that could produce more valid estimates of violence and injury in our nation's schools. © 2001 John Wiley & Sons, Inc.

3.
4.
5.
《Educational Assessment》2013,18(3):155-196
To evaluate student learning in a computer-supported environment known as GenScope™, we developed a system for assessing students' reasoning proficiency in introductory genetics. A critical aspect of the development effort concerned the validity of this assessment system. We used a variety of methods to address traditional evidential validity concerns as well as more contemporary concerns with consequential and systemic validity. Specifically, we examined whether or not our assessment system helped students develop the understanding it was designed to assess. Our inquiry revealed strong evidential validity but only limited consequential validity. In response, we developed a set of formative assessments designed to scaffold student assessment performance without compromising the evidential validity of the assessment system. In addition to documenting and enhancing the validity of the system, these efforts demonstrate the utility of newer interpretive models of validity inquiry and the value of Rasch measurement tools for conducting such inquiry.

6.
Over the past 40 years, numerous instruments have been developed to assess the learning environment for a variety of purposes. Despite this plethora of available surveys, few have been developed for use at the primary school level, and even fewer have been comprehensively validated. This article describes the development of a long-overdue learning environment survey that is suited to primary school students. The collection of evidence to support the validity of the survey, in terms of translation and criterion validity, was guided by Trochim and Donnelly’s (2006) construct validity framework. A pilot test involving one class of 30 students and interviews with six students was used to examine the face validity of individual items. Analyses of data collected from 609 students in 31 classes supported the survey's convergent, concurrent, discriminant and predictive validity, with satisfactory results in each case. This article is significant in that it provides educators and researchers with a valid tool to assess the learning environment. The instrument, named the Classroom Climate Questionnaire—Primary (CCQ-P), is described and its practical advantages and limitations are discussed.

7.
8.
The Chinese Early Childhood Environment Rating Scale (trial) (CECERS) is a new instrument for measuring early childhood program quality in Chinese socio-cultural contexts, based on substantial adaptation of the Early Childhood Environment Rating Scale-Revised Edition (ECERS-R). This paper describes the development and validation process of CECERS. Empirical data were collected from a stratified random sample of 178 classrooms, from which a random sample of 1012 children was measured for child development outcomes. Guided by the broad conceptualization of validity and validation advocated by Messick (1989), evidence in a variety of forms is presented and discussed, including content validity considerations (e.g., measuring socially and culturally relevant domains), measurement reliability considerations (e.g., internal consistency reliability, inter-rater reliability), and measurement validity considerations (concurrent validity, criterion-related validity, internal structure based on exploratory factor analysis). The empirical findings for CECERS compare very favorably with the validation outcomes of ECERS-R. The body of evidence accumulated in the validation process supports the use and interpretation of CECERS scores as quality indicators of early childhood education programs in Chinese social and cultural contexts. Limitations and future directions are also discussed.

9.
In this paper, we describe the Scientific Habits of Mind Survey (SHOMS) developed to explore public, science teachers’, and scientists’ understanding of habits of mind (HoM). The instrument contains 59 items and captures the seven SHOM identified by Gauld. The SHOMS was validated by administration to two cohorts of pre-service science teachers: primary science teachers with little science background or interest (n = 145), and secondary school science teachers (who also were science graduates) with stronger science knowledge (n = 145). Face validity was confirmed by the use of a panel of experts and a pilot study employing participants similar in demographics to the intended sample. To confirm convergent and discriminant validity, confirmatory factor analysis was conducted and reliability was evaluated. Statistical data and other data gathered from interviews suggest that the SHOMS will prove to be a useful tool for educators and researchers who wish to investigate HoM for a variety of participants.

10.
The richness and complexity of video portfolios endanger both the reliability and validity of the assessment of teacher competencies. In a post-graduate teacher education program, the assessment of video portfolios was evaluated for its reliability, construct validity, and consequential validity. Although video portfolios facilitated a reliable and valid assessment of teacher competencies, procedures to improve assessment quality were also revealed and are therefore discussed: more explicit grounding of assessment results in the data, peer debriefing, prolonged engagement with the assessment data, and cross-checking to find confirmatory or counter examples.

11.
The use of student achievement data to evaluate an individual teacher's effectiveness has become a new focus in educational policy. This article focuses on the underresearched teacher perception of this new policy measure. Drawing on ethnographic research procedures, this article explores how first-grade teachers in one state navigated a new high-stakes teacher evaluation system. Although the results indicate that teachers have a desire for accountability, findings also show a variety of beliefs on the validity of teacher evaluation, as well as differing applications of scoring measures across school contexts.

12.
Duff, Mengoni, Bailey and Snowling (Journal of Research in Reading, 38: 109–123; 2015) evaluated the sensitivity and specificity of the phonics screening check against two reference standards. This report aims to correct a minor data error in the original article and to present further analysis of the data. The methods used are calculation of predictive values of the phonics screening check in addition to sensitivity and specificity, and evaluation of agreement between the reference tests. Predictive values are important indicators of screening test quality. The positive predictive value of the phonics check is low (0.31) when compared with a standardised reading test but high (0.84) when compared with teachers' phonic phases judgements, reflecting poor agreement (kappa = 0.27) between reference tests. Results have implications for practice in terms of choice of reference standard and choice of threshold criterion for children to pass the screening check. Longitudinal data are needed to assess the predictive validity and utility of the check.
What is already known about this topic:
  • The importance of phonics in learning to read is widely acknowledged.
  • The phonics screening check was introduced into U.K. schools in 2012 to ensure that all children develop phonic decoding skills.
  • Estimates of the sensitivity and specificity of the phonics screening check, compared with two established ‘reference’ measures, were reported by Duff et al. (2015).
What this paper adds:
  • We correct a minor error in the report of the original data by Duff et al. (2015).
  • We draw attention to the importance of including predictive values, alongside sensitivity and specificity, in the evaluation of screening test validity. We also propose an alternative statistic for comparing the two reference measures.
  • We show that applying this further analysis to the data in Duff et al. (2015) reveals the following: (i) the numbers of incorrect (false positive and false negative) outcomes in the phonics check and (ii) the marked difference in these numbers depending on the choice of reference measure.
Implications for theory, policy or practice:
  • Reports of screening test validity should include positive and negative predictive values.
  • A fundamental consideration for evaluating the validity of the phonics screening check is the choice of reference measure.
  • Longitudinal data are needed to assess the predictive validity and utility of the phonics check.
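As an illustrative sketch of the quantities discussed in this abstract (sensitivity, specificity, positive and negative predictive values, and Cohen's kappa for agreement between two reference tests), the following Python computes them from 2×2 counts. The function names and the counts in the usage note are hypothetical; they are not taken from Duff et al. (2015) or from the report.

```python
def screening_metrics(tp, fp, fn, tn):
    """Summarise a screening test against one reference standard.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives (hypothetical inputs).
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of reference-positive cases flagged
        "specificity": tn / (tn + fp),  # share of reference-negative cases cleared
        "ppv": tp / (tp + fp),          # share of flagged cases that are truly positive
        "npv": tn / (tn + fn),          # share of cleared cases that are truly negative
    }


def cohens_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for agreement between two binary reference tests.

    both_pos / both_neg: cases where the two tests agree;
    first_only / second_only: cases positive on only one test.
    """
    n = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / n
    # Chance agreement from the marginal totals of each test.
    expected = (
        (both_pos + first_only) * (both_pos + second_only)
        + (second_only + both_neg) * (first_only + both_neg)
    ) / n**2
    return (observed - expected) / (1 - expected)
```

With made-up counts, `screening_metrics(tp=20, fp=30, fn=5, tn=45)` gives a sensitivity of 0.8 but a positive predictive value of only 0.4, which illustrates how a check can detect most true cases and still flag many children incorrectly.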

13.
In writing assessment, the inconsistency of teachers’ scorings is among the frequently reported concerns regarding the validity and the reliability of assessment. The study aimed to find out to what extent participating in a community of assessment practice (CAP) can affect the discrepancies among raters’ scorings. Adopting a one-group pretest-posttest design, patterns in the teachers’ scoring judgments were explored based on both quantitative and qualitative data. The results indicate a significant increase in the degree of agreement in the teachers’ differential scorings, with changes in their severity tendencies for the structural variety, lexical accuracy, organization and mechanics criteria, while their scoring judgments on the structural accuracy, task achievement, and lexical variety criteria had low levels of agreement.

14.
The purpose of this study was to examine the contribution of leisure activities to optimism and personal growth among older adults. We used data from the Alameda County Health and Ways of Living Study. The sample consisted of 1600 individuals who were 60 years of age and older. While the literature shows that participating in leisure activities is relevant to improving the well-being of older adults, the impact of such participation across various age groups is yet to be determined. We employed a one-way multivariate analysis of variance to determine the age group differences with regard to optimism and personal growth. We also used a series of hierarchical regression models to examine the contribution of the types of leisure activities on optimism and personal growth across various age groups. The ability of leisure activity variables to predict optimism was the highest for the old-old group. The old-old group demonstrated the highest level of predictability from the leisure activity variables regarding personal growth. We suggest that professionals need to provide carefully selected leisure activities to enhance optimism and personal growth for clients within different age groups. Professionals may include a variety of physical, social, and volunteering activities for the young-old and old-old groups while more casual leisure activities such as community activities and entertainment can be offered to the adults of 80 years and older.

15.
The current study explored preservice and inservice teachers’ perspectives on data literacy for teaching. Semi-structured interviews were employed with 12 teacher candidates in elementary and special education. The findings revealed participants’ misconceptions regarding formative and summative data; their understanding of the value of formative data; perceptions of challenges related to data literacy for teaching including time, making sense of data, and reliability and validity; and candidates’ preferences for authentic data literacy instruction.

16.
Indirect tests of writing competency are often used at the college level for a variety of educational, programmatic, and research purposes. Although such tests may have been validated on hearing populations, it cannot be assumed that they validly assess the writing competency of deaf and hard-of-hearing students. This study used a direct criterion measure of writing competency to determine the criterion validity of two indirect measures of writing competency. Results suggest that the validity of indirect writing tests for deaf and hard-of-hearing baccalaureate-level students is weak. We recommend that direct writing tests be used with this population to ensure fair and accurate assessment of writing competency.

17.
The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in the case of moderate direct effects of the instrument on the dependent variable, the results do not deviate much from the benchmark case of no such effect (perfect validity of the instrument's exclusion restriction). In many cases, the size of the bias is smaller than the width of the 95% posterior interval for the effect of education on income. Thus, a violation of the strict validity assumption does not necessarily lead to results that differ strongly from those of the strict validity case. This finding provides confidence in the use of family background variables as instruments in income regressions.

18.
The task of validating a teacher assessment and improvement system is similar whether the system operates in the United States or in another country. Chile has a national teacher evaluation system (NTES) that is standards based, uses multiple instruments, and is intended to serve both formative and summative purposes. For the past 6 years the authors have performed validation research on NTES using a variety of methods and data sources. This article describes our validation research agenda, the results of major validation studies, and an integration of the existing evidence, and it offers the authors' preliminary judgment about NTES's validity. The article also offers a critical reflection regarding the decisions taken while driving the long and winding validation road, and the lessons we learned during this politically and methodologically complex journey.

19.
The growing trend among universities to promote systems of programme and course evaluation entails more responsibility for faculties and departments. These systems require resources to ensure that they are not only valid and reliable but also effective and sustainable. The design of rubric-based assessment systems may provide a solution, but there is a gap in the research on curriculum evaluation concerning their use and validation. We examine the content aspect of validity in a rubric-based assessment system for course syllabuses using a mixed method that combines an analysis of the agreement among 23 experts with a phenomenographic study. With data gathered through a questionnaire linked to the Delphi technique, content validity indexes were calculated and the experts' different perspectives were identified. The content validity indexes (greater than .80) met the standards set out in the literature, and the qualitative study of the experts' feedback showed three different perspectives on the system's use. Beyond providing evidence of the system's content validity, the study highlights the extent to which it is important to give appropriate consideration to experts' – and by extension final users' – experience in order to ensure the successful implementation of rubric-based assessment systems.
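The abstract does not say how its content validity indexes were computed. As a minimal sketch under one common convention (an assumption, not the authors' stated method), an item-level index can be taken as the proportion of experts who rate an item as relevant, and a scale-level index as the average of the item-level values. The 4-point scale, the cut-off of 3, and the function names below are all hypothetical.

```python
def item_cvi(ratings, relevant_min=3):
    """Item-level content validity index.

    ratings: one relevance rating per expert, assumed to be on a
    4-point scale; ratings >= relevant_min count as "relevant"
    (both the scale and the cut-off are assumptions).
    """
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_average(ratings_by_item):
    """Scale-level index: the mean of the item-level indexes."""
    cvis = [item_cvi(ratings) for ratings in ratings_by_item]
    return sum(cvis) / len(cvis)
```

For instance, five hypothetical expert ratings of `[4, 4, 3, 2, 4]` give an item-level index of 0.8, since four of the five experts rate the item as relevant.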

20.
This paper presents a systematic review of published data on the performance of sub-Saharan Africans on Raven's Progressive Matrices. The specific goals were to estimate the average level of performance, to study the Flynn Effect in African samples, and to examine the psychometric meaning of Raven's test scores as measures of general intelligence. Convergent validity of the Raven's tests is found to be relatively poor, although reliability and predictive validity are comparable to western samples. Factor analyses indicate that the Raven's tests are relatively weak indicators of general intelligence among Africans, and often measure additional factors, besides general intelligence. The degree to which Raven's scores of Africans reflect levels of general intelligence is unknown. Average IQ of Africans is approximately 80 when compared to US norms. Raven's scores among African adults have shown secular increases over the years. It is concluded that the Flynn Effect has yet to take hold in sub-Saharan Africa.
