Similar literature
1.
This study investigated the construction and evaluation of an instrument called the Perceptions of Success Inventory for Beginning Teachers (PSI-BT), intended to measure factors documented in research that contribute to beginning teachers’ perceptions of success. Exploratory factor analysis showed that the PSI-BT assesses the following factors: (1) Administrative Support, (2) Classroom Climate, (3) Mentor Support, (4) Colleague and Instructional Resource Support, (5) Commitment, and (6) Assignment and Workload. Internal reliability, content validity, and concurrent validity were also measured in the validation process. Our findings suggest that the PSI-BT is a reliable and valid instrument that can give schools valuable feedback to ensure the success of their beginning teachers.

2.
Most discipline-based education researchers (DBERs) were formally trained in the methods of scientific disciplines such as biology, chemistry, and physics, rather than social science disciplines such as psychology and education. As a result, DBERs may never have taken specific courses in the social science research methodology (either quantitative or qualitative) on which their scholarship often relies so heavily. One particular aspect of (quantitative) social science research that differs markedly from disciplines such as biology and chemistry is the instrumentation used to quantify phenomena. In response, this Research Methods essay offers a contemporary social science perspective on test validity and the validation process. The instructional piece explores the concepts of test validity, the validation process, validity evidence, and key threats to validity. The essay also includes an in-depth example of a validity argument and validation approach for a test of student argument analysis. In addition to DBERs, this essay should benefit practitioners (e.g., lab directors, faculty members) in the development, evaluation, and/or selection of instruments for their work assessing students or evaluating pedagogical innovations.

3.
Using data from 179 preterm infants, a neurobehavioral maturity assessment was developed by using a process in which clusters characterized by conceptual coherence and face validity were systematically subjected to statistical analyses designed to test whether they also had high test-retest reliability, statistical cohesion, and developmental validity. The psychometric soundness of the test items was made a precondition for their inclusion in the assessment procedure. Also tested were cluster redundancy, as well as the impact of gestational and conceptional age, and of postbirth influences on the functions tested. Eight dimensions of neurobehavioral functioning were found to be stable, with a test-retest reliability of at least .6 on 2 consecutive days, nonredundant, and developmentally valid. They were: Active Tone/Motor Vigor, Alertness and Orientation, Excitation Proneness, Inhibition Proneness, Scarf Sign, Popliteal Angle, Maturity of Vestibular Response, and Vigor of Crying.
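
The stability criterion in entry 3 (a test-retest reliability of at least .6 across two consecutive days) can be illustrated with a short sketch. It assumes that test-retest reliability is computed as a Pearson correlation between day-1 and day-2 scores, which the abstract does not state; the function names and the sample scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def is_stable(day1, day2, threshold=0.6):
    """True when the test-retest correlation meets the threshold (.6 assumed here)."""
    return pearson_r(day1, day2) >= threshold

# Hypothetical scores for one cluster, measured on two consecutive days.
day1 = [3, 5, 4, 6, 2, 5]
day2 = [3, 4, 4, 6, 3, 5]
print(is_stable(day1, day2))
```

A cluster that fails this check would, under the procedure the abstract describes, not be retained in the assessment.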

4.
Applied Measurement in Education, 2013, 26(3): 285-296
Approaches to validation of writing assessments have included comparisons of different direct scoring methods, comparisons of direct and indirect measures, applications of exploratory and confirmatory factor analysis, and experimental studies. This review of validation methods was conceptualized within a framework suggested by Messick (1989), which included five operational components of construct validation: content representativeness, structural fidelity, nomological validity, criterion-related validity, and nomothetic span.

5.
The Student–Teacher Relationship Scale (STRS) is widely used for research in kindergarten and school. The increasing number of applications inside and outside of the U.S. underscores the need to investigate the properties of the STRS. The present study used the STRS in German-speaking countries, examining whether (a) the original factor structure is appropriate for a German version, (b) applications of a German STRS are invariant across contexts (kindergarten, first and second grade) as well as gender, and (c) construct and criterion validity are met. The original STRS was translated into German and filled out by 368 kindergarten and 503 elementary school teachers in Germany and Austria. Observations in kindergartens, student reports in schools, and teacher reports of students’ characteristics served as validity criteria. Results of confirmatory factor analyses (CFAs) did not confirm the original STRS factor structure. Subsequent exploratory factor analyses on training samples resulted in significant item reductions, followed by further CFAs on validation samples. The bootstrapped results yielded an adjusted three-factor model with subscales showing satisfactory alphas and invariance across context and gender. Construct and criterion validity were met for all subscales of the German STRS based on various criteria from both observations and reports.

6.
Two studies focusing on the development and validation of the Online Self‐Regulated Learning Inventory (OSRLI) were conducted. The OSRLI is a self‐report instrument assessing the human interaction dimension of online self‐regulated learning. It consists of an affect/motivation scale and an interaction strategies scale. In Study 1, exploratory factor analysis of an initial affect/motivation item pool yielded four factors: enjoyment of human interaction, self‐efficacy for interaction with instructors, concern for interaction with students, and self‐efficacy for contributing to the online community. Exploratory factor analysis of an initial learning strategies item pool revealed three factors: writing strategies, responding strategies, and reflection strategies. In Study 2, confirmatory factor analysis was conducted in order to evaluate the stability of multidimensional factor structures. These exploratory and confirmatory factor analyses showed the OSRLI to be statistically moderate in terms of reliability and validity.

7.
If the same constructs embedded in different tests result in parallel or identical score patterns and high intercorrelations, this can be taken as evidence of construct validity. If results do not converge across instruments and/or response formats, this can be taken as evidence of a lack of construct validity and/or impurity of the test as an indicator of the constructs. In this study, two response formats as well as a request for reasons-for-choices of the traditional Cognitive Preference Test (CPT), and an association (open-ended) CPT, were used in order to test for consistency across methods of observation on both the individual and the population levels. Convergence of results was found to be minimal. None of the hypotheses was confirmed. It was concluded that construct validation of CPT constructs had not yet reached the state of unequivocality necessary for their application in curriculum research.

8.
This article describes the development and validation of an instrument that can be used for content analysis of inquiry-based tasks. According to the theories of educational evaluation and qualities of inquiry, four essential functions that inquiry-based tasks should serve are defined: (1) assisting in the construction of understandings about scientific concepts, (2) providing students opportunities to use inquiry process skills, (3) being conducive to establishing understandings about scientific inquiry, and (4) giving students opportunities to develop higher order thinking skills. An instrument – the Inquiry-Based Tasks Analysis Inventory (ITAI) – was developed to judge whether inquiry-based tasks perform these functions well. To test the reliability and validity of the ITAI, 4 faculty members were invited to use the ITAI to collect data from 53 inquiry-based tasks in the 3 most widely adopted senior secondary biology textbooks in Mainland China. The results indicate that (1) the inter-rater reliability reached 87.7%, (2) the grading criteria have high discriminant validity, (3) the items possess high convergent validity, and (4) the Cronbach’s alpha reliability coefficient reached 0.792. The study concludes that the ITAI is valid and reliable. Because of its solid foundations in theoretical and empirical argumentation, the ITAI is trustworthy.
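
Entry 8 reports two reliability figures: an inter-rater reliability given as a percentage and a Cronbach's alpha coefficient. The sketch below shows how such figures are commonly computed. It assumes that the percentage is simple agreement between two raters, which the abstract does not specify. The alpha function uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). All names and data are hypothetical and are not taken from the ITAI study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores holds one row per rated task, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def percent_agreement(rater_a, rater_b):
    """Share of tasks on which two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical ratings: 5 tasks scored on 3 items.
scores = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 3))
print(percent_agreement([1, 2, 3, 4], [1, 2, 0, 4]))
```

Simple agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa is a common alternative when that matters.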

9.
Test‐taking strategies are important cognitive skills that strongly affect students’ performance in tests. Using appropriate test‐taking strategies improves students’ achievement and grades, improves students’ attitudes toward tests, and reduces test anxiety. This in turn improves test accuracy and validity. This study aimed to develop a scale to assess students’ test‐taking strategies at university level. The scale underwent several validation procedures covering content, construct, and criterion‐related validity. Similarly, scale reliability (internal reliability and stability over time) was assessed through several procedures. Four samples of students (50, 828, 553 and 235) participated by responding to different versions of the scale. The final scale consists of 31 items distributed across four sub‐scales: Before‐test, Time management, During‐test and After‐test. To the researcher’s knowledge, this is the first comprehensive scale developed to assess test‐taking strategies used by university students.

10.
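To explore how response format affects test takers' answering processes and to examine the validity of the dictation task type, this study used immediate retrospection and interviews to analyze the answering processes of 36 EFL test takers during passage dictation. The results show that although overall dictation scores did not differ significantly among the four groups, the third group, who responded in their native language, used more strategies, understood the input and the gist of the passage better, and was less affected by construct-irrelevant factors. The fourth group, working without context, showed only a shallow understanding of meaning and used the fewest strategies, and the test material did not show good validity or discrimination. In addition, the actual performance of the four groups was inconsistent with the dictation instructions, which indicates that the instructions themselves were problematic.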

11.
The Chinese Early Childhood Environment Rating Scale (trial) (CECERS) is a new instrument for measuring early childhood program quality in Chinese socio-cultural contexts, based on substantial adaptation of the Early Childhood Environment Rating Scale-Revised Edition (ECERS-R). This paper describes the development and validation process of CECERS. Empirical data were collected from a stratified random sample of 178 classrooms, from which a random sample of 1012 children was measured for child development outcomes. Guided by the broad conceptualization of validity and validation advocated by Messick (1989), evidence in a variety of forms is presented and discussed, including content validity considerations (e.g., measuring socially and culturally relevant domains), measurement reliability considerations (e.g., internal consistency reliability, inter-rater reliability), and measurement validity considerations (concurrent validity, criterion-related validity, internal structure based on exploratory factor analysis). The empirical findings for CECERS compare very favorably with the validation outcomes of ECERS-R. The body of evidence accumulated in the validation process supports the use and interpretation of CECERS scores as quality indicators of early childhood education programs in Chinese social and cultural contexts. Limitations and future directions are also discussed.

12.
This paper presents the development and initial validation of a feedback scale that measures the thoughts and affective reactions of prospective teachers concerning feedback on their teaching experiences. To this end, data from 512 prospective teachers were used to test the scale's internal consistency and its exploratory and confirmatory factor structure. Exploratory factor analysis was conducted on a random split-half sample of the data to examine the factor structure of the feedback scale items, and confirmatory factor analysis was conducted on the holdout sample. These analyses showed that the scale has good validity and a two-factor structure: professional development and anxiety. Scores on both sub-factors were also found to be highly reliable. Overall, the results suggest that this scale is a valid measure that should reveal the viewpoints of prospective teachers regarding feedback in the form of observable behaviours for future research.
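
The split-sample design in entry 12 (exploratory factor analysis on a random half of the data, confirmatory factor analysis on the holdout half) can be sketched as follows. This is a minimal illustration of the random split only, not of the factor analyses themselves. The function name, the seed, and the use of integer participant IDs are assumptions made for the example.

```python
import random

def split_half(participant_ids, seed=0):
    """Randomly split participants into an exploratory half and a holdout half."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# 512 participants, as in the abstract; the IDs themselves are hypothetical.
efa_half, cfa_half = split_half(range(512))
print(len(efa_half), len(cfa_half))
```

Keeping the two halves disjoint is the point of the design: the structure found in the exploratory half is then tested on data that played no part in deriving it.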

13.
The scoring process is critical in the validation of tests that rely on constructed responses. Documenting that readers carry out the scoring in ways consistent with the construct and measurement goals is an important aspect of score validity. In this article, rater cognition is approached as a source of support for a validity argument for scores based on constructed responses, whether such scores are to be used on their own or as the basis for other scoring processes, for example, automated scoring.

14.
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative approaches to test fairness, counterfactual reasoning is useful to clarify a potential charge of unfairness: Is it plausible to believe that with an alternative assessment (test or item) or under different test conditions an individual or groups of individuals may have fared better? Beyond comparative questions, fairness can also be framed by moral and ethical choices. A number of ongoing issues are evaluated with respect to these topics including accommodations, differential item functioning (DIF), differential prediction and selection, employment testing, test validation, and classroom assessment.

15.
This study examines the patterns of use and potential impact of individualized, reflective guidance in an educational Multi-User Virtual Environment (MUVE). A guidance system embedded within a MUVE-based scientific inquiry curriculum was implemented with a sample of middle school students in an exploratory study investigating (a) whether access to the guidance system was associated with improved learning, (b) whether students viewing more guidance messages saw greater improvement on content tests than those viewing less, and (c) whether there were any differences in guidance use among boys and girls. Initial experimental findings showed that basic access to individualized guidance used with a MUVE had no measurable impact on learning. However, post-hoc exploratory analyses indicated that increased use of the system among those with access to it was positively associated with content test score gains. In addition, differences were found in overall learning outcomes by gender and in patterns of guidance use by boys and girls, with girls outperforming boys across a spectrum of guidance system use. Based on these exploratory findings, the paper suggests design guidelines for the development of guidance systems embedded in MUVEs and outlines directions for further research.

16.
The objective of this study was to develop a reliable and valid test to ascertain ICT literacy (Information and Communication Technologies literacy). It was designed as a paper-and-pencil test to ensure that it can be used flexibly. The results of two studies (pilot study: N = 308; validation sample of the final form of the test: N = 263) showed satisfactory item and scale values and indicated the one-dimensional nature of the construct. Tests of the instrument's convergent and discriminant validity showed the expected links with computer-related pupil characteristics. To supplement this, expert reviews indicated satisfactory content validity. Analyses of the construct validity showed that ICT literacy was clearly distinguishable from general cognitive abilities and, at the same time, possessed incremental validity in addition to other measures of computer literacy. The findings are discussed in relation to open questions about the construct validity of the test instrument.

17.
Evaluation is an inherent part of education for an increasingly diverse student population. Confidence in one’s test‐taking skills, and the associated testing environment, needs to be examined from a perspective that combines the concept of Bandurian self‐efficacy with the concept of stereotype threat reactions in a diverse student sample. Factors underlying testing reactions and performance on a cognitive ability test in four different testing conditions (high or low stereotype threat and high or low test face validity) were examined in this exploratory study. The stereotype threat manipulation seemed to lower African‐American and Hispanic participants’ test scores. However, the hypothesis that there would be an interaction with face validity was only partially supported. Participants’ highest scores resulted from low stereotype threat and high face validity, as predicted. However, the lowest scores were not in the high stereotype threat/low face validity condition as expected. Instead, most groups tended to score lower when the test was perceived to be more face valid. Stereotype threat manipulation affected Whites as well as non‐Whites, although differently. Specifically, high stereotype threat increased Whites’ cognitive ability test scores in the low face validity condition, but decreased them in the high face validity condition. Implications for testing and classroom environment design are discussed.

18.
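Before the publication of the Standards for Educational and Psychological Testing (5th edition) in 1985, the core concept of validity research was the "criterion": validity research was regarded as a process of using a criterion to verify a test's validity and to give a valid interpretation of test scores. After 1985, the core concept became "evidence": validity research was regarded as a process of accumulating evidence to support a test's validity and to give a reasonable interpretation of test scores. This understanding of validity is most prominent in the Standards for Educational and Psychological Testing (6th edition), published in 1999. Educational Measurement, compiled jointly under the American Council on Education and the National Council on Measurement in Education, is known in the field as "the Bible of educational measurement." Since the publication of Educational Measurement (4th edition) in 2006, the core concept of validity research has evolved into the "warrant": validity research is regarded as a process of constructing "systems of warrants" and "networks of warrants" to build an argument for validity and to give a plausible interpretation of test scores. Drawing on the author's own testing practice, this article introduces these new developments in the concept of validity.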

19.
Humanoid robots equipped with social skills are increasingly used in education, across subfields such as science education, special education, and foreign language education. To enhance the use of humanoid robots in educational settings, and to comprehensively evaluate their impact on the transformation of the class, it is critical to understand students’ attitudes towards the use of robots for educational purposes. This paper outlines the implementation and validation procedures of an educational robot attitude scale (ERAS) developed to measure the attitudes of secondary school students towards the use of humanoid robots in educational settings. The sample of the study consisted of 232 secondary school students. The development and validation process consisted of exploratory factor analysis and convergent validity testing. The developed scale consists of 17 items and represents four factors of students’ attitude: engagement, enjoyment, anxiety and intention. These four factors accounted for 66% of the total variance of the scale. The internal consistency coefficient for the whole scale was found to be .90 in the reliability analysis. The results of the study suggest that the scale is a valid, reliable, and efficient tool for measuring the dimensions of students’ attitudes towards humanoid robots in educational settings.

20.
Academic competence beliefs have been widely studied. However, conceptual and measurement efforts have not yet been directed toward understanding perceived underachievement (feeling that one's accomplishments fall below perceived capability). We conducted two studies in order to develop and examine validity evidence for the Perceived Academic Underachievement Scale (PAUS). Participants were individuals enrolled for credit in at least one post-secondary course. In Study 1, we evaluated content validity and conducted an exploratory factor analysis. In Study 2, we conducted a confirmatory factor analysis and investigated external validity. For both samples, PAUS demonstrated good internal consistency reliability, and items loaded strongly onto a single factor. PAUS was empirically distinct from a range of related constructs. Findings represent preliminary validation evidence.
