Similar documents
20 similar documents found (search time: 31 ms)
1.
This article uses definitions provided by Cronbach in his seminal paper on coefficient α to show that the concepts of reliability, dimensionality, and internal consistency are distinct but interrelated. The article begins with a critique of the definition of reliability and then explores mathematical properties of Cronbach's α. Internal consistency and dimensionality are then discussed as defined by Cronbach. Next, functional relationships are given that relate reliability, internal consistency, and dimensionality. The article ends with a demonstration of the utility of these concepts as defined. It is recommended that reliability, internal consistency, and dimensionality each be quantified with separate indices, but that their interrelatedness be recognized. High levels of unidimensionality and internal consistency are not necessary for reliability as measured by α, nor, more importantly, for the interpretability of test scores.
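For reference, Cronbach's coefficient α for a test of $k$ items, with item-score variances $\sigma^2_i$ and total-score variance $\sigma^2_X$, is

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_i}{\sigma^2_X}\right).$$

Under classical test theory, α is a lower bound to the reliability of the total score and equals it only when the items are (essentially) tau-equivalent with uncorrelated errors, which is why a high α neither requires nor establishes unidimensionality.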

2.
I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and that concepts of internal consistency and unidimensionality, however defined, belong to the realm of validity, viz. the issue of what the test measures. Internal consistency and unidimensionality may play a role in the construction of tests when the theory of the attribute for which the test is constructed implies that the items be internally consistent or unidimensional. I also offer examples of attributes that do not imply internal consistency or unidimensionality, thus limiting these concepts' usefulness in practical applications.

3.
The aim of this study was to provide an initial validation of the REDFLAGS model, eight cautionary warning signs of mental distress in college students. Tests of internal consistency reliability and a factor analysis supported the model's reliability and construct validity. Hierarchical logistic regression models supported the model's predictive validity; students' recognition of the REDFLAGS model was significantly associated with increases in the odds of a peer‐to‐peer referral to the counseling center. Implications for college counselors are discussed.

4.
《教育实用测度》2013,26(3):249-253
A test segment that lacks content validity with respect to a criterion may be deleted for that reason. At issue is the effect on reliability and validity as measured by the coefficients arising from classical test theory. Assuming that the predictor test has some reasonable degree of internal consistency, deleting a segment of meaningful size is certain to reduce reliability. However, Feldt (1997) showed that a concomitant rise in the validity coefficient may occur under certain limited conditions. The present research further characterizes the circumstances under which validity changes may occur as a result of deletion of a predictor test segment. Specifically, for a positive outcome, one seeks a relatively large correlation between the scores from the deleted segment and the remaining items coupled with a relatively low correlation between scores from the deleted segment and the criterion.
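A small numeric sketch of that condition, using hypothetical standard deviations and correlations rather than any values from Feldt's article: write the full predictor as X = A + B, where A is the retained part and B the deleted segment, and compare the validity of X with the validity of A alone via the correlation-of-sums formula.

```python
import math

# Hypothetical values (illustrative only, not from the article).
sd_a, sd_b = 1.0, 1.0   # SDs of retained part A and deleted segment B
r_ab = 0.80             # deleted segment correlates highly with the rest
r_ay = 0.50             # validity of the retained part against criterion Y
r_by = 0.10             # deleted segment correlates weakly with the criterion

# Validity of the full test X = A + B.
sd_x = math.sqrt(sd_a**2 + sd_b**2 + 2 * r_ab * sd_a * sd_b)
r_xy = (sd_a * r_ay + sd_b * r_by) / sd_x

print(f"validity of full test    r_XY = {r_xy:.3f}")  # about 0.316
print(f"validity after deletion  r_AY = {r_ay:.3f}")  # 0.500, higher than r_XY
```

With r_AB large and r_BY small, deleting B raises the validity coefficient; raising r_BY to about 0.5 in this sketch makes the full test the better predictor again, mirroring the conditions stated above.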

5.
Reliability and validity are key indicators of the quality of a measurement instrument, and research on reliability and validity in educational cognitive diagnostic assessment has attracted increasing attention from researchers in recent years. Reliability coefficients for diagnostic tests are drawn mainly from attribute reliability coefficients based on coefficient α, empirical attribute reliability coefficients, tetrachoric correlation coefficients, simulated test-retest consistency, and classification consistency indices; validity coefficients mainly include simulated correct classification rates, classification accuracy, and theoretical construct validity. Research on the reliability and validity of educational cognitive diagnostic assessment is still relatively new: it has notable shortcomings, lacks comprehensive comparative studies, and lacks a systematic evaluation framework.

6.
7.
Many personnel committees at colleges and universities in the USA use student evaluations of faculty instruction to make decisions regarding tenure, promotion, merit pay, or faculty professional development. This study examines the construct validity and internal consistency reliability of the student evaluation of instruction (SEI) used at a large mid‐western university in the USA for both administrative and instructional purposes. The sample consisted of 73,500 completed SEIs from undergraduate students who self‐reported as freshmen, sophomores, juniors, or seniors. Confirmatory factor analysis via structural equation modelling was used to examine the construct validity of the SEI instrument. The internal consistency of students' ratings was reported as reliability evidence. The results showed that the model fit the data for this sample. The significance of the study as well as areas for further research are discussed.

8.
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern‐level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, corresponding indices at the attribute level have not yet been constructed. This study puts forward a simple approach to estimating the indices at both the attribute and the pattern level from a single test administration. Detailed elaboration is given on how upper and lower bounds for attribute‐level accuracy can be derived from the error variance of the attribute mastery probability estimate. In addition, based on Cui's pattern‐level indices, an alternative approach to estimating the attribute‐level indices is also proposed. Comparative analysis of simulation results indicates that the new indices are well suited for evaluating test‐retest consistency and the correct classification rate.
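As a rough illustration of the quantities involved, one common single-administration approximation in the cognitive diagnosis literature estimates attribute-level accuracy and consistency from each examinee's posterior attribute mastery probability. The sketch below uses simulated posteriors and this generic approximation; it is not a reproduction of the specific indices or bounds proposed in the article.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated posterior mastery probabilities for one attribute (N = 1000 examinees).
p = rng.beta(2, 2, size=1000)

# Classify an examinee as a master when the posterior exceeds .5.
# Accuracy: expected agreement between the classification and the true status.
accuracy = np.mean(np.maximum(p, 1 - p))

# Consistency: expected agreement between classifications from two
# independent administrations, given the posterior probabilities.
consistency = np.mean(p**2 + (1 - p)**2)

print(f"attribute-level accuracy    ~ {accuracy:.3f}")
print(f"attribute-level consistency ~ {consistency:.3f}")
```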

9.
This study presents evidence regarding the construct validity and internal consistency of the IFSP Rating Scale (McWilliam & Jung, 2001), which was designed to rate individualized family service plans (IFSPs) on 12 indicators of family-centered practice. Here, the Rasch measurement model is employed to investigate the scale's functioning and fit, using both person and item diagnostics, for 120 IFSPs that had previously been analyzed with a classical test theory approach. Analyses demonstrated that scores on the IFSP Rating Scale fit the model well, though additional items could improve the scale's reliability. Implications for applying the Rasch model to improve special education research and practice are discussed.

10.
In the lead article, Davenport, Davison, Liou, & Love demonstrate the relationship among homogeneity, internal consistency, and coefficient alpha, and also distinguish among them. These distinctions are important because too often coefficient alpha—a reliability coefficient—is interpreted as an index of homogeneity or internal consistency. We argue that factor analysis should be conducted before calculating internal consistency estimates of reliability. If factor analysis indicates the assumptions underlying coefficient alpha are met, then it can be reported as a reliability coefficient. However, to the extent that items are multidimensional, alternative internal consistency reliability coefficients should be computed based on the parameter estimates of the factor model. Assuming a bifactor model evidenced good fit, and the measure was designed to assess a single construct, omega hierarchical—the proportion of variance of the total scores due to the general factor—should be presented. Omega—the proportion of variance of the total scores due to all factors—also should be reported in that it represents a more traditional view of reliability, although it is computed within a factor analytic framework. By presenting both these coefficients and potentially other omega coefficients, the reliability results are less likely to be misinterpreted.
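A minimal sketch of how omega hierarchical and omega (total) are computed from a fitted bifactor solution. The standardized loadings below are hypothetical placeholders; in practice they would come from a bifactor factor analysis of the actual items.

```python
import numpy as np

# Hypothetical standardized bifactor loadings for 9 items:
# one general factor plus three group factors (items 1-3, 4-6, 7-9).
general = np.array([.60, .55, .50, .65, .60, .55, .50, .45, .60])
group = [np.array([.40, .35, .30]),
         np.array([.45, .40, .35]),
         np.array([.30, .35, .40])]

# Unique variances implied by the standardized loadings.
uniq = 1 - general**2 - np.concatenate(group)**2

# Model-implied variance of the total score.
total_var = general.sum()**2 + sum(g.sum()**2 for g in group) + uniq.sum()

omega_h = general.sum()**2 / total_var                                       # general factor only
omega_total = (general.sum()**2 + sum(g.sum()**2 for g in group)) / total_var  # all factors

print(f"omega hierarchical = {omega_h:.3f}")
print(f"omega total        = {omega_total:.3f}")
```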

11.
This article describes the development and preliminary validation of the Bullying, Harassment, and Aggression Receipt Measure (BullyHARM). The development of the BullyHARM involved a number of steps and methods, including a literature review, expert review, cognitive testing, readability testing, data collection from a large sample, reliability testing, and confirmatory factor analysis. A sample of 275 middle school students was used to examine the psychometric properties and factor structure of the BullyHARM, which consists of 22 items and six subscales: physical bullying, verbal bullying, social/relational bullying, cyber‐bullying, property bullying, and sexual bullying. First‐order and second‐order factor models were evaluated. Results demonstrate that the first‐order factor model had superior fit. Results of reliability testing indicate that the BullyHARM scale and subscales have very good internal consistency reliability. Findings indicate that the BullyHARM has good properties regarding content validation and respondent‐related validation, and is a promising instrument for measuring bullying victimization in school.

12.
A short version of the Approaches to Studying Inventory (ASI), commended as a ‘quick and easy’ means of assessing student learning, was administered to two groups of students at the University of the South Pacific. Measures of its internal consistency and test‐retest reliability were comparable with those obtained in European research, but were not wholly satisfactory. Moreover, its factor structure was found to be qualitatively different in this context, being constituted by different forms of motivation for studying in higher education. It is concluded that approaches to studying are culture‐specific and, in particular, that one should be cautious about using this version of the ASI in systems of higher education in non‐Western countries.

13.
Metacognitive skills are widely recognized as an important moderating variable for learning. Many studies have shown that these skills affect students' learning results. Tobias and Everson (2000) argue that metacognitive skills cannot be effectively applied in the absence of accurate knowledge monitoring. Consequently, they constructed a knowledge monitoring assessment (KMA) test, which is claimed to be a valid measure of students' knowledge monitoring capacity. In this contribution, the reliability of a Flemish version of the KMA test is studied. Two studies are reported, one with secondary education students and one with first-year university students. In both studies, the split-half method and Kuder-Richardson 20 (KR-20) were used to calculate internal consistency as a measure of reliability. Because none of the results showed good reliability, it is suggested that additional effort is needed to develop a reliable instrument.
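A minimal Python sketch of the two internal consistency estimates used in these studies, computed on simulated dichotomous responses (the data are random placeholders, not the Flemish KMA responses):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 200 examinees x 20 dichotomous items from a simple Rasch-like model.
theta = rng.normal(size=(200, 1))                 # person abilities
b = rng.normal(scale=0.5, size=20)                # item difficulties
scores = (rng.random((200, 20)) < 1 / (1 + np.exp(-(theta - b)))).astype(int)

def kr20(x):
    """Kuder-Richardson formula 20 for 0/1 item scores."""
    k = x.shape[1]
    p = x.mean(axis=0)
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - (p * (1 - p)).sum() / total_var)

def split_half(x):
    """Odd-even split-half correlation, stepped up with the Spearman-Brown formula."""
    odd, even = x[:, 0::2].sum(axis=1), x[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

print(f"KR-20      = {kr20(scores):.3f}")
print(f"Split-half = {split_half(scores):.3f}")
```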

14.
In this ITEMS module, we provide a two‐part introduction to the topic of reliability from the perspective of classical test theory (CTT). In the first part, which is directed primarily at beginning learners, we review and build on the content presented in the original didactic ITEMS article by Traub and Rowley (1991). Specifically, we discuss the notion of reliability as an intuitive everyday concept to lay the foundation for its formalization as a reliability coefficient via the basic CTT model. We then walk through the step‐by‐step computation of key reliability indices and discuss the data collection conditions under which each is most suitable. In the second part, which is directed primarily at intermediary learners, we present a distribution‐centered perspective on the same content. We discuss the associated assumptions of various CTT models ranging from parallel to congeneric, and review how these affect the choice of reliability statistics. Throughout the module, we use a customized Excel workbook with sample data and basic data manipulation functionalities to illustrate the computation of individual statistics and to allow for structured independent exploration. In addition, we provide quiz questions with diagnostic feedback as well as short videos that walk through sample exercises within the workbook.
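For readers who prefer code to a spreadsheet, here is a Python analogue of the coefficient α computation that the workbook walks through; this sketch is not part of the module itself, and the small data matrix is a placeholder.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha from an examinees x items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Placeholder data: 5 examinees x 4 items on a 1-5 scale.
x = [[3, 4, 3, 4],
     [2, 2, 3, 2],
     [4, 5, 4, 4],
     [1, 2, 2, 1],
     [3, 3, 4, 3]]
print(f"alpha = {cronbach_alpha(x):.3f}")
```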

15.
Improving professional attitudes and behaviors requires critical self-reflection. Research on reflection is necessary to understand professionalism among medical students. The aims of this prospective validation study at the Mayo Medical School and Cleveland Clinic Lerner College of Medicine were: (1) to develop and validate a new instrument for measuring reflection on professionalism, and (2) to determine whether learner variables are associated with reflection on the gross anatomy experience. An instrument for assessing reflections on gross anatomy, comprising 12 items structured on five‐point scales, was developed. Factor analysis revealed a three‐dimensional model including low reflection (four items), moderate reflection (five items), and high reflection (three items). Item mean scores ranged from 3.05 to 4.50. The overall mean for all 12 items was 3.91 (SD = 0.52). Internal consistency reliability (Cronbach's α) was satisfactory for individual factors and overall (Factor 1 α = 0.78; Factor 2 α = 0.69; Factor 3 α = 0.70; Overall α = 0.75). Simple linear regression analysis indicated that reflection scores were negatively associated with teamwork peer scores (P = 0.018). The authors report the first validated measurement of medical student reflection on professionalism in gross anatomy. Critical reflection is a recognized component of professionalism and may be important for behavior change. This instrument may be used in future research on professionalism among medical students. Anat Sci Educ 6: 232–238. © 2012 American Association of Anatomists.

16.
In discussion of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe scores on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5. This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
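The 0.5 threshold can be seen from the classical decomposition $\sigma^2_X = \sigma^2_T + \sigma^2_E$ with reliability $\rho = \sigma^2_T / \sigma^2_X$; the following is a sketch of the argument, not the article's full treatment. Estimating each person's universe (true) score by the group mean gives mean squared error

$$E\big[(T - \mu)^2\big] = \sigma^2_T = \rho\,\sigma^2_X,$$

while estimating it by the observed score gives

$$E\big[(X - T)^2\big] = \sigma^2_E = (1 - \rho)\,\sigma^2_X.$$

The test-based estimate is more accurate exactly when $(1 - \rho)\,\sigma^2_X < \rho\,\sigma^2_X$, that is, when $\rho > 0.5$.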

17.
The reliability and validity of the Study Process Questionnaire (SPQ; Biggs, 1987) are investigated for 352 Nigerian undergraduates. The concepts involved in the SPQ are relevant to Nigerian students, and the SPQ scales and subscales were found to have adequate internal consistency reliability for research purposes. This conclusion was further supported by the meaningful factor structure of responses to the SPQ subscales found for the Nigerian sample. However, doubt is cast on the metric equivalence of SPQ scales across cultures, making it difficult to interpret direct cross‐cultural comparisons of mean scale scores.

18.
This study adapted the Learning Disability Evaluation Scale (School Version), as revised by Stephen B. McCamey in 1996. The Chinese version of the scale contains 85 items across 7 subscales: listening, thinking, speaking, reading, handwriting/written expression, spelling, and mathematical calculation. Administration to 416 primary school students in grades 2 through 5 showed that (1) the items' response patterns were reasonable; (2) the scale has high internal consistency and test-retest reliability coefficients; and (3) the scale has good construct validity, criterion-related validity, and content validity.

19.
The effects of prior practice on the Set Test (ST), a test of verbal fluency, were examined in a group of 30 relatively able patients aged 62‐94 in a continuing care geriatric ward. Subjects were randomly assigned to one of three groups: a specific prior practice group (ST on Day 1 and again 24 hours later), a nonspecific prior practice group (an alternate form of the ST on Day 1, ST 24 hours later), and a control group (conversation on Day 1, ST 24 hours later). Analysis of ST performance showed no significant difference between the two experimental groups and the control group on Day 2, but significant improvements for the two experimental groups from Day 1 to Day 2. The clinical implications of this finding are discussed. Analysis of the psychometric characteristics of the ST and the alternate form showed the tests to have high internal consistency and reliability, although further data are needed before the two forms can be regarded as parallel.

20.
The latent class reliability coefficient (LCRC) is improved by using the divisive latent class model instead of the unrestricted latent class model. This results in the divisive latent class reliability coefficient (DLCRC), which, unlike LCRC, avoids making subjective decisions about the best solution and thus avoids judgment error. A computational study using large numbers of items shows that DLCRC is also faster than LCRC and fast enough for practical purposes. Speed and objectivity render DLCRC superior to LCRC. A decisive feature of DLCRC is that it aims at closely approximating the multivariate distribution of item scores, which might render the method well suited when test data are multidimensional. A simulation study focusing on multidimensionality shows that DLCRC in general has little bias relative to the true reliability and is relatively accurate compared to LCRC and the classical lower bound methods: coefficients α and λ2 and the greatest lower bound.
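For comparison with the classical lower bounds mentioned above, a minimal sketch of Guttman's $\lambda_2$ computed from an item covariance matrix (the small score matrix is a placeholder):

```python
import numpy as np

def guttman_lambda2(scores):
    """Guttman's lambda-2 lower bound to reliability."""
    cov = np.cov(np.asarray(scores, dtype=float), rowvar=False)
    k = cov.shape[0]
    total_var = cov.sum()                      # variance of the total score
    sum_item_var = np.trace(cov)
    sum_sq_offdiag = (cov**2).sum() - (np.diag(cov)**2).sum()
    return (total_var - sum_item_var
            + np.sqrt(k / (k - 1) * sum_sq_offdiag)) / total_var

# Placeholder data: 6 examinees x 4 items.
x = [[3, 4, 3, 4],
     [2, 2, 3, 2],
     [4, 5, 4, 4],
     [1, 2, 2, 1],
     [3, 3, 4, 3],
     [5, 4, 5, 4]]
print(f"lambda2 = {guttman_lambda2(x):.3f}")
```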
