Similar literature
 Found 20 similar articles; search time: 31 ms
1.
Currently there is concern among some educators regarding the reliability of criterion-referenced (CR) measures. In this comment, a recent attempt to develop a theory of reliability for CR measures is examined, and some considerations for determining the reliability of CR measures are discussed. Conventional reliability statistics (e.g., coefficient alpha, standard error of measurement) are found appropriate for CR measures satisfying the assumptions of the measurement model underlying classical test theory. For measures with underlying multidimensional traits, conventional reliability statistics may be used at the homogeneous subscale level. When the confidence interval about a student's “below criterion score” includes the criterion, additional evidence about the student should be obtained. Two-stage sequential testing is suggested as one method for acquiring additional evidence.
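The decision rule in this abstract can be sketched in a few lines. This is only an illustration, assuming the classical standard error of measurement, SEM = SD × sqrt(1 − reliability), and a normal-theory interval; the function names and the default z = 1.96 are hypothetical choices, not taken from the abstract.

```python
import math

def score_confidence_interval(score, sd, reliability, z=1.96):
    """Return (lower, upper) around an observed score using the classical SEM."""
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    return score - z * sem, score + z * sem

def needs_more_evidence(score, criterion, sd, reliability, z=1.96):
    """True when a below-criterion score has an interval that still reaches the criterion."""
    lower, upper = score_confidence_interval(score, sd, reliability, z)
    return score < criterion and upper >= criterion
```

For example, with SD = 10 and reliability 0.91 the SEM is 3, so a score of 68 against a criterion of 70 would be flagged for a second testing stage, while a score of 60 would not.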

2.
It is well known that coefficient alpha is an estimate of reliability if its underlying assumptions are met and that it is a lower-bound estimate if the assumption of essential tau equivalency is violated. Very little literature addresses the assumption of uncorrelated errors among items and the effect of violating this assumption on alpha. True score models are proposed that can account for correlated errors. These models allow random measurement errors on earlier items to affect directly or indirectly scores on later items. Coefficient alpha may yield spuriously high estimates of reliability if these true score models reflect item responding. In practice, it is important to differentiate these models from models in which the errors are correlated because one or more factors have been left unspecified. If the latter model is an accurate representation of item responding, the assumption of essential tau equivalency is violated and alpha is a lower-bound estimate of reliability.
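For reference, the coefficient discussed here can be computed from a respondents-by-items score table with the usual formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below is a minimal illustration of that formula only; it does not implement the correlated-error true score models the abstract proposes, and the function name is a hypothetical choice.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha for `scores`: one row per respondent, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) and of the row totals.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

Because alpha depends only on observed variances, it cannot by itself distinguish the two kinds of correlated-error models contrasted in the abstract.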

3.
It is shown that in general the popular coefficient alpha estimator for reliability of multi-component measuring instruments converges almost surely to a quantity that is not equal to the population reliability coefficient. This convergence with probability 1 is a stronger statement than convergence in probability (consistency) and convergence in distribution for the alpha estimator, which have been studied in the past. In the special case of congeneric measures with uncorrelated errors and equal loadings on the common true score, the alpha estimator converges almost surely to the population reliability coefficient that equals population alpha, which implies also its consistency as a reliability estimator. When the loadings are unequal but sufficiently high and similar, the alpha estimator converges almost surely to population alpha that is essentially indistinguishable from the population reliability coefficient, which implies alpha’s approximate consistency then. For the general case, the results entail that the alpha estimator is not a consistent estimator of reliability. The findings add to the critical literature on coefficient alpha in the general case, as well as to the justification of its use as a dependable measuring instrument reliability estimator in special cases and settings resulting under appropriate restrictive conditions, and are illustrated using a numerical example.

4.
In this ITEMS module, we frame the topic of scale reliability within a confirmatory factor analysis and structural equation modeling (SEM) context and address some of the limitations of Cronbach's α. This modeling approach has two major advantages: (1) it allows researchers to make explicit the relation between their items and the latent variables representing the constructs those items intend to measure, and (2) it facilitates a more principled and formal practice of scale reliability evaluation. Specifically, we begin the module by discussing key conceptual and statistical foundations of the classical test theory model and then framing it within an SEM context; we do so first with a single item and then expand this approach to a multi-item scale. This allows us to set the stage for presenting different measurement structures that might underlie a scale and, more importantly, for assessing and comparing those structures formally within the SEM context. We then make explicit the connection between measurement model parameters and different measures of reliability, emphasizing the challenges and benefits of key measures while ultimately endorsing the flexible McDonald's ω over Cronbach's α. We then demonstrate how to estimate key measures in both a commercial software program (Mplus) and three packages within an open-source environment (R). In closing, we make recommendations for practitioners about best practices in reliability estimation based on the ideas presented in the module.

5.
6.
A latent variable modeling method for studying maximal reliability of unidimensional multicomponent measuring instruments with correlated errors is outlined. In the presence of correlation between two residual terms, the procedure allows one to point and interval estimate the reliability of the linear combination of the scale components that possesses the highest possible reliability coefficient. The approach is readily applicable with popular latent variable modeling software and also provides an alternative scoring rule to the widely used overall sum score for homogeneous psychometric scales. The discussed method is illustrated with a numerical example.

7.
In repeated measure studies with unidimensional scales, measurement invariance, and specificity stability over time, the specificity variance in each instrument component can be identified. This article describes for that setting an improved point and interval estimation procedure for the maximal reliability coefficient associated with a given set of homogeneous measures. The method is developed within the framework of latent variable modeling and can also be readily used in longitudinal studies for improved point and interval estimation of individual measure reliability and scale reliability at each assessment occasion. The procedure is based on empirically testable conditions and is illustrated with an example.

8.
Applied Measurement in Education, 2013, 26(3): 277-286
A necessary and sufficient condition of coefficient alpha as an estimate of test reliability is that the part scores be essentially tau equivalent. This condition implies that the test is unidimensional in the factor analytic sense, and all parts must measure the same unitary trait or ability. This article examines the cruciality of this assumption. It is shown mathematically that the negative bias introduced by multidimensionality is likely to be quite small. Empirical data are cited from a standardized achievement test battery to corroborate this inference. It is concluded that the usual formula for coefficient alpha is quite robust with respect to alpha's cardinal assumption. Where bias is substantial, the stratified version of coefficient alpha can be used to accommodate multidimensionality.
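The stratified version mentioned at the end of this abstract is commonly written as 1 − Σ σ²ᵢ(1 − αᵢ) / σ²ₓ, where σ²ᵢ and αᵢ are the score variance and alpha of stratum (subtest) i and σ²ₓ is the variance of the total score. The sketch below assumes that standard form; the function and parameter names are hypothetical, and the abstract itself does not give the formula.

```python
def stratified_alpha(stratum_variances, stratum_alphas, total_variance):
    """Stratified alpha from per-stratum score variances and alphas.

    stratum_variances[i] is the variance of the subscore for stratum i,
    stratum_alphas[i] is coefficient alpha computed within that stratum,
    total_variance is the variance of the total test score.
    """
    if len(stratum_variances) != len(stratum_alphas):
        raise ValueError("need one alpha per stratum")
    # Error variance contributed by each stratum is variance * (1 - alpha).
    error_var = sum(v * (1.0 - a) for v, a in zip(stratum_variances, stratum_alphas))
    return 1.0 - error_var / total_variance
```

With two strata of variance 4 and 9, alphas of 0.8 and 0.9, and a total-score variance of 20, the result is 1 − 1.7/20 = 0.915.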

9.
Numerous researchers have proposed methods for evaluating the quality of rater-mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many-facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On the other hand, popular parametric methods for evaluating rating quality are often based on measurement theories such as invariant measurement. However, these methods are based on assumptions and transformations that may not be appropriate for ordinal ratings. In this study, I show how researchers can use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, to evaluate rating quality within the framework of invariant measurement without the use of potentially inappropriate parametric techniques. I use an illustrative analysis of data from a rater-mediated writing assessment to demonstrate how one can use numeric and graphical indicators from MSA to gather evidence of validity, reliability, and fairness. The results from the analyses suggest that MSA provides a useful framework within which to evaluate rater-mediated assessments for evidence of validity, reliability, and fairness that can supplement existing popular methods for evaluating ratings.

10.
It is shown that the maximum likelihood estimator of the widely used omega coefficient for reliability of multicomponent measuring instruments converges almost surely to the population reliability coefficient for normal congeneric measures with uncorrelated errors as sample size increases indefinitely. This strong consistency implies convergence in probability (consistency) as well as in distribution for the omega estimator. Strong consistency is also demonstrated for the maximal reliability estimator associated with the optimal linear combination of the instrument components. The findings of this note add (i) to the recommendation to use in the general normality case the omega estimator in empirical research, (ii) to the critical literature on the popular coefficient alpha then, and (iii) to the literature on the properties of the optimal linear combination of observed measures and the maximal reliability estimator.
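The population quantities that these estimators target can be written down directly for a congeneric model with uncorrelated errors. The sketch below assumes a single factor with variance fixed at 1, loadings λᵢ and error variances θᵢ, and uses the standard expressions: omega = (Σλ)² / ((Σλ)² + Σθ), maximal reliability = S / (1 + S) with S = Σ λᵢ²/θᵢ, and population alpha computed from the model-implied covariance matrix. The function names are hypothetical, and the code illustrates the population values only, not the maximum likelihood estimation or the convergence results.

```python
def omega(loadings, error_vars):
    """Reliability of the unit-weighted sum score under a congeneric model."""
    common = sum(loadings) ** 2  # true-score variance of the sum (factor variance = 1)
    return common / (common + sum(error_vars))

def population_alpha(loadings, error_vars):
    """Coefficient alpha evaluated on the model-implied covariance matrix."""
    k = len(loadings)
    total_var = sum(loadings) ** 2 + sum(error_vars)
    item_var_sum = sum(l * l for l in loadings) + sum(error_vars)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

def maximal_reliability(loadings, error_vars):
    """Reliability of the optimally weighted linear combination of the components."""
    s = sum(l * l / e for l, e in zip(loadings, error_vars))
    return s / (1.0 + s)
```

With equal loadings and equal error variances the three values coincide; with unequal loadings such as 0.9, 0.7, 0.5, 0.3 (and error variances 0.19, 0.51, 0.75, 0.91), alpha falls below omega, which in turn falls below the maximal reliability.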

11.
Applied Measurement in Education, 2013, 26(4): 361-367
The sampling theory for coefficient alpha is well developed and readily accessible in the measurement literature. The theory for the intraclass reliability coefficient, a Spearman-Brown extrapolation of alpha to a single measurement on each examinee, is less widely recognized and less easily cited. This article presents techniques for constructing confidence intervals and testing hypotheses for the intraclass coefficient.
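The relationship between alpha and the single-measurement intraclass coefficient can be illustrated with the Spearman-Brown formula. The sketch below assumes alpha was computed on k parallel parts, so that the single-part reliability is α / (k − (k − 1)α); the function names are hypothetical, and the confidence-interval and hypothesis-testing techniques of the article are not reproduced here.

```python
def step_down_to_single(alpha, k):
    """Spearman-Brown extrapolation of a k-part alpha to a single measurement."""
    return alpha / (k - (k - 1) * alpha)

def step_up_to_k(single_reliability, k):
    """Spearman-Brown projection from a single measurement to a k-part composite."""
    return k * single_reliability / (1.0 + (k - 1) * single_reliability)
```

For instance, an alpha of 0.8 on four parts corresponds to a single-measurement coefficient of 0.5, and stepping 0.5 back up to four parts returns 0.8.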

12.
The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient ρ_XX′ as a function of item response theory (IRT) parameters and present the lower and upper bounds of the coefficient. Another purpose is to examine relative performances of the IRT reliability statistics and two classical test theory (CTT) reliability statistics (Cronbach’s alpha and Feldt–Gilmer congeneric coefficients) under various testing conditions that result from manipulating large-scale real data. For the first purpose, two alternative ways of exactly quantifying ρ_XX′ are compared in terms of computational efficiency and statistical usefulness. In addition, the lower and upper bounds for ρ_XX′ are presented in line with the assumptions of essential tau-equivalence and congeneric similarity, respectively. Empirical studies conducted for the second purpose showed across all testing conditions that (1) the IRT reliability coefficient was higher than the CTT reliability statistics; (2) the IRT reliability coefficient was closer to the Feldt–Gilmer coefficient than to Cronbach’s alpha coefficient; and (3) the alpha coefficient was close to the lower bound of IRT reliability. Some advantages of the IRT approach to estimating test-score reliability over the CTT approaches are discussed at the end.

13.
ABSTRACT

This empirical investigation aimed to conceptualize, develop, and validate a scale for measuring the quality of higher degrees by research (HDR-QUAL). For that purpose, the study measured the perceptions of higher degrees by research (HDR) students about the constituents of HDR quality in Pakistani tertiary education institutions. Following the 7-step process of scale development, three studies were conducted to develop an initial pool of scale items, establish the validity and reliability of the proposed scale, and assess its nomological behavior. Principal component analysis with Varimax rotation resulted in a 3-factor solution, subsequently yielding a 15-item scale. The model fit indices of the measurement model and the higher-order model indicated a satisfactory fit to the data. Finally, the resultant three factors, i.e. financial assistance, supervisory expertise, and infrastructural support, converged into a unidimensional HDR-QUAL scale that was positively associated with student satisfaction, thus confirming nomological validity as well. Important policy measures and directions for future research are proposed at the end.

14.
This study begins the process of developing a scale for measuring service quality in higher education in South Africa. It also examines the relationship between the measures of service quality and related variables: intention to leave the university, trust in the management of the university, and overall satisfaction with the university. Survey data were collected with structured questionnaires from students (n = 391) at two South African universities. Findings indicate that the 52-item measure of service quality in higher education is a multidimensional construct loading on 13 factors with a high reliability coefficient (0.93) and some construct validity. Significant relationships were also found between service quality in higher education and the other study variables (intention to leave the university, trust in the management of the university, and overall satisfaction with the university). Some further research directions are suggested and the policy implications of the findings are discussed.

15.
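This study tested the reliability and validity of the Quality of School Life Scale (QSLS) with students from a university in mainland China as participants. The results show that the QSLS has good reliability and validity among mainland Chinese university students and can serve as an instrument for measuring quality of school life in this population. Cronbach's α for the total score was 0.896 and the test-retest reliability was 0.843; test-retest reliabilities of the subscales ranged from 0.782 to 0.859; correlations between the total score and the subscales ranged from 0.653 to 0.815, and correlations among the factors ranged from 0.269 to 0.773. In the confirmatory factor analysis, all indices met statistical requirements.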

16.
The purpose of this study was to develop a teaching quality assessment questionnaire and assess its reliability by using it with a sample of first-year medical students. Principal components analysis with varimax orthogonal rotation resulted in the development of a 12-item, two-component tool, adequate for use in lectures and small-group sessions. The two components were named ‘curriculum’ and ‘relationship’. The Cronbach coefficient alpha values indicated high reliability and internal consistency. According to the results obtained, this teaching quality scale is a reliable measure and may be useful in identifying themes in disciplines and among teachers that may benefit from some professional development. Among its advantages is that it can be used with an optical reading tool.

17.
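Based on interviews, the researchers wrote 80 items and pilot-tested them with 737 university students. After item discrimination analysis and exploratory factor analysis, 24 items were retained. The 24 items cover four factors (safety-related physical fitness, safety psychology, safety skills, and safety awareness), each with an internal consistency coefficient above 0.73. In a second administration to 571 university students, the internal consistency coefficient of the scale was 0.88; 51 of these students were randomly selected for a retest, and the test-retest reliabilities ranged from 0.66 to 0.85. Confirmatory factor analysis showed that the scale fit the hypothesized model well, indicating good construct validity. Finally, the researchers analyzed the structure, reliability, and validity of the university students' public safety literacy scale and discussed the remaining problems.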

18.
Reliability can be estimated using structural equation modeling (SEM). Two potential problems with this approach are that estimates may be unstable with small sample sizes and biased with misspecified models. A Monte Carlo study was conducted to investigate the quality of SEM estimates of reliability by themselves and relative to coefficient alpha. The SEM estimates showed minimal bias when the model was correctly specified and the items were relatively well defined by their underlying factor(s). They tended to show somewhat greater bias when the model was misspecified, particularly underspecified. Overall, SEM estimates were more stable than anticipated. Researchers are more likely to obtain accurate estimates of reliability using SEM by conducting large-sample studies with well-constructed scales and critically assessing model fit.

19.
Abstract

This study concerns the initial development of a scale to measure teachers' attitudes toward teaching as a profession.

A modification of the W-technique, a combination of the equal-appearing interval and paired-comparison methods, was utilized to construct the scales. Teachers were used to judge the statements in the preliminary stages of scaling. An alternate scale was constructed for correlational purposes and later a revised scale was also developed from the same statements.

An estimate of test-retest reliability was obtained. In one college class a correlation coefficient of .92 was obtained between the original and alternate scales. In another class, the test-retest coefficient for the original scale was .99 and .97 between the alternate and original scales.

The high correlation gives evidence for the reliability of the original scale and both scales appear ready for further research purposes.

20.