Similar literature
Found 20 similar documents; search took 31 ms.
1.
Abstract

Through two studies, this work examined the applicability, interpretability, and construct validity of the Classroom Assessment Scoring System K-3 (CLASS) as a measure of the quality of classroom interactions. In the first study, the CLASS was used in 332 classrooms to test three alternative models of its factorial structure (in order: the one-factor, three-factor, and bifactor models). The one-factor model showed worse fit than the other two models. The latent factors of the three-factor model were highly correlated. The bifactor model showed adequate fit. The aim of the second study was to investigate the construct validity of the CLASS. We used data collected from 31 classrooms to examine associations between the factors extracted from the bifactor model and outcome variables in the domains of the student-teacher relationship, behavioral problems, and academic achievement. General and domain-specific factors revealed different patterns of associations with child outcomes. The results are discussed relative to the Italian context.

2.
Background: The Childhood Trauma Questionnaire – Short Form (CTQ-SF) is a widely used self-report instrument for the assessment and characterization of childhood trauma. Yet research on the instrument's psychometric properties in clinical samples is sparse, and the Danish version of the CTQ-SF has not previously been evaluated in clinical samples.
Objectives: To examine the structural validity, internal consistency reliability, and multi-method convergent validity of the CTQ-SF in a heterogeneous clinical sample from Denmark.
Participants and setting: The study was based on data from four Danish clinical samples (N = 393): 1) outpatients diagnosed with personality disorders, 2) patients commencing psychiatric treatment for non-affective first-episode psychosis, 3) patients diagnosed with first-episode or prolonged depression recruited from general practitioners and an outpatient mood disorder clinic, and 4) detained delinquent boys.
Methods: Confirmatory factor analysis was used to examine structural validity. We also calculated internal consistency and multi-method convergent validity with interview-based ratings of adverse parenting.
Results: Confirmatory factor analyses indicated that the five-factor structure described in the CTQ-SF manual, with correlated errors for three items, fitted the data best compared with various other models. Coefficients of congruence also supported factorial similarity across countries (i.e., with a US substance abuser sample and a mixed Brazilian sample). Internal consistency reliability was acceptable and comparable to previously published estimates. Multi-method convergent validity associations further corroborated the validity of the CTQ-SF.
Conclusion: These findings provide support for the reliability and validity of the Danish version of the CTQ-SF in clinical samples.

3.
The Progressive Matrices items require varying degrees of analytical reasoning. Individuals high on the underlying trait measured by the Raven should score high on the test. Latent trait models applied to Raven data provide a useful methodology for examining the tenability of this hypothesis. In this study the Rasch latent trait model was applied to investigate how well observed performance on Raven items fit what the model expected for individuals at six different levels of the underlying scale. For the most part the model showed a good fit to the test data. The findings were similar to previous empirical work that has investigated the behavior of Rasch test scores. In three instances, however, the item fit statistic was relatively large. A closer study of the “misfitting” items revealed that two items were of extreme difficulty, which is likely to contribute to the misfit. The study raises issues about the use of the Rasch model with small samples. Other issues related to the interpretation of the Rasch model for Raven-type data are discussed.

4.
The power of the chi-square test statistic used in structural equation modeling decreases as the absolute value of excess kurtosis of the observed data increases. Excess kurtosis is more likely the smaller the number of item response categories. As a result, fit is likely to improve as the number of item response categories decreases, regardless of the true underlying factor structure or the χ²-based fit index used to examine model fit. Equivalently, given a target value of approximate fit (e.g., root mean square error of approximation ≤ .05), a model with more factors is needed to reach it as the number of categories increases. This is true regardless of whether the data are treated as continuous (common factor analysis) or as discrete (ordinal factor analysis). We recommend using a large number of response alternatives (≥ 5) to increase the power to detect incorrect substantive models.

5.
This study presents the reliability and validity of the Teacher Evaluation Experience Scale–Teacher Form (TEES-T), a multidimensional measure of educators' attitudes and beliefs about teacher evaluation. Confirmatory factor analyses of data from 583 teachers were conducted on the hypothesized five-factor model of the TEES-T, as well as on alternative models. The five- and four-factor models yielded acceptable fit to the data. Information-theory-based indices of relative fit (i.e., AIC0, BCC0, and BIC0) indicated that the TEES-T four-factor model fit better than either the five-factor or the one-factor model. The TEES-T evidenced good internal consistency, freedom from item bias, and convergent validity with the Collective Efficacy Scale. Implications are discussed.

6.
Abstract

The present study compared the performance of six cognitive diagnostic models (CDMs) to explore inter-skill relationships in a reading comprehension test. To this end, item responses of about 21,642 test-takers to a high-stakes reading comprehension test were analyzed. The models were compared in terms of model fit at both test and item levels, classification consistency and accuracy, and proportion of skill mastery profiles. The results showed that the G-DINA performed the best and that the C-RUM, NC-RUM, and ACDM showed the closest affinity to the G-DINA. On some criteria, the DINA showed performance comparable to the G-DINA. The test-level results were corroborated by the item-level model comparison, where DINA, DINO, and ACDM variously fit some of the items. The results of the study suggested that relationships among the subskills of reading comprehension might be a combination of compensatory and non-compensatory. Therefore, it is suggested that the choice of CDM be made at the item level rather than the test level.

7.
Applied Measurement in Education, 2013, 26(2): 125-141
Item parameter instability can threaten the validity of inferences about changes in student achievement when using Item Response Theory- (IRT) based test scores obtained on different occasions. This article illustrates a model-testing approach for evaluating the stability of IRT item parameter estimates in a pretest-posttest design. Stability of item parameter estimates was assessed for a random sample of pretest and posttest responses to a 19-item math test. Using MULTILOG (Thissen, 1986), IRT models were estimated in which item parameter estimates were constrained to be equal across samples (reflecting stability) and item parameter estimates were free to vary across samples (reflecting instability). These competing models were then compared statistically in order to test the invariance assumption. The results indicated a moderately high degree of stability in the item parameter estimates for a group of children assessed on two different occasions.

8.
Objective: The Childhood Trauma Questionnaire-Short Form (CTQ-SF) is a self-report questionnaire that retrospectively screens for a history of childhood abuse and neglect and is widely used throughout the world. The current study aimed to examine the psychometric properties of the Chinese version of the CTQ-SF.
Methods: Participants included 3431 undergraduates from Hunan province and 234 depressive patients from psychological clinics. Confirmatory factor analysis was performed to examine how well the original five-factor model fit the data and to test the measurement equivalence of the CTQ-SF across gender. Internal consistency was also evaluated.
Results: The five-factor model achieved satisfactory fit (undergraduate sample: TLI = 0.925, CFI = 0.936, RMSEA = 0.034, SRMR = 0.046; depressive sample: TLI = 0.912, CFI = 0.923, RMSEA = 0.044, SRMR = 0.062). Measurement invariance of the five-factor model across gender was fully supported across the different degrees of invariance tested. The CTQ-SF also showed acceptable internal consistency and good stability.
Conclusion: The current study shows that the Chinese version of the Childhood Trauma Questionnaire-Short Form has good reliability and validity among Chinese undergraduates and depressive samples, which also indicates that the CTQ-SF is a good tool for child trauma assessment.

9.
Ordinal variables are common in many empirical investigations in the social and behavioral sciences. Researchers often apply the maximum likelihood method to fit structural equation models to ordinal data. This assumes that the observed measures have normal distributions, which is not the case when the variables are ordinal. A better approach is to use polychoric correlations and fit the models using methods such as unweighted least squares (ULS), maximum likelihood (ML), weighted least squares (WLS), or diagonally weighted least squares (DWLS). In this simulation evaluation we study the behavior of these methods in combination with polychoric correlations when the models are misspecified. We also study the effect of model size and number of categories on the parameter estimates, their standard errors, and the common chi-square measures of fit when the models are both correct and misspecified. When used routinely, these methods give consistent parameter estimates but ULS, ML, and DWLS give incorrect standard errors. Correct standard errors can be obtained for these methods by robustification using an estimate of the asymptotic covariance matrix W of the polychoric correlations. When used in this way the methods are here called RULS, RML, and RDWLS.

10.
A major issue in the utilization of covariance structure analysis is model fit evaluation. Recent years have witnessed increasing interest in various test statistics and so-called fit indexes, most of which are actually based on or closely related to $F_0$, a measure of model fit in the population. This study aims to provide a systematic investigation of the performance of 4 available estimators of $F_0$. $\hat{F}_{01}$ is the conventional estimator and is based on noncentral chi-square approximation. $\hat{F}_{02}$ is newly proposed and does not assume noncentral chi-square approximation. $\hat{F}_{03}$ and $\hat{F}_{04}$ are variations of $\hat{F}_{02}$. A Monte Carlo simulation study is conducted to examine how these four estimators of $F_0$ perform across varying model misspecifications, data distributions, model sizes, and sample sizes. The results show that under normality all 4 quantities estimate $F_0$ equally well, and under nonnormality $\hat{F}_{02}$, $\hat{F}_{03}$, and $\hat{F}_{04}$ outperform $\hat{F}_{01}$. Issues related to these findings are discussed.
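
As an illustration of the kind of quantity being estimated, the sketch below computes a truncated point estimate of the population misfit from a model chi-square statistic, together with the RMSEA that is commonly derived from it. It assumes the widely used textbook form (chi-square minus degrees of freedom, divided by N - 1, floored at zero). The abstract gives no formulas, so treating this as the "conventional" estimator is an assumption, and the function names are hypothetical.

```python
import math


def f0_hat(chi_square: float, df: int, n: int) -> float:
    """Truncated point estimate of population misfit.

    Assumed textbook form: max((T - df) / (N - 1), 0), where T is the
    model chi-square statistic, df its degrees of freedom, and N the
    sample size. This is not taken from the abstract itself.
    """
    return max((chi_square - df) / (n - 1), 0.0)


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA as the square root of estimated misfit per degree of freedom."""
    return math.sqrt(f0_hat(chi_square, df, n) / df)


if __name__ == "__main__":
    # Hypothetical values chosen only to make the arithmetic easy to follow.
    print(f0_hat(85.0, 35, 501))
    print(rmsea(85.0, 35, 501))
```

The truncation at zero keeps the estimate non-negative when the chi-square statistic falls below its degrees of freedom.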

11.
Assessing the correctness of a structural equation model is essential to avoid drawing incorrect conclusions from empirical research. In the past, the chi-square test was recommended for assessing the correctness of the model, but this test has been criticized because of its sensitivity to sample size. In reaction, an abundance of fit indexes has been developed. The result of these developments is that structural equation modeling packages now produce a long list of fit measures. One would think that this progression has led to a clear understanding of how to evaluate models with respect to model misspecifications. In this article we question the validity of approaches to model evaluation based on overall goodness-of-fit indexes. The argument against such usage is that they do not provide an adequate indication of the “size” of the model's misspecification. That is, they vary dramatically with the values of incidental parameters that are unrelated to the misspecification in the model. This is illustrated using simple but fundamental models. As an alternative method of model evaluation, we suggest using the expected parameter change in combination with the modification index (MI) and the power of the MI test.

12.
Approximations to the distributions of goodness-of-fit indexes in structural equation modeling are derived with the assumption of multivariate normality and slight misspecification of models. The fit indexes considered in this article are Jöreskog and Sörbom's goodness-of-fit index (GFI) and the adjusted GFI, McDonald's absolute GFI, Steiger and Lind's root mean squared error of approximation, Steiger's Γ1 and Γ2, Bentler and Bonett's normed fit index, Bollen's incremental fit index and ρ1, Tucker and Lewis's index ρ2, and Bentler's fit index (McDonald and Marsh's relative noncentrality index). An approximation to the asymptotic covariance matrix for the fit indexes is derived by using the delta method. Furthermore, approximations to the densities of the fit indexes are obtained from the transformations of the asymptotically noncentral chi-square distributed variable. A simulation is carried out to confirm the accuracy of the approximations.

13.
14.
Posterior predictive model checking (PPMC) is a Bayesian model checking method that compares the observed data to (plausible) future observations from the posterior predictive distribution. We propose an alternative to PPMC in the context of structural equation modeling, which we term the poor person’s PPMC (PP-PPMC), for the situation wherein one cannot afford (or is unwilling) to draw samples from the full posterior. Using only by-products of likelihood-based estimation (maximum likelihood estimate and information matrix), the PP-PPMC offers a natural method to handle parameter uncertainty in model fit assessment. In particular, a coupling relationship between the classical p values from the model fit chi-square test and the predictive p values from the PP-PPMC method is carefully examined, suggesting that PP-PPMC might offer an alternative, principled approach for model fit assessment. We also illustrate the flexibility of the PP-PPMC approach by applying it to case-influence diagnostics.

15.
Individual person fit analyses provide important information regarding the validity of test score inferences for an individual test taker. In this study, we use data from an undergraduate statistics test (N = 1135) to illustrate a two-step method that researchers and practitioners can use to examine individual person fit. First, person fit is examined numerically with several indices based on the Rasch model (i.e., Infit, Outfit, and Between-Subset statistics). Second, person misfit is presented graphically with person response functions, and these person response functions are interpreted using a heuristic. Individual person fit analysis holds promise for improving score interpretation in that it may detect potential threats to validity of score inferences for some test takers. Individual person fit analysis may also highlight particular subsets of items (on which a test taker performs unexpectedly) that can be used to further contextualize her or his test performance.
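
The numerical step can be sketched for a single test taker as follows. This is a minimal illustration using the standard mean-square definitions of Infit and Outfit under the dichotomous Rasch model. It assumes the person's ability and the item difficulties are already estimated; the values used are hypothetical, and the Between-Subset statistic is not covered.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def person_fit(responses, theta, difficulties):
    """Return (infit, outfit) mean-square statistics for one person.

    responses    : list of 0/1 item scores
    theta        : person ability estimate (assumed known here)
    difficulties : item difficulty estimates (assumed known here)
    """
    probs = [rasch_probability(theta, b) for b in difficulties]
    variances = [p * (1.0 - p) for p in probs]
    sq_residuals = [(x - p) ** 2 for x, p in zip(responses, probs)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(responses)
    # Infit: information-weighted mean square.
    infit = sum(sq_residuals) / sum(variances)
    return infit, outfit


if __name__ == "__main__":
    # Hypothetical person who misses an easy item but answers a hard one.
    print(person_fit([0, 1, 1], theta=0.0, difficulties=[-2.0, 0.0, 2.0]))
```

Values near 1 indicate responses about as variable as the model expects; the hypothetical pattern above yields values well above 1 because the unexpected responses fall on items far from the person's ability.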

16.
Formulas are derived for computing the chi-square statistic from proportions or percentages, both for tests of goodness of fit and association. The advantages of the new formulas are: (1) computation is conceptually more congruent with the hypothesis being tested; (2) interpretation is facilitated (expected frequencies and discrepancies in frequencies are a function of sample size, whereas expected proportions and corresponding discrepancies are not); and (3) computation is facilitated in contingency tables since expected proportions do not need to be determined separately for each cell.
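
A minimal sketch of computing chi-square directly from proportions is given below. It relies on two standard algebraic identities for the Pearson statistic, which may differ in form from the formulas derived in the article; the function names and example numbers are hypothetical.

```python
def chi_square_gof(observed_props, expected_props, n):
    """Goodness-of-fit chi-square from proportions.

    Uses chi2 = N * sum((p_obs - p_exp)^2 / p_exp).
    """
    return n * sum(
        (po - pe) ** 2 / pe for po, pe in zip(observed_props, expected_props)
    )


def chi_square_association(table_props, n):
    """Chi-square test of association from a table of cell proportions.

    Uses chi2 = N * (sum_ij p_ij^2 / (p_i. * p_.j) - 1), so no separate
    table of expected cell proportions has to be built.
    """
    row_totals = [sum(row) for row in table_props]
    col_totals = [sum(col) for col in zip(*table_props)]
    total = sum(
        p * p / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table_props)
        for j, p in enumerate(row)
    )
    return n * (total - 1.0)


if __name__ == "__main__":
    # Hypothetical data: 60% vs. 40% observed against a 50/50 hypothesis.
    print(chi_square_gof([0.6, 0.4], [0.5, 0.5], n=100))
    # Hypothetical 2x2 table of cell proportions.
    print(chi_square_association([[0.3, 0.2], [0.2, 0.3]], n=100))
```

Both functions give the same value as the usual frequency-based computation, since multiplying every proportion by N recovers the observed and expected counts.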

17.
The present study followed a sample of students from first grade (N = 316, Mage = 7.05 at first test) through fourth grade to evaluate dynamic developmental relations between vocabulary knowledge and reading comprehension. Using latent change score modeling, competing models were fit to the repeated measurements of vocabulary knowledge and reading comprehension to test for the presence of leading and lagging influences. Univariate models indicated that growth in vocabulary knowledge and in reading comprehension was determined by two parts: constant yearly change and change proportional to the previous level of the variable. Bivariate models indicated that previous levels of vocabulary knowledge acted as leading indicators of reading comprehension growth, but the reverse relation was not found. Implications for theories of developmental relations between vocabulary and reading comprehension are discussed.

18.
The purpose of the present study was to examine the validity of modeling science achievement in terms of 3 social psychological variables (school connectedness, science attitude, and active learning) and 2 self-perception variables (self-confidence and science value). Two models were tested: full mediation and partial mediation. In the full-mediation model, effects of the 3 social psychological variables on science achievement were hypothesized to be completely mediated through science value and self-confidence. In the partial-mediation model, however, those 3 variables were hypothesized to affect achievement directly as well as indirectly through the mediating roles of science value and self-confidence. Data were obtained from Grade 8 Saudi students (N = 4,099) who participated in TIMSS 2007. The relationships among constructs were examined using the structural equation modeling software Mplus 7. Results indicated that both models performed adequately in terms of fit indices, but the partial-mediation model was retained because a chi-square difference test showed that it represented the sample covariance matrix better than the full-mediation model. The mediating role of self-confidence in the relationships of science attitude and active learning to achievement was substantiated, but the mediating role of science value was not supported.

19.
A number of mental-test theorists have called attention to the fact that increasing test reliability beyond an optimal point can actually lead to a decrement in the validity of that test with respect to a criterion. This non-monotonic relation between reliability and validity has been referred to by Loevinger as the “attenuation paradox,” because Spearman’s correction for attenuation leads one to expect that increasing reliability will always increase validity. In this paper a mathematical link between test reliability and test validity is derived which takes into account the correlation between error scores on a test and error scores on a criterion measure the test is designed to predict. It is proved that when the correlation between these two sets of error scores is positive, the non-monotonic relation between test reliability and test validity which has been viewed as a paradox occurs universally.
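
For reference, Spearman's classical correction for attenuation, which the paradox is set against, can be sketched as below. This shows only the classical formula, which assumes uncorrelated errors; it is not the link derived in the paper, and the function names and numbers are hypothetical.

```python
import math


def disattenuated_correlation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: estimated correlation between true scores.

    r_xy : observed test-criterion correlation (validity coefficient)
    r_xx : reliability of the test
    r_yy : reliability of the criterion
    Assumes errors on test and criterion are uncorrelated.
    """
    return r_xy / math.sqrt(r_xx * r_yy)


def expected_validity(true_corr: float, r_xx: float, r_yy: float) -> float:
    """Observed validity implied by the classical formula.

    Under uncorrelated errors this rises monotonically with test
    reliability r_xx, which is why a decrement looks paradoxical.
    """
    return true_corr * math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    # Hypothetical coefficients for illustration only.
    print(disattenuated_correlation(0.42, 0.49, 0.81))
    print(expected_validity(0.6, 0.64, 1.0))
```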

20.
This study examined the effect of model size on the chi-square test statistics obtained from ordinal factor analysis models. The performance of six robust chi-square test statistics was compared across various conditions, including number of observed variables (p), number of factors, sample size, model (mis)specification, number of categories, and threshold distribution. Results showed that the unweighted least squares (ULS) robust chi-square statistics generally outperform the diagonally weighted least squares (DWLS) robust chi-square statistics. The ULSM estimator performed the best overall. However, when fitting ordinal factor analysis models with a large number of observed variables and a small sample size, the ULSM-based chi-square tests may yield empirical variances that are noticeably larger than the theoretical values, as well as inflated Type I error rates. On the other hand, when the number of observed variables is very large, the mean- and variance-corrected chi-square test statistics (e.g., based on ULSMV and WLSMV) could produce empirical variances conspicuously smaller than the theoretical values and Type I error rates lower than the nominal level, and could show lower power to reject misspecified models. Recommendations for applied researchers and future empirical studies involving large models are provided.
