Similar documents
Found 20 similar documents (search time: 31 ms)
1.
The metacognitive self-regulation (MSR) scale is among the most widely used measures of metacognition in educational research. However, the psychometric properties and validity of the scale have not been well established. A series of analyses on a college sample were performed to address this issue. In Study 1, split-sample exploratory (EFA) and confirmatory factor analyses (CFA) were performed to test the one-factor specification of the MSR scale. Time and study environment (TSE), total study time, and cumulative grade point average (cGPA) were introduced as outcome variables in a structural equation model (SEM) to examine the factors suggested by the EFA. The results of Study 1 indicated poor one-factor model fit and suggested that two- and three-factor models provided improved fits to the sample data. Results from the SEM indicated that the novel factors from the two- and three-factor models had different relationships with the outcome variables than the originally specified one-factor model. In Study 2, a modified one-factor model was introduced that consisted of nine items and was named the metacognitive self-regulation revised (MSR-R) scale. Five additional samples were included to replicate the model fit for the revised model specification. Finally, a path analysis was performed to examine the relationship of the MSR-R to variables from Study 1. The results of Study 2 revealed improved psychometric properties and reliability for the MSR-R. An indirect relationship emerged between MSR-R and cGPA through TSE. In conclusion, convincing evidence for replacing the MSR was found, and implications of the revised scale for future studies were discussed.

2.
Longitudinal data are often collected in waves in which a participant’s data can be collected at different times within each wave, resulting in sampling-time variation that is unaccounted for when waves are treated as single time points. Little research has been reported on the effects of this temporal imprecision on longitudinal growth-curve modeling. This article describes the results of a simulation study into the effect of sampling-time variation on parameter estimation, model fit, and model comparison with an empirical validation of the model fit and comparison results.
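The kind of temporal imprecision described above can be illustrated with a minimal simulation (an illustrative sketch, not the article's actual design): outcomes grow linearly in the actual, jittered measurement times, but the model is fit using the nominal wave times. In this simple within-person setup the slope estimate is largely unaffected, but the apparent residual variation is inflated, which is one route by which model fit degrades.

```python
import numpy as np

rng = np.random.default_rng(0)
n, waves = 500, 4
true_slope = 1.0

# Nominal wave times 0..3; ACTUAL measurement times jittered within each wave.
nominal = np.tile(np.arange(waves, dtype=float), (n, 1))
actual = nominal + rng.uniform(-0.4, 0.4, size=nominal.shape)

# Outcomes follow linear growth in the ACTUAL sampling times.
intercepts = rng.normal(0.0, 1.0, size=(n, 1))
y = intercepts + true_slope * actual + rng.normal(0.0, 0.2, size=nominal.shape)

def within_fit(t, y):
    """Within-person OLS slope and residual SD (person means removed)."""
    t_c = t - t.mean(axis=1, keepdims=True)
    y_c = y - y.mean(axis=1, keepdims=True)
    slope = (t_c * y_c).sum() / (t_c ** 2).sum()
    resid_sd = (y_c - slope * t_c).std()
    return slope, resid_sd

slope_a, sd_a = within_fit(actual, y)   # using true sampling times
slope_n, sd_n = within_fit(nominal, y)  # treating each wave as one time point
print(round(slope_a, 3), round(sd_a, 3))
print(round(slope_n, 3), round(sd_n, 3))
```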

3.
Many mechanistic rules of thumb for evaluating the goodness of fit of structural equation models (SEM) emphasize model parsimony; all other things being equal, a simpler, more parsimonious model with fewer estimated parameters is better than a more complex model. Although this is usually good advice, in the present article a heuristic counterexample is demonstrated in which parsimony as typically operationalized in indices of fit may be undesirable. Specifically, in simplex models of longitudinal data, the failure to include correlated uniquenesses relating the same indicators administered on different occasions will typically lead to systematically inflated estimates of stability. Although simplex models with correlated uniquenesses are substantially less parsimonious and may be unacceptable according to mechanistic decision rules that penalize model complexity, it can be argued a priori that these additional parameter estimates should be included. Simulated data are used to support this claim and to evaluate the behavior of a variety of fit indices and decision rules. The results demonstrate the validity of Bollen and Long’s (1993) conclusion that “test statistics and fit indices are very beneficial, but they are no replacement for sound judgment and substantive expertise” (p. 8).

4.
In the past, several models have been developed for the estimation of the reliability and validity of measurement instruments from multitrait-multimethod (MTMM) experiments. Suggestions have been made for additive, multiplicative, and correlated uniqueness models, and recently Coenders and Saris (2000) suggested a procedure to test these models against one another. In this article, the different models suggested for the analysis of MTMM matrices have been compared for their fit to 87 data sets collected in the United States (Andrews, 1984; Rodgers, Andrews, & Herzog, 1992), Austria (Koltringer, 1995), and the Netherlands (Scherpenzeel & Saris, 1997). As most variables are categorical, the analysis has been carried out on the basis of polychoric-polyserial correlation coefficients and of Pearson correlations. The fit of the models based on polychoric correlations is much worse than the fit of models based on product moment correlations, but in both cases a model that assumes additive method effects fits most data sets better than the other models, including the so-called multiplicative models.

5.
A linear latent growth curve mixture model with regime switching is extended in 2 ways. Previously, the matrix of first-order Markov switching probabilities was specified to be time-invariant, regardless of the pair of occasions being considered. The first extension, time-varying transitions, specifies different Markov transition matrices between each pair of occasions. The second extension is second-order time-invariant Markov transition probabilities, such that the probability of switching depends on the states at the 2 previous occasions. The models are implemented using the R package OpenMx, which facilitates data handling, parallel computation, and further model development. It also enables the extraction and display of relative likelihoods for every individual in the sample. The models are illustrated with previously published data on alcohol use observed on 4 occasions as part of the National Longitudinal Survey of Youth, and demonstrate improved fit to the data.
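The first extension, occasion-specific transition matrices, can be illustrated with a small data-generating simulation (a hypothetical two-state example; the article's models are estimated with OpenMx, whereas this sketch only generates data and recovers the transition probabilities empirically):

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_occasions = 1000, 4

# Hypothetical 2-state (e.g., low/high use) chain with a DIFFERENT transition
# matrix between each pair of adjacent occasions (time-varying first-order Markov).
initial = np.array([0.7, 0.3])
transitions = [
    np.array([[0.9, 0.1], [0.4, 0.6]]),  # occasion 1 -> 2
    np.array([[0.8, 0.2], [0.3, 0.7]]),  # occasion 2 -> 3
    np.array([[0.7, 0.3], [0.2, 0.8]]),  # occasion 3 -> 4
]

states = np.empty((n_people, n_occasions), dtype=int)
states[:, 0] = rng.choice(2, size=n_people, p=initial)
for t, P in enumerate(transitions):
    for s in (0, 1):
        mask = states[:, t] == s
        states[mask, t + 1] = rng.choice(2, size=mask.sum(), p=P[s])

def empirical_P(a, b):
    """Empirical 2x2 transition matrix from states a (time t) to b (time t+1)."""
    P = np.zeros((2, 2))
    for s in (0, 1):
        row = b[a == s]
        P[s] = [(row == 0).mean(), (row == 1).mean()]
    return P

# Each step's empirical matrix should recover its own generating matrix.
print(empirical_P(states[:, 0], states[:, 1]).round(2))
```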

6.

Through two studies, this work examined the applicability, interpretability, and construct validity of the Classroom Assessment Scoring System K-3 (CLASS) to measure the quality of classroom interactions. In the first study, the CLASS was used in 332 classrooms to test three alternative models (one-factor, three-factor, and bifactor) of its factorial structure. The one-factor model showed worse fit than the other two models. The latent factors of the three-factor model were highly correlated. The bifactor model showed adequate fit. The aim of the second study was to investigate the construct validity of the CLASS. We used data collected from 31 classrooms to examine associations of the factors extracted from the bifactor model with outcome variables in the domains of the student-teacher relationship, behavioral problems, and academic achievement. General and domain-specific factors revealed different patterns of associations with child outcomes. The results are discussed in relation to the Italian context.

7.
Because random assignment is not possible in observational studies, estimates of treatment effects might be biased due to selection on observable and unobservable variables. To strengthen causal inference in longitudinal observational studies of multiple treatments, we present 4 latent growth models for propensity score matched groups, and evaluate their performance with a Monte Carlo simulation study. We found that the 4 models performed similarly with respect to model fit, bias of parameter estimates, Type I error, and power to test the treatment effect. To demonstrate a multigroup latent growth model with dummy treatment indicators, we estimated the effect of students changing schools during elementary school years on their reading and mathematics achievement, using data from the Early Childhood Longitudinal Study Kindergarten Cohort.

8.
The study of change is based on the idea that the score or index at each measurement occasion has the same meaning and metric across time. In tests or scales with multiple items, such as those common in the social sciences, there are multiple ways to create such scores. Some options include using raw or sum scores (i.e., sum of item responses or linear transformation thereof), using Rasch-scaled scores provided by the test developers, fitting item response models to the observed item responses and estimating ability or aptitude, and jointly estimating the item response and growth models. We illustrate that this choice can have an impact on the substantive conclusions drawn from the change analysis using longitudinal data from the Applied Problems subtest of the Woodcock–Johnson Psycho-Educational Battery–Revised collected as part of the National Institute of Child Health and Human Development's Study of Early Child Care. Assumptions of the different measurement models, their benefits and limitations, and recommendations are discussed.
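The difference between raw sum scores and IRT-based ability estimates can be sketched as follows (an illustrative Rasch example with hypothetical item difficulties, not the Woodcock–Johnson calibration): under the Rasch model the sum score is a sufficient statistic, so the two scorings order examinees identically, but the score-to-ability mapping is nonlinear, which is one reason the choice of metric can alter growth-model conclusions.

```python
import numpy as np

# Hypothetical item difficulties for a 5-item Rasch model.
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])

def rasch_theta(score, b, iters=50):
    """ML ability estimate for a given sum score under the Rasch model.
    (Finite estimates exist only for interior scores 1..k-1.)"""
    theta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        grad = score - p.sum()          # derivative of the log-likelihood
        info = (p * (1 - p)).sum()      # Fisher information
        theta += grad / info            # Newton-Raphson step
    return theta

# The mapping score -> theta is monotone but NOT linear: equal raw-score
# differences correspond to unequal ability differences.
thetas = [rasch_theta(s, b) for s in range(1, 5)]
gaps = np.diff(thetas)
print(np.round(thetas, 2), np.round(gaps, 2))
```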

9.
Using several data sets, the authors examine the relative performance of the beta binomial model and two other more general strong true score models in estimating several indexes of classification consistency. It is shown that the beta binomial model can provide inadequate fits to raw score distributions compared to more general models. This lack of fit is reflected in differences in decision consistency indexes computed using the beta binomial model and the other models. It is recommended that the adequacy of a model in fitting the data be assessed before the model is used to estimate decision consistency indexes. When the beta binomial model does not fit the data, the more general models discussed here may provide an adequate fit and, in such cases, would be more appropriate for computing decision consistency indexes.
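The beta binomial model implies a direct route to one decision consistency index: integrate, over the Beta true-score density, the probability that two parallel forms classify an examinee on the same side of the cutoff. A minimal sketch with hypothetical parameter values (not the authors' data sets):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def decision_consistency(a, b, n_items, cutoff):
    """P(same pass/fail classification on two parallel forms) under the
    beta binomial strong true score model: true score p ~ Beta(a, b),
    observed score X | p ~ Binomial(n_items, p)."""
    def integrand(p):
        pass_prob = 1.0 - stats.binom.cdf(cutoff - 1, n_items, p)
        agree = pass_prob ** 2 + (1.0 - pass_prob) ** 2
        return agree * stats.beta.pdf(p, a, b)
    value, _ = quad(integrand, 0.0, 1.0)
    return value

# Hypothetical 20-item test with a cutoff of 12 correct.
print(round(decision_consistency(4.0, 2.0, 20, 12), 3))
```

Consistency rises when the true-score distribution sits well away from the cutoff, since few examinees are near the pass/fail boundary on either form.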

10.
In observed‐score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response theory (IRT) model. The parameters from such a model can be utilized to derive the score probabilities for the tests and these score probabilities may then be used in observed‐score equating. In this study, the asymptotic standard errors of observed‐score equating using score probability vectors from polytomous IRT models are derived using the delta method. The results are applied to the equivalent groups design and the nonequivalent groups design with either chain equating or poststratification equating within the framework of kernel equating. The derivations are presented in a general form and specific formulas for the graded response model and the generalized partial credit model are provided. The asymptotic standard errors are accurate under several simulation conditions relating to sample size, distributional misspecification and, for the nonequivalent groups design, anchor test length.
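The observed-score equipercentile step itself (before the IRT-based score probabilities and delta-method standard errors derived in the study) can be sketched directly from two empirical score distributions; the forms and parameters here are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical observed scores on two 30-item forms (equivalent groups design).
x = rng.binomial(30, 0.55, size=5000)   # form X, somewhat easier
y = rng.binomial(30, 0.45, size=5000)   # form Y

def equipercentile(score, from_scores, to_scores):
    """Map a score on the 'from' form to the 'to' form by matching percentile ranks."""
    # Percentile rank of `score` in the 'from' distribution (midpoint convention).
    pr = (np.mean(from_scores < score) + np.mean(from_scores <= score)) / 2
    # Score in the 'to' distribution at that percentile rank (interpolated).
    return float(np.quantile(to_scores, pr))

# A form-X score near its mean should map to a form-Y score near form Y's mean.
print(round(equipercentile(17, x, y), 2))
```

Unsmoothed equipercentile equating like this typically yields non-integer equated scores; operational methods add presmoothing or kernel smoothing, as in the kernel equating framework the abstract mentions.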

11.
The present study, based on the construct comparability approach, performs a comparative analysis of overall grade point averages for seven courses, using exploratory factor analysis (EFA) and the partial credit model (PCM) with a sample of 1,398 students (M = 12.5, SD = 0.67) from 8 schools in the province of Alicante (Spain). The EFA confirmed a one-factor model that explains 74.44% of the variance. Cronbach’s alpha for this factor was .94. The PCM supported the one-factor model, and an optimal fit was achieved in all of the courses. The analysis of differential item functioning showed no significant differences in any course. An equitable distribution was observed in the evolution of the difficulty indices along the measurement scale for each course. This type of analysis confirms the measurement of a single latent construct across the different topics analysed, despite their varied theoretical and procedural contents.

12.
Ordinal variables are common in many empirical investigations in the social and behavioral sciences. Researchers often apply the maximum likelihood method to fit structural equation models to ordinal data. This assumes that the observed measures have normal distributions, which is not the case when the variables are ordinal. A better approach is to use polychoric correlations and fit the models using methods such as unweighted least squares (ULS), maximum likelihood (ML), weighted least squares (WLS), or diagonally weighted least squares (DWLS). In this simulation evaluation we study the behavior of these methods in combination with polychoric correlations when the models are misspecified. We also study the effect of model size and number of categories on the parameter estimates, their standard errors, and the common chi-square measures of fit when the models are both correct and misspecified. When used routinely, these methods give consistent parameter estimates but ULS, ML, and DWLS give incorrect standard errors. Correct standard errors can be obtained for these methods by robustification using an estimate of the asymptotic covariance matrix W of the polychoric correlations. When used in this way the methods are here called RULS, RML, and RDWLS.
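A polychoric correlation of the kind these methods consume can be estimated by maximum likelihood with thresholds fixed from the marginal proportions. The following is a simplified two-step sketch using scipy on a fabricated contingency table; production SEM software uses specialized, much faster bivariate normal routines:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def polychoric(table):
    """ML estimate of the polychoric correlation for a contingency table of two
    ordinal variables, with thresholds fixed at values implied by the margins."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Thresholds from cumulative marginal proportions (standard normal quantiles).
    t_row = stats.norm.ppf(np.cumsum(table.sum(axis=1))[:-1] / n)
    t_col = stats.norm.ppf(np.cumsum(table.sum(axis=0))[:-1] / n)
    a = np.concatenate([[-np.inf], t_row, [np.inf]])
    b = np.concatenate([[-np.inf], t_col, [np.inf]])

    def cell_prob(i, j, rho):
        mvn = stats.multivariate_normal(cov=[[1.0, rho], [rho, 1.0]])
        def F(u, v):  # bivariate normal CDF, handling infinite limits
            if u == -np.inf or v == -np.inf:
                return 0.0
            return mvn.cdf([min(u, 37.0), min(v, 37.0)])
        return (F(a[i + 1], b[j + 1]) - F(a[i], b[j + 1])
                - F(a[i + 1], b[j]) + F(a[i], b[j]))

    def neg_loglik(rho):
        ll = 0.0
        for i in range(table.shape[0]):
            for j in range(table.shape[1]):
                p = max(cell_prob(i, j, rho), 1e-12)
                ll += table[i, j] * np.log(p)
        return -ll

    return minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x

# Hypothetical 3x3 table with a clear positive association.
table = [[60, 25, 5], [25, 60, 25], [5, 25, 60]]
print(round(polychoric(table), 2))
```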

13.
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.
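The RMSEA behavior described above follows from its formula, which divides the excess chi-square by the degrees of freedom: a model that spends extra parameters (losing df) to reduce chi-square need not improve its RMSEA. A sketch with made-up chi-square values:

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its degrees of freedom,
    and sample size (the usual per-df noncentrality form with an N - 1 divisor)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Two hypothetical models of the same data: the second uses more parameters
# (fewer df) and achieves a lower chi-square, yet RMSEA is unchanged because
# the excess chi-square per degree of freedom is the same.
print(round(rmsea(chi2=180.0, df=60, n=1000), 4))   # FA-style model
print(round(rmsea(chi2=150.0, df=50, n=1000), 4))   # IRT-style model
```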

14.
This study is a methodological-substantive synergy, demonstrating the power and flexibility of exploratory structural equation modeling (ESEM) methods that integrate confirmatory and exploratory factor analyses (CFA and EFA), as applied to substantively important questions based on multidimensional students' evaluations of university teaching (SETs). For these data, there is a well-established ESEM structure, but typical CFA models do not fit the data and substantially inflate correlations among the nine SET factors (median rs = .34 for ESEM, .72 for CFA) in a way that undermines discriminant validity and usefulness as diagnostic feedback. A 13-model taxonomy of ESEM measurement invariance is proposed, showing complete invariance (factor loadings, factor correlations, item uniquenesses, item intercepts, latent means) over multiple groups based on the SETs collected in the first and second halves of a 13-year period. Fully latent ESEM growth models that unconfounded measurement error from communality showed almost no linear or quadratic effects over this 13-year period. Latent multiple-indicators-multiple-causes models showed that relations with background variables (workload/difficulty, class size, prior subject interest, expected grades) were small in size and varied systematically for different ESEM SET factors, supporting their discriminant validity and a construct-validity interpretation of the relations. A new approach to higher-order ESEM was demonstrated but was not fully appropriate for these data. Based on ESEM methodology, substantively important questions were addressed that could not be appropriately addressed with a traditional CFA approach.

15.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.

16.
In psychological research, available data are often insufficient to estimate item factor analysis (IFA) models using traditional estimation methods, such as maximum likelihood (ML) or limited-information estimators. Bayesian estimation with common-sense, moderately informative priors can greatly improve the efficiency of parameter estimates and stabilize estimation. There are a variety of methods available to evaluate model fit in a Bayesian framework; however, past work investigating Bayesian model fit assessment for IFA models has assumed flat priors, which have no advantage over ML in limited-data settings. In this paper, we evaluated the impact of moderately informative priors on the ability to detect model misfit for several candidate indices: posterior predictive checks based on the observed score distribution, leave-one-out cross-validation, and the widely applicable information criterion (WAIC). We found that although Bayesian estimation with moderately informative priors is an excellent aid for estimating challenging IFA models, methods for testing model fit in these circumstances are inadequate.
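One of the candidate indices, WAIC, can be computed directly from a matrix of pointwise log-likelihoods over posterior draws. This is a generic sketch with a toy normal-mean posterior, not the paper's IFA setup:

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws x N observations) matrix of pointwise
    log-likelihoods, returned on the deviance scale (lower is better)."""
    # Log pointwise predictive density: log of the posterior-mean likelihood.
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

# Toy check: approximate posterior draws for a normal mean, evaluated on data.
rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, size=100)
draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=500)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - draws[:, None]) ** 2
print(round(waic(log_lik), 1))
```

Libraries such as ArviZ expose the same computation (along with LOO) for fitted Bayesian models, so this by-hand version is mainly useful for understanding what the index measures.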

17.
Marsh and Hau (1996) based the assertion that parsimony is not always desirable when assessing model fit on a particular counterexample drawn from Marsh's previous research. This counterexample is neither general nor valid enough to support such a thesis. More specifically, the counterexample signals an oversight of extant, stochastic models justifying correlated uniquenesses, namely, moving-average and autoregressive moving-average models. Such models provide theoretically plausible motives for a priori specification of error correlations. In fact, when uniquenesses are correlated, stochastic models other than the conventional simplex and quasi-simplex models must be tested before positive identification of the process is possible (Sivo, 1997). In short, exchanging the mechanistic penalties for model complexity for the mechanistic specification of untenable measurement-error covariances offers no solution. Parsimony has not been dismissed based on the argument Marsh and Hau presented concerning longitudinal data.

18.
This study presents the reliability and validity of the Teacher Evaluation Experience Scale–Teacher Form (TEES-T), a multidimensional measure of educators' attitudes and beliefs about teacher evaluation. Confirmatory factor analyses of data from 583 teachers were conducted on the TEES-T hypothesized five-factor model, as well as on alternative models. The five- and four-factor models yielded acceptable fit to the data. Information-theory-based indices of relative fit (i.e., AIC0, BCC0, and BIC0) indicated that the TEES-T four-factor model yielded superior fit to either the five-factor or one-factor models. The TEES-T evidenced good internal consistency, freedom from item bias, and convergent validity with the Collective Efficacy Scale. Implications are discussed.

19.
The classic approach for partitioning and assessing reliability and validity has been through the use of the multitrait-multimethod (MTMM) model. The MTMM approach generally involves 3 different methods evaluating 3 traits. This approach can be reconceptualized for questionnaire evaluation, so that the methods become 3 different scaling types, which are administered to the same respondents on different occasions to avoid carryover effects. A serious limitation of this MTMM model is that data are required from respondents on at least 3 different occasions, thus placing a heavy burden on the researcher and respondents. Planned incomplete data designs for the purpose of substantially reducing the amount of data required for MTMM models were investigated: 1st, a design that reduces the amount of data collected at the 3rd administration by 22%; and 2nd, a design in which data need only be collected on 2 occasions. The performance of listwise deletion, pairwise deletion, and the expectation-maximization (EM) algorithm at dealing with planned incomplete data was examined through a series of simulations. Results indicate that EM was generally precise and efficient.

20.
In judgmental standard setting procedures (e.g., the Angoff procedure), expert raters establish minimum pass levels (MPLs) for test items, and these MPLs are then combined to generate a passing score for the test. As suggested by Van der Linden (1982), item response theory (IRT) models may be useful in analyzing the results of judgmental standard setting studies. This paper examines three issues relevant to the use of IRT models in analyzing the results of such studies. First, a statistic for examining the fit of MPLs, based on judges' ratings, to an IRT model is suggested. Second, three methods for setting the passing score on a test based on item MPLs are analyzed; these analyses, based on theoretical models rather than empirical comparisons among the three methods, suggest that the traditional approach (i.e., setting the passing score on the test equal to the sum of the item MPLs) does not provide the best results. Third, a simple procedure, based on generalizability theory, for examining the sources of error in estimates of the passing score is discussed.
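The traditional aggregation mentioned above, together with a simple judge-based error estimate in the spirit of generalizability theory, can be sketched as follows (the ratings are fabricated for illustration; this is not the paper's proposed fit statistic or procedure):

```python
import numpy as np

# Hypothetical Angoff ratings: 5 judges x 8 items, each entry the judged
# probability that a minimally competent examinee answers the item correctly.
rng = np.random.default_rng(4)
true_mpl = np.array([0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8, 0.9])
ratings = np.clip(true_mpl + rng.normal(0.0, 0.08, size=(5, 8)), 0.0, 1.0)

# Traditional passing score: mean rating per item (the item MPL), summed.
item_mpls = ratings.mean(axis=0)
passing_score = item_mpls.sum()

# Simple error estimate: each judge implies a passing score (their ratings
# summed); the standard error reflects between-judge variation.
judge_totals = ratings.sum(axis=1)
se_judges = judge_totals.std(ddof=1) / np.sqrt(ratings.shape[0])
print(round(passing_score, 2), round(se_judges, 2))
```

Note that summing item means and averaging judge totals are algebraically the same number, which is why between-judge spread is a natural first look at the error in the passing score.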


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号