The purpose of this study is to provide guidance on a process for including latent class predictors in regression mixture models. We first examine the performance of current practice for using the 1-step and 3-step approaches where the direct covariate effect on the outcome is omitted. None of the approaches show adequate estimates of model parameters. Given that Step 1 of the 3-step approach shows adequate results in class enumeration, we suggest using an alternative approach: (a) decide the number of latent classes without predictors of latent classes, and (b) bring the latent class predictors into the model with the inclusion of hypothesized direct covariate effects. Our simulations show that this approach leads to good estimates for all model parameters. The proposed approach is demonstrated by using empirical data to examine the differential effects of family resources on students’ academic achievement outcome. Implications of the study are discussed.

In this simulation study, we explored the effect of introducing covariates to a growth mixture model when covariates were also generated by a mixture model. We varied the association between the latent classes underlying the growth trajectories and the covariates, the degree of separation between the latent classes underlying the covariates, the number of covariates included, and the amount of missing data in the growth data. We found that adding covariates to the growth mixture model generally hurt class recovery except where the latent classes underlying the growth trajectories and the covariates were the same or very strongly associated, and there was a large degree of separation between the classes underlying the covariates. We found that when covariates were introduced, entropy might no longer be an accurate indicator of the distinctiveness of the growth trajectory classes.
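
As an illustration of the entropy index mentioned above, the following is a minimal sketch of the normalized (relative) entropy that mixture modeling software commonly reports, computed from posterior class probabilities. Whether this is the exact index used in the study is an assumption, and the function name `relative_entropy` and the example probabilities are hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Normalized entropy in [0, 1]; values near 1 indicate sharp classification.

    `posteriors` is a list of rows, one per case, each holding that case's
    posterior probabilities across the K latent classes (assumed to sum to 1).
    """
    n_cases = len(posteriors)
    n_classes = len(posteriors[0])
    if n_classes < 2:
        raise ValueError("relative entropy needs at least two classes")

    # Accumulate the total classification uncertainty, sum of -p * ln(p).
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)

    # Scale by the maximum possible uncertainty, N * ln(K).
    return 1.0 - uncertainty / (n_cases * math.log(n_classes))


# Hypothetical posterior probabilities for three cases and two classes.
example = [[0.95, 0.05], [0.10, 0.90], [0.60, 0.40]]
print(round(relative_entropy(example), 3))
```

The index summarizes how sharply cases are assigned to classes overall, so it does not by itself say which variables (growth indicators or covariates) drive the separation.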

Factor mixture modeling (FMM) has been increasingly used to investigate unobserved population heterogeneity. This study examined the issue of covariate effects with FMM in the context of measurement invariance testing. Specifically, the impact of excluding and misspecifying covariate effects on measurement invariance testing and class enumeration was investigated via Monte Carlo simulations. Data were generated based on FMM models with (1) a zero covariate effect, (2) a covariate effect on the latent class variable, and (3) covariate effects on both the latent class variable and the factor. For each population model, different analysis models that excluded or misspecified covariate effects were fitted. Results highlighted the importance of including proper covariates in measurement invariance testing and evidenced the utility of a model comparison approach in searching for the correct specification of covariate effects and the level of measurement invariance. This approach was demonstrated using an empirical data set. Implications for methodological and applied research are discussed.

The factor mixture model (FMM) uses a hybrid of both categorical and continuous latent variables. The FMM is a good model for the underlying structure of psychopathology because the use of both categorical and continuous latent variables allows the structure to be simultaneously categorical and dimensional. This is useful because both diagnostic class membership and the range of severity within and across diagnostic classes can be modeled concurrently. Although the conceptualization of the FMM has been explained in the literature, the use of the FMM is still not prevalent. One reason is that there is little research about how such models should be applied in practice and, once a well-fitting model is obtained, how it should be interpreted. In this article, the FMM is explored by studying a real data example on conduct disorder. By exploring this example, this article aims to explain the different formulations of the FMM, the various steps in building an FMM, and how to decide between an FMM and alternative models.

Substantively, this study investigates potential heterogeneity in the developmental trajectories of anxiety in adolescence. Methodologically, this study demonstrates the usefulness of general growth mixture analysis (GGMA) in addressing these issues and illustrates the impact of untested invariance assumptions on substantive interpretations. This study relied on data from the Montreal Adolescent Depression Development Project (MADDP), a 4-year follow-up of more than 1,000 adolescents who completed the Beck Anxiety Inventory each year. GGMA models relying on different invariance assumptions were empirically compared. Each of these models converged on a 5-class solution, but yielded different substantive results. The model with class-varying variance–covariance matrices was retained as providing a better fit to the data. These results showed that although elevated levels of anxiety might fluctuate over time, they clearly do not represent a transient phenomenon. This model was then validated in relation to multiple predictors (mostly related to school violence) and outcomes (grade-point average, school dropout, depression, loneliness, and drug-related problems).

The purpose of the current study is to examine the performance of four information criteria (Akaike's information criterion [AIC], corrected AIC [AICC], Bayesian information criterion [BIC], sample-size adjusted BIC [SABIC]) for detecting the correct number of latent classes in the mixture Rasch model through simulations. The simulation study manipulated various class-distinction features (percentages of class-variant items, magnitudes, and patterns of item difficulty differences) and mixing proportions, assuming that a mixture Rasch model with two latent classes was the true model. Unlike previous studies that showed BIC's superiority to other indices, our findings from this study suggested that the four information criteria had differential performance depending on the percentage of class-variant items and the magnitude and pattern of item difficulty differences under a two-class structure. Furthermore, the present study revealed that AICC and SABIC generally performed as well as or better than their counterparts, AIC and BIC, respectively, for the two-class structure with a sample of 3,000.
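
For reference, the four criteria named above can be computed from a fitted model's log-likelihood as in the minimal sketch below. It uses the standard textbook definitions; the SABIC form with ln((n + 2) / 24) is the commonly used sample-size adjustment and is assumed here rather than taken from the study. The function name and the example values are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, AICC, BIC, and SABIC for one fitted model (lower is better)."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2 * n_params
    # Small-sample correction to AIC; requires n_obs > n_params + 1.
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = deviance + n_params * math.log(n_obs)
    # Sample-size adjusted BIC replaces n with (n + 2) / 24 in the penalty.
    sabic = deviance + n_params * math.log((n_obs + 2) / 24)
    return {"AIC": aic, "AICC": aicc, "BIC": bic, "SABIC": sabic}


# Hypothetical log-likelihoods for 1-, 2-, and 3-class models fit to n = 3,000.
candidates = {1: (-10250.0, 20), 2: (-10100.0, 41), 3: (-10090.0, 62)}
for n_classes, (ll, k) in candidates.items():
    print(n_classes, information_criteria(ll, k, 3000))
```

In class enumeration, each candidate model is scored this way and the number of classes with the smallest value of a given criterion is selected, which is why the criteria can disagree when their penalty terms differ.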

Parameter recovery was assessed within mixture confirmatory factor analysis across multiple estimator conditions under different simulated levels of mixture class separation. Mixture class separation was defined in the measurement model (through factor loadings) and the structural model (through factor variances). Maximum likelihood (ML) via the EM algorithm was compared to a Markov chain Monte Carlo (MCMC) estimator condition using weak priors and a condition using tight priors. Results indicated that the MCMC weak condition produced the highest bias, particularly with a weak Dirichlet prior for the mixture class proportions. Specifically, the weak Dirichlet prior affected parameter estimates under all mixture class separation conditions, even with moderate and large sample sizes. With little knowledge about parameters, ML/EM should be used over MCMC weak. However, MCMC tight produced the lowest bias under all mixture class separation conditions and should be used if tight and accurate priors can be placed on parameters.

This study is a methodological-substantive synergy, demonstrating the power and flexibility of exploratory structural equation modeling (ESEM) methods that integrate confirmatory and exploratory factor analyses (CFA and EFA), as applied to substantively important questions based on multidimensional students' evaluations of university teaching (SETs). For these data, there is a well-established ESEM structure but typical CFA models do not fit the data and substantially inflate correlations among the nine SET factors (median rs = .34 for ESEM, .72 for CFA) in a way that undermines discriminant validity and usefulness as diagnostic feedback. A 13-model taxonomy of ESEM measurement invariance is proposed, showing complete invariance (factor loadings, factor correlations, item uniquenesses, item intercepts, latent means) over multiple groups based on the SETs collected in the first and second halves of a 13-year period. Fully latent ESEM growth models that unconfounded measurement error from communality showed almost no linear or quadratic effects over this 13-year period. Latent multiple indicators multiple causes models showed that relations with background variables (workload/difficulty, class size, prior subject interest, expected grades) were small in size and varied systematically for different ESEM SET factors, supporting their discriminant validity and a construct validity interpretation of the relations. A new approach to higher order ESEM was demonstrated, but was not fully appropriate for these data. Based on ESEM methodology, substantively important questions were addressed that could not be appropriately addressed with a traditional CFA approach.

Stage-sequential (or multiphase) growth mixture models are useful for delineating potentially different growth processes across multiple phases over time and for determining whether latent subgroups exist within a population. These models are increasingly important as social behavioral scientists are interested in better understanding change processes across distinctively different phases, such as before and after an intervention. One of the less understood issues related to the use of growth mixture models is how to decide on the optimal number of latent classes. The performance of several traditionally used information criteria for determining the number of classes is examined through a Monte Carlo simulation study in single- and multiphase growth mixture models. For thorough examination, the simulation was carried out from 2 perspectives: the models and the factors. The simulation in terms of the models was carried out to see the overall performance of the information criteria within and across the models, whereas the simulation in terms of the factors was carried out to see the effect of each simulation factor on the performance of the information criteria holding the other factors constant. The findings not only support the sample-size adjusted Bayesian information criterion as a good choice under more realistic conditions, such as low class separation, smaller sample size, or missing data, but also increase understanding of the performance of information criteria in single- and multiphase growth mixture models.

This article proposes a new type of latent class analysis, joint latent class analysis (JLCA), which provides a set of principles for the systematic identification of the subsets of joint patterns for multiple discrete latent variables. Inferences about the parameters are obtained by a hybrid method of expectation-maximization and Newton–Raphson algorithms. We apply JLCA in an investigation of adolescent violent behavior and drug-using behaviors. The data are from 4,957 male high-school students who participated in the Youth Risk Behavior Surveillance System in 2015. The JLCA approach identifies the different joint patterns of 4 latent variables: violent behavior, alcohol consumption, tobacco cigarette smoking, and other drug use. The JLCA uncovers 4 common violent behaviors and 3 representative behavioral patterns for each of 3 other latent variables. In addition, the JLCA supports 3 common joint classes, representing the most probable simultaneous patterns for being violent and being a drug user among adolescent males.

Latent class analysis often aims to relate the classes to continuous external consequences (“distal outcomes”), but estimating such relationships necessitates distributional assumptions. Lanza, Tan, and Bray (2013) suggested circumventing such assumptions with their LTB approach: Linear logistic regression of latent class membership on each distal outcome is first used, after which this estimated relationship is reversed using Bayes’ rule. However, the LTB approach currently has 3 drawbacks, which we address in this article. First, LTB interchanges the assumption of normality for one of homoskedasticity, or, equivalently, of linearity of the logistic regression, leading to bias. Fortunately, we show that introducing higher order terms prevents this bias. Second, we improve coverage rates by replacing approximate standard errors with resampling methods. Finally, we introduce a bias-corrected 3-step version of LTB as a practical alternative to standard LTB. The improved LTB methods are validated by a simulation study, and an example application demonstrates their usefulness.
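
The Bayes' rule reversal described above can be illustrated with a minimal two-class sketch. It assumes that the logistic regression coefficients are already known and that the marginal distribution of the distal outcome is approximated by its empirical distribution; both are simplifications made for illustration, and the function name, coefficients, and data are hypothetical rather than taken from the article.

```python
import math


def class_means_via_bayes(z_values, intercept, slope):
    """Class-specific means of a distal outcome Z for a two-class model.

    Step 1: P(C = 1 | z) comes from a logistic regression of class on Z
            (coefficients are supplied here rather than estimated).
    Step 2: Bayes' rule reverses the relationship; with the empirical
            distribution of Z, E[Z | C = c] is a posterior-weighted mean.
    """
    p1 = [1.0 / (1.0 + math.exp(-(intercept + slope * z))) for z in z_values]
    p0 = [1.0 - p for p in p1]
    mean1 = sum(z * p for z, p in zip(z_values, p1)) / sum(p1)
    mean0 = sum(z * p for z, p in zip(z_values, p0)) / sum(p0)
    return mean0, mean1


# Hypothetical outcome values and logistic coefficients.
z = [0.2, 1.1, 1.9, 2.5, 3.4, 4.0]
print(class_means_via_bayes(z, intercept=-2.0, slope=1.0))
```

Adding higher order terms, as the abstract proposes, would correspond to replacing `intercept + slope * z` with a polynomial in `z` inside the logistic function.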

Researchers have devoted some time and effort to developing methods for fitting nonlinear relationships among latent variables. In particular, most of these have focused on correctly modeling interactions between 2 exogenous latent variables, and quadratic relationships between exogenous and endogenous variables. All of these approaches require prespecification of the nonlinearity by the researcher, and are limited to fairly simple nonlinear relationships. Other work has been done using mixture structural equation models (SEMM) in an attempt to fit more complex nonlinear relationships. This study expands on this earlier work by introducing the 2-stage generalized additive model (2SGAM) approach for fitting regression splines in the context of structural equation models. The model is first described and then investigated through the use of simulated data, in which it was compared with the SEMM approach. Results demonstrate that the 2SGAM is an effective tool for fitting a variety of nonlinear relationships between latent variables, and can be easily and accurately extended to models including multiple latent variables. Implications of these results are discussed.

Growth mixture modeling (GMM) has become a more popular statistical method for modeling population heterogeneity in longitudinal data, but the performance characteristics of GMM enumeration indexes in correctly identifying heterogeneous growth trajectories are largely unknown. Few empirical studies have addressed this issue. This study considered both homogeneous (a k = 1 growth trajectory) and heterogeneous (k = 3 different but unobserved growth trajectories) situations, and examined the performance of GMM in correctly identifying the latent trajectories in sample data. Four design conditions were manipulated: (a) sample size, (b) latent trajectory class proportions, (c) shapes of latent growth trajectories, and (d) degree of separation among latent growth trajectories. The findings suggest that, for the k = 1 condition (1 homogeneous growth trajectory), GMM's performance is reasonable in correctly identifying 1 latent growth trajectory (cf. Type I error control). However, for the k = 3 conditions (3 heterogeneous latent growth trajectories), GMM's general performance is very questionable (cf. Type II error). Different enumeration indexes varied considerably in their respective performances. Comparing the current results with previous GMM studies, the limitations of this study and future GMM enumeration research avenues are all discussed.

Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used information criteria (ICs) for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for 3 types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and a growth mixture model (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.

The 3-step approach has been recently advocated over the simultaneous 1-step approach to model a distal outcome predicted by a latent categorical variable. We generalize the 3-step approach to situations where the distal outcome is predicted by multiple and possibly associated latent categorical variables. Although the simultaneous 1-step approach has been criticized, simulation studies have found that the performance of the two approaches is similar in most situations (Bakk & Vermunt, 2016). This is consistent with our findings for a 2-LV extension when all model assumptions are satisfied. Results also indicate that under various degrees of violation of the normality and conditional independence assumption for the distal outcome and indicators, both approaches are subject to bias but the 3-step approach is less sensitive. The differences in estimates using the two approaches are illustrated in an analysis of the effects of various childhood socioeconomic circumstances on body mass index at age 50.

In longitudinal design, investigating interindividual differences of intraindividual changes enables researchers to better understand the potential variety of development and growth. Although latent growth curve mixture models have been widely used, unstructured finite mixture models (uFMMs) are also useful as a preliminary tool and are expected to be more robust in identifying classes under the influence of possible model misspecifications, which are very common in actual practice. In this study, large-scale simulations were performed in which various normal uFMMs and nonnormal uFMMs were fit to evaluate their utility and the performance of each model selection procedure for estimating the number of classes in longitudinal designs. Results show that normal uFMMs assuming invariance of variance–covariance structures among classes perform better on average. Among model selection procedures, the Calinski–Harabasz statistic, which has a nonparametric nature, performed better on average than information criteria, including the Bayesian information criterion.

This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across the studies, the differences in the estimated factor loadings between the two subgroups, resulting in a meta-analytic summary of the MGCFA effect sizes (MGCFA-ES). The performance of this new approach was examined using a Monte Carlo simulation, where we created 108 conditions by four factors: (1) three levels of item difficulty, (2) four magnitudes of DIF, (3) three levels of sample size, and (4) three types of correlation matrix (tetrachoric, adjusted Pearson, and Pearson). Results indicate that when MGCFA is fitted to tetrachoric correlation matrices, the meta-analytic summary of the MGCFA-ES performed best in terms of bias and mean square error values, 95% confidence interval coverages, empirical standard errors, Type I error rates, and statistical power; and reasonably well with adjusted Pearson correlation matrices. In addition, when tetrachoric correlation matrices are used, a meta-analytic summary of the MGCFA-ES performed well, particularly, under the condition that a high difficulty item with a large DIF was administered to a large sample size. Our result offers an option for synthesizing the magnitude of DIF on a flagged item across studies in practice.

A valuable extension of the single-rating regression discontinuity design (RDD) is a multiple-rating RDD (MRRDD). To date, four main methods have been used to estimate average treatment effects at the multiple treatment frontiers of an MRRDD: the “surface” method, the “frontier” method, the “binding-score” method, and the “fuzzy instrumental variables” method. This article uses a series of simulations to evaluate the relative performance of each of these four methods under a variety of different data-generating models. Focusing on a two-rating RDD (2RRDD), we compare the methods in terms of their bias, precision, and mean squared error when implemented as they most likely would be in practice, that is, using optimal bandwidth selection. We also apply the lessons learned from the simulations to a real-world example that uses data from a study of an English learner reclassification policy. Overall, this article makes valuable contributions to the literature on MRRDDs in that it makes concrete recommendations for choosing among MRRDD estimation methods, for implementing any chosen method using local linear regression, and for providing accurate statistical inferences.

