Rubin’s classic missingness mechanisms are central to handling missing data and minimizing biases that can arise due to missingness. However, the formulaic expressions that posit certain independencies among missing and observed data are difficult to grasp. As a result, applied researchers often rely on informal translations of these assumptions. We present a graphical representation of missing data mechanisms, formalized in Mohan, Pearl, and Tian (2013). We show that graphical models provide a tool for comprehending, encoding, and communicating assumptions about the missingness process. Furthermore, we demonstrate on several examples how graph-theoretical criteria can determine whether biases due to missing data might emerge in some estimates of interest and which auxiliary variables are needed to control for such biases, given assumptions about the missingness process.
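A minimal sketch of the kind of graph-theoretical check described above, assuming the missingness graph is a DAG given as a parent map in which `R_Y` (a hypothetical node name) is the missingness indicator of `Y`. It implements the standard ancestral-moral-graph test for d-separation, not the full recoverability criteria of Mohan, Pearl, and Tian (2013). If `Y` is d-separated from `R_Y` given a fully observed `X`, then P(Y) can be recovered as the sum over x of P(Y | x, R_Y = observed) P(x).

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """True if node sets xs and ys are d-separated given zs.

    `parents` maps each node to a list of its parents. Criterion: keep the
    ancestors of xs, ys, zs; connect co-parents; drop edge directions;
    delete zs; then test whether any node in xs can still reach ys.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):  # moralize: marry co-parents
            adj[a].add(b)
            adj[b].add(a)
    frontier = [x for x in xs if x not in zs]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False
        for nb in adj[node]:
            if nb not in zs and nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return True


# Hypothetical m-graphs: missingness driven by X (MAR-like) vs. by Y itself.
mar = {"Y": ["X"], "R_Y": ["X"]}
mnar = {"Y": ["X"], "R_Y": ["X", "Y"]}
print(d_separated(mar, {"Y"}, {"R_Y"}, {"X"}))   # True
print(d_separated(mnar, {"Y"}, {"R_Y"}, {"X"}))  # False
```

When missingness runs through an auxiliary variable `Z` (Y → Z → R_Y), the check fails given `X` alone but succeeds given `{X, Z}`, which is the graphical reading of "which auxiliary variables are needed".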

Small samples are common in growth models due to financial and logistical difficulties of following people longitudinally. For similar reasons, longitudinal studies often contain missing data. Though full information maximum likelihood (FIML) is popular to accommodate missing data, the limited number of studies in this area has found that FIML tends to perform poorly with small-sample growth models. This report demonstrates that the fault lies not with how FIML accommodates missingness but rather with maximum likelihood estimation itself. We discuss how the less popular restricted likelihood form of FIML, along with small-sample-appropriate methods, yields trustworthy estimates for growth models with small samples and missing data. That is, previously reported small-sample issues with FIML are attributable to the finite-sample bias of maximum likelihood estimation, not to direct likelihood. Estimation issues pertinent to joint multiple imputation and predictive mean matching are also included and discussed.
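The finite-sample bias of ML can be seen in the simplest possible case, an intercept-only normal model with complete data (an assumption chosen for illustration; the report itself treats growth models). ML divides the sum of squares by n, while restricted ML accounts for the estimated mean and divides by n − 1. Function names are illustrative.

```python
import statistics


def ml_variance(xs):
    """ML estimate of a normal variance: divides by n, so it is biased downward."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def reml_variance(xs):
    """REML estimate for an intercept-only model: divides by n - 1 (one fixed effect)."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(ml_variance(sample), reml_variance(sample))
```

The ML estimate is smaller by the factor (n − 1)/n, a shrinkage that is negligible for large n but noticeable in the small samples the report targets.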

Missing data are common in studies that rely on multiple-informant data to evaluate relationships among variables for distinguishable individuals clustered within groups. Estimation of structural equation models using raw data allows for incomplete data, so all groups can be retained for analysis even if only 1 member of a group contributes data. Statistical inference is based on the assumption that data are missing completely at random or missing at random. Importantly, whether or not data are missing is assumed to be independent of the missing data. A saturated correlates model, which incorporates correlates of the missingness or of the missing data into an analysis, and multiple imputation, which might also use such correlates, offer advantages over the standard implementation of SEM when data are not missing at random, because these approaches could yield a data analysis problem for which the missingness is ignorable. This article considers these approaches in an analysis of family data to assess the sensitivity of parameter estimates and statistical inferences to assumptions about missing data, a strategy that could be easily implemented using SEM software.

As useful multivariate techniques, structural equation models have attracted significant attention from various fields. Most existing statistical methods and software for analyzing structural equation models have been developed based on the assumption that the response variables are normally distributed. Several recently developed methods can partially address violations of this assumption, but still encounter difficulties in analyzing highly nonnormal data. Moreover, the presence of missing data is a practical issue in substantive research. Simply ignoring missing data or improperly treating nonignorable missingness as ignorable could seriously distort statistical inference results. The main objective of this article is to develop a Bayesian approach for analyzing transformation structural equation models with highly nonnormal and missing data. Different types of missingness are discussed and selected via the deviance information criterion. The empirical performance of our method is examined via simulation studies. Application to a study concerning people’s job satisfaction, home life, and work attitude is presented.
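Since the missingness types are compared with the deviance information criterion, here is a sketch of how DIC is computed from posterior draws, shown for a toy normal-mean model with known σ (an assumption chosen for brevity; the article's transformation SEM is far richer). D̄ is the posterior mean deviance, p_D = D̄ − D(θ̄) is the effective number of parameters, and DIC = D̄ + p_D, with smaller values preferred. Function names are hypothetical.

```python
import math
import statistics


def normal_deviance(data, mu, sigma):
    """Deviance D(mu) = -2 * log-likelihood of i.i.d. N(mu, sigma^2) data."""
    n = len(data)
    ss = sum((y - mu) ** 2 for y in data)
    return n * math.log(2 * math.pi * sigma ** 2) + ss / sigma ** 2


def dic(data, mu_draws, sigma):
    """Return (DIC, pD) from posterior draws of mu."""
    d_bar = statistics.fmean([normal_deviance(data, m, sigma) for m in mu_draws])
    d_hat = normal_deviance(data, statistics.fmean(mu_draws), sigma)
    p_d = d_bar - d_hat  # effective number of parameters
    return d_bar + p_d, p_d
```

In practice the draws would come from an MCMC run for each candidate missingness model, and the model with the smallest DIC would be retained.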

Respondent attrition is a common problem in national longitudinal panel surveys. To make full use of the data, weights are provided to account for attrition. Weight adjustments are based on sampling design information and data from the base year; information from subsequent waves is typically not utilized. Alternative methods to address bias from nonresponse are full information maximum likelihood (FIML) and multiple imputation (MI). The effects of these methods on the bias of growth parameter estimates are compared via a simulation study. The results indicate that caution is needed when using panel weights with missing data, and that methods like FIML and MI, which are less susceptible to the omission of important auxiliary variables, should be considered.

Structural equation models are widely appreciated in behavioral, social, and psychological research to model relations between latent constructs and manifest variables, and to control for measurement errors. Most applications of structural equation models are based on fully observed data that are independently distributed. However, hierarchical data with a correlated structure are common in behavioral research, and very often, missing data are encountered. In this article, we propose a 2-level structural equation model for analyzing hierarchical data with missing entries, and describe a Bayesian approach for estimation and model comparison. We show how to use WinBUGS software to get the solution conveniently. The proposed methodologies are illustrated through a simulation study, and a real application in relation to organizational and management research concerning the study of the interrelationships of the latent constructs about job satisfaction, job responsibility, and life satisfaction for citizens in 43 countries.

The shared parameter growth mixture model (SPGMM) has been proposed as a method to handle missing not at random (MNAR) data in longitudinal studies. This Monte Carlo simulation study compared the one-step approach with a three-step approach for adding covariates into the SPGMM. The results showed that performances of one-step and three-step approaches did not differ, but the estimate of the coefficient of the covariate was biased in most conditions with MNAR data. However, means, variances, and covariance of the intercept and slope as well as their standard errors were estimated without bias in most conditions, except for some combinations of small class distances and MNAR dropout missingness that was not related to the underlying growth trajectory. Classification accuracy was similar with both one-step and three-step SPGMM.

Using a sample of schools testing annually in grades 9–11 with a vertically linked series of assessments, a latent growth curve model is used to model test scores with student intercepts and slopes nested within school. Missed assessments can occur because of student mobility, student dropout, absenteeism, and other reasons. Missing data indicators are modeled using logistic regression, with grade 9 and potentially unobserved growth scores used as covariates. Under a hierarchical selection model, estimates of school effects on academic growth and missingness are obtained. The results from the selection model are compared to a model that ignores the missing data process.

Latent class models are often used to assign values to categorical variables that cannot be measured directly. This “imputed” latent variable is then used in further analyses with auxiliary variables. The relationship between the imputed latent variable and auxiliary variables can only be correctly estimated if these auxiliary variables are included in the latent class model. Otherwise, point estimates will be biased. We develop a method that correctly estimates the relationship between an imputed latent variable and external auxiliary variables, by updating the latent variable imputations to be conditional on the external auxiliary variables using a combination of multiple imputation of latent classes and the so-called three-step approach. In contrast with existing “one-step” and “three-step” approaches, our method allows the resulting imputations to be analyzed using the familiar methods favored by substantive researchers.

A 2-stage procedure for estimation and testing of observed measure correlations in the presence of missing data is discussed. The approach uses maximum likelihood for estimation and the false discovery rate concept for correlation testing. The method can be used in initial exploration-oriented empirical studies with missing data, where it is of interest to estimate manifest variable interrelationship indexes and test hypotheses about their population values. The procedure is applicable also with violations of the underlying missing at random assumption, via inclusion of auxiliary variables. The outlined approach is illustrated with data from an aging research study.
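A sketch of the testing stage, assuming the correlations have already been estimated (the article uses maximum likelihood with missing data) and approximating each p-value with Fisher's z transformation and a user-supplied sample size, which is a simplification when data are incomplete. The Benjamini-Hochberg step-up rule then controls the false discovery rate across all tested correlations. Function names are illustrative.

```python
import math


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 using Fisher's z = atanh(r) * sqrt(n - 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals, q=0.05):
    """Return the (sorted) indices of hypotheses rejected at false discovery rate q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value falls under its BH line
    return sorted(order[:k_max])
```

Because the rule is step-up, every hypothesis ranked at or below the largest qualifying rank is rejected, even if an intermediate p-value sits above its own line.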

Competence data from low‐stakes educational large‐scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item‐level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical approaches such as ignoring missing values or treating them as incorrect are currently applied in many large‐scale studies, while recent model‐based approaches that can account for nonignorable nonresponse have been developed. Estimates of item and person parameters have been demonstrated to be biased for classical approaches when missing data are missing not at random (MNAR). In our study, we focus on parameter estimates of the structural model (i.e., the true regression coefficient when regressing competence on an explanatory variable), simulating data according to various missing data mechanisms. We found that model‐based approaches and ignoring missing values performed well in retrieving regression coefficients even when we induced missing data that were MNAR. Treating missing values as incorrect responses can lead to substantial bias. We demonstrate the validity of our approach empirically and discuss the relevance of our results.

A multiple testing procedure for examining the assumption of normality that is often made in analyses of incomplete data sets is outlined. The method is concerned with testing normality within each missingness pattern and arriving at an overall statement about normality using the available data. The approach is readily applied in empirical research with missing data using the popular software Mplus, Stata, and R. The procedure can be used to ascertain a main assumption underlying frequent applications of maximum likelihood in incomplete data modeling with continuous outcomes. The discussed approach is illustrated with numerical examples.
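The abstract does not name the per-pattern test or the multiplicity correction, so this sketch assumes the Jarque-Bera statistic (an asymptotic test, only rough in small patterns) and a Bonferroni correction across all pattern-by-variable tests. Rows are tuples with `None` marking a missing value; `min_n` is a hypothetical cutoff for patterns too small to test, and a constant variable within a pattern would raise a division error.

```python
import math
from collections import defaultdict


def jarque_bera_pvalue(xs):
    """Asymptotic Jarque-Bera p-value (chi-square with 2 df) for normality of xs."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return math.exp(-jb / 2)  # survival function of chi-square with 2 df


def normality_by_pattern(rows, alpha=0.05, min_n=8):
    """Test each observed variable within each missingness pattern; Bonferroni overall."""
    patterns = defaultdict(list)
    for row in rows:
        patterns[tuple(v is None for v in row)].append(row)
    pvals = {}
    for pattern, group in patterns.items():
        if len(group) < min_n:
            continue  # too few cases in this pattern to test
        for j, is_missing in enumerate(pattern):
            if not is_missing:
                pvals[(pattern, j)] = jarque_bera_pvalue([r[j] for r in group])
    if not pvals:
        return pvals, False
    reject = min(pvals.values()) < alpha / len(pvals)
    return pvals, reject
```

The overall statement "normality is rejected" is made when any single test falls below alpha divided by the number of tests performed.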

A well-known ad-hoc approach to conducting structural equation modeling with missing data is to obtain a saturated maximum likelihood (ML) estimate of the population covariance matrix and then to use this estimate in the complete data ML fitting function to obtain parameter estimates. This 2-stage (TS) approach is appealing because it minimizes a familiar function while being only marginally less efficient than the full information ML (FIML) approach. Additional advantages of the TS approach include that it allows for easy incorporation of auxiliary variables and that it is more stable in smaller samples. The main disadvantage is that the standard errors and test statistics provided by the complete data routine will not be correct. Empirical approaches to finding the right corrections for the TS approach have failed to provide unequivocal solutions. In this article, correct standard errors and test statistics for the TS approach with missing completely at random and missing at random normally distributed data are developed and studied. The new TS approach performs well in all conditions, is only marginally less efficient than the FIML approach (and is sometimes more efficient), and has good coverage. Additionally, the residual-based TS statistic outperforms the FIML test statistic in smaller samples. The TS method is thus a viable alternative to FIML, especially in small samples, and its further study is encouraged.
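A minimal sketch of stage 1 of the TS approach for the simplest case: a bivariate normal in which `x` is complete and `y` has gaps (assumed MAR). EM returns the saturated ML mean vector and covariance matrix; stage 2 would pass that matrix to a complete-data ML fitting routine, and the corrected standard errors and test statistics developed in the article are not reproduced here. The function name and fixed iteration count are assumptions.

```python
def em_saturated_bivariate(xs, ys, iters=200):
    """EM for the ML mean vector and covariance matrix of (x, y) when only y has gaps.

    xs: fully observed values; ys: values with None marking missing entries.
    Returns ((mu_x, mu_y), ((s_xx, s_xy), (s_xy, s_yy))) using divide-by-n moments.
    """
    n = len(xs)
    obs_y = [y for y in ys if y is not None]
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n  # x is complete, so this is final
    mu_y = sum(obs_y) / len(obs_y)              # start from complete-case moments
    s_yy = sum((y - mu_y) ** 2 for y in obs_y) / len(obs_y)
    s_xy = 0.0
    for _ in range(iters):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy  # Var(y | x) under current parameters
        # E-step: expected sufficient statistics for the missing y values.
        e_y, e_y2 = [], []
        for x, y in zip(xs, ys):
            if y is None:
                m = mu_y + beta * (x - mu_x)
                e_y.append(m)
                e_y2.append(m * m + resid_var)
            else:
                e_y.append(y)
                e_y2.append(y * y)
        # M-step: ML moments from the completed sufficient statistics.
        mu_y = sum(e_y) / n
        s_xy = sum(x * e for x, e in zip(xs, e_y)) / n - mu_x * mu_y
        s_yy = sum(e_y2) / n - mu_y * mu_y
    return (mu_x, mu_y), ((s_xx, s_xy), (s_xy, s_yy))
```

For this monotone pattern the EM fixed point matches the closed-form ML estimate of the mean of y: the complete-case mean of y plus the complete-case slope times the difference between the full-sample and complete-case means of x.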

We propose a structural equation model, which reduces to a multidimensional latent class item response theory model, for the analysis of binary item responses with nonignorable missingness. The missingness mechanism is driven by 2 sets of latent variables: one describing the propensity to respond and the other referring to the abilities measured by the test items. These latent variables are assumed to have a discrete distribution, so as to reduce the number of parametric assumptions regarding the latent structure of the model. Individual covariates can also be included through a multinomial logistic parameterization for the distribution of the latent variables. Given the discrete nature of this distribution, the proposed model is efficiently estimated by the expectation–maximization algorithm. A simulation study is performed to evaluate the finite-sample properties of the parameter estimates. Moreover, an application is illustrated with data coming from a student entry test for the admission to some university courses.

Allowance for multiple chances to answer constructed response questions is a prevalent feature in computer‐based homework and exams. We consider the use of item response theory in the estimation of item characteristics and student ability when multiple attempts are allowed but no explicit penalty is deducted for extra tries. This is common practice in online formative assessments, where the number of attempts is often unlimited. In these environments, some students may not always answer‐until‐correct, but may rather terminate a response process after one or more incorrect tries. We contrast the cases of graded and sequential item response models, both unidimensional models which do not explicitly account for factors other than ability. These approaches differ not only in terms of log‐odds assumptions but, importantly, in terms of handling incomplete data. We explore the consequences of model misspecification through a simulation study and with four online homework data sets. Our results suggest that model selection is insensitive for complete data, but quite sensitive to whether missing responses are regarded as informative (of inability) or not (e.g., missing at random). Under realistic conditions, a sequential model with similar parametric degrees of freedom to a graded model can account for more response patterns and outperforms the latter in terms of model fit.
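A sketch of how the two model families assign category probabilities, assuming a single item with score categories 0..K and logistic links (the discrimination `a`, the threshold values, and the function names are illustrative). The graded model differences cumulative probabilities and needs increasing thresholds; the sequential model multiplies step-wise continuation probabilities, so a response process that stops early contributes only the steps it actually reached.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def graded_probs(theta, thresholds, a=1.0):
    """Graded response model: P(X >= k) = logistic(a * (theta - b_k)), b_k increasing."""
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def sequential_probs(theta, steps, a=1.0):
    """Sequential model: step k is passed with logistic(a * (theta - b_k)),
    conditional on having passed every earlier step."""
    probs, reach = [], 1.0
    for b in steps:
        p = logistic(a * (theta - b))
        probs.append(reach * (1.0 - p))  # reached this step but failed it
        reach *= p
    probs.append(reach)  # passed every step
    return probs
```

With a single threshold the two models coincide; they diverge once there are several categories, which is where the log-odds assumptions and the treatment of stopped response processes start to matter.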

Missing data is endemic in much educational research. However, practices common in the educational research literature, such as step-wise regression, have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their implications for educational research. We illustrate the issues with an educational, longitudinal survey in which missing data were significant, but for which we were able to collect much of the missing data through subsequent data collection. We thus compare methods, that is, step-wise regression (basically ignoring the missing data) and MI models, with the model from the actual enhanced sample. The value of MI is discussed and the risks involved in ignoring missing data are considered. Implications for research practice are discussed.
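Multiple imputation analyses each completed data set separately and then pools the results. A minimal sketch of Rubin's combining rules for one scalar parameter (the function name is hypothetical): the pooled estimate is the mean of the m estimates, and the total variance adds the between-imputation variance, inflated by 1 + 1/m, to the average within-imputation variance. It needs at least two imputations.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimates of one parameter, one per imputed data set.
    variances: their squared standard errors.
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    w = statistics.fmean(variances)     # within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b             # total variance
    df = math.inf if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

Ignoring the missing data amounts to reporting a single analysis with variance w alone, which is why such analyses tend to overstate precision.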

A didactic discussion of covariance structure modeling in longitudinal studies with missing data is presented. Use of the full-information maximum likelihood method is considered for model fitting, parameter estimation, and hypothesis testing purposes, particularly when interested in patterns of temporal change as well as its covariates and predictors. The approach is illustrated with an application of the popular level-and-shape model to data from a cognitive intervention study of elderly adults.
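In one common identification of the level-and-shape model, the level factor loads 1 on every wave, while the shape factor's loadings are fixed at 0 for the first wave and 1 for the last, with intermediate loadings free. A sketch of the model-implied means and covariances (parameter names are assumptions): FIML would evaluate each case's normal log-likelihood on the rows and columns of these moments that correspond to its observed waves, which is not shown here.

```python
def level_shape_moments(shape_loadings, kappa, phi, theta):
    """Model-implied means and covariance matrix of a level-and-shape model.

    shape_loadings: lambda_t per wave (first 0 and last 1 fixed for identification).
    kappa: (level mean, shape mean).
    phi: ((var_level, cov_level_shape), (cov_level_shape, var_shape)).
    theta: residual variance per wave.
    """
    k_l, k_s = kappa
    (p_ll, p_ls), (_, p_ss) = phi
    means = [k_l + lam * k_s for lam in shape_loadings]
    cov = [[p_ll + (ls + lt) * p_ls + ls * lt * p_ss + (theta[s] if s == t else 0.0)
            for t, lt in enumerate(shape_loadings)]
           for s, ls in enumerate(shape_loadings)]
    return means, cov
```

The shape mean then reads as the total expected change from the first to the last wave, and each free loading as the fraction of that change reached by the corresponding wave.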

Using Monte Carlo simulations, this research examined the performance of four missing data methods in SEM under different multivariate distributional conditions. The effects of four independent variables (sample size, missing proportion, distribution shape, and factor loading magnitude) were investigated on six outcome variables: convergence rate, parameter estimate bias, MSE of parameter estimates, standard error coverage, model rejection rate, and model goodness of fit (RMSEA). A three-factor CFA model was used. Findings indicated that FIML outperformed the other methods in MCAR, and MI should be used to increase the plausibility of MAR. SRPI was not comparable to the other three methods in either MCAR or MAR.
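A small Monte Carlo sketch of how the mechanisms differ, assuming a bivariate normal (x, y) with E[y] = 0 and a missingness probability that depends on nothing (MCAR), on the observed x (MAR), or on y itself (MNAR). It reports the complete-case mean of y, so any departure from 0 is the bias from simply dropping incomplete cases; the coefficients, probabilities, and function name are arbitrary choices for illustration.

```python
import random
import statistics


def complete_case_mean(mechanism, n=20000, seed=1):
    """Mean of the observed y values under a simulated missingness mechanism."""
    rng = random.Random(seed)
    observed_y = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.6 * x + rng.gauss(0, 0.8)  # Var(y) = 1, true mean of y is 0
        if mechanism == "MCAR":
            p_miss = 0.3                      # unrelated to any variable
        elif mechanism == "MAR":
            p_miss = 0.6 if x > 0 else 0.1    # depends only on the observed x
        elif mechanism == "MNAR":
            p_miss = 0.6 if y > 0 else 0.1    # depends on y itself
        else:
            raise ValueError(mechanism)
        if rng.random() >= p_miss:
            observed_y.append(y)
    return statistics.fmean(observed_y)


for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, round(complete_case_mean(mech), 3))
```

Under these settings the MCAR mean should stay near 0, while MAR and MNAR pull it downward because cases with high values are dropped more often. Under MAR the bias can be removed by conditioning on x, which is what FIML and MI exploit; under MNAR it cannot.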

When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process the missingness is nonignorable. The purpose of this article is to present a tree‐based IRT framework for modeling responses and omissions jointly, taking into account that test takers as well as items can contribute to the two types of omissions. The proposed framework covers several existing models for missing responses, and many IRTree models can be estimated using standard statistical software. Further, simulated data is used to show that ignoring missing responses is less robust than often considered. Finally, as an illustration of its applicability, the IRTree approach is applied to data from the 2009 PISA reading assessment.
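A sketch of the data step behind an IRTree model for omissions: each item response is expanded into binary pseudo-items for the tree nodes, which a standard IRT or mixed-model routine can then fit. The tree used here (reached, then responded, then correct) and the rule that omissions after a test taker's last answered item count as not reached are assumptions; the article's framework covers several tree structures.

```python
def irtree_pseudo_items(responses):
    """Expand one test taker's responses (1, 0, or None) into (reached, responded, correct).

    None in a pseudo-item means that node is not visited for that item.
    """
    answered = [i for i, r in enumerate(responses) if r is not None]
    last_answered = max(answered, default=-1)
    rows = []
    for i, r in enumerate(responses):
        if i > last_answered:
            rows.append((0, None, None))  # not reached: later nodes unobserved
        elif r is None:
            rows.append((1, 0, None))     # reached but skipped
        else:
            rows.append((1, 1, r))        # answered: correctness observed
    return rows
```

Modeling the three node outcomes jointly, with person and item parameters on each node, is what lets the omission propensities be correlated with proficiency instead of being ignored.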

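We consider the problem of estimating the mean of the response variable in a linear model when the response is missing at random. Estimators of the response mean are obtained from the fully observed sample data, from the "complete sample" after linear regression imputation, and from the "complete sample" after inverse probability weighted imputation, and their asymptotic normality is proved.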
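A sketch of three estimators of the kind compared above, assuming a single discrete covariate `x` that is always observed and a response `y` missing at random given `x`. The inverse-probability step here is a plain weighted mean with P(observed | x) estimated by cell proportions; the article's "inverse probability weighted imputation" estimator may be defined differently. Every `x` cell must contain at least one observed `y`, and the function name is hypothetical.

```python
import statistics


def response_mean_estimates(xs, ys):
    """Return (complete-case, regression-imputation, IPW) estimates of E[y].

    xs: discrete covariate values, always observed; ys: responses, None if missing.
    """
    n = len(xs)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    # 1. Complete-case mean.
    cc = statistics.fmean(y for _, y in obs)
    # 2. Linear regression imputation: fit y = a + b x on complete cases, impute the rest.
    mx = statistics.fmean(x for x, _ in obs)
    b = sum((x - mx) * (y - cc) for x, y in obs) / sum((x - mx) ** 2 for x, _ in obs)
    a = cc - b * mx
    reg = statistics.fmean(y if y is not None else a + b * x for x, y in zip(xs, ys))
    # 3. Inverse probability weighting with P(observed | x) from cell proportions.
    counts, seen = {}, {}
    for x, y in zip(xs, ys):
        counts[x] = counts.get(x, 0) + 1
        seen[x] = seen.get(x, 0) + (y is not None)
    ipw = sum(y / (seen[x] / counts[x]) for x, y in obs) / n
    return cc, reg, ipw


print(response_mean_estimates([0, 0, 1, 1], [1.0, None, 3.0, 5.0]))
```

With a binary covariate the linear fit is saturated, so regression imputation and cell-based weighting give the same answer, while the complete-case mean stays biased toward the better-observed cell.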

