Similar Articles
20 similar articles found
1.
Statistical theories of goodness-of-fit tests in structural equation modeling are based on asymptotic distributions of test statistics. When the model includes a large number of variables or the population is not multivariate normal, the asymptotic distributions do not approximate the distribution of the test statistics well at small sample sizes. A variety of methods have been developed to improve the accuracy of hypothesis testing at small sample sizes, but all have limitations, especially for nonnormally distributed data. We propose a Monte Carlo test that controls the Type I error rate more accurately than existing approaches for both normal and nonnormal data at small sample sizes. Extensive simulation studies show that the suggested Monte Carlo test has a more accurate observed significance level than other tests, with reasonable power to reject misspecified models.
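In outline, a Monte Carlo test of this kind replaces the asymptotic reference distribution with one simulated from the fitted model: generate many data sets from the model, refit, collect the fit statistics, and locate the observed statistic among them. The sketch below shows only the generic p-value step, not the authors' procedure; the simulated null statistics are assumed to come from such model-based resampling.

```python
import numpy as np

def monte_carlo_pvalue(stat_obs, null_stats):
    """Monte Carlo p-value: proportion of simulated null statistics at
    least as extreme as the observed one. The add-one correction keeps
    the p-value strictly positive and valid at finite simulation sizes."""
    null_stats = np.asarray(null_stats, dtype=float)
    return (1 + np.sum(null_stats >= stat_obs)) / (1 + len(null_stats))

# Illustration with a chi-square-shaped null (a stand-in for statistics
# obtained by simulating data from the fitted model and refitting):
rng = np.random.default_rng(0)
null = rng.chisquare(df=10, size=999)
p = monte_carlo_pvalue(stat_obs=12.3, null_stats=null)
```

With 999 replications, the smallest attainable p-value is 1/1000, which is why applications choose the number of replications to match the desired significance level.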

2.
Fitting a large structural equation modeling (SEM) model with moderate to small sample sizes results in an inflated Type I error rate for the likelihood ratio test statistic under the chi-square reference distribution, known as the model size effect. In this article, we show that the number of observed variables (p) and the number of free parameters (q) have unique effects on the Type I error rate of the likelihood ratio test statistic. In addition, the effects of p and q cannot be fully explained using degrees of freedom (df). We also evaluated the performance of four correction methods for the model size effect: Bartlett's (1950), Swain's (1975), and Yuan's (2005) corrected statistics, and Yuan, Tian, and Yanagihara's (2015) empirically corrected statistic. We found that Yuan et al.'s (2015) empirically corrected statistic generally yields the best performance in controlling the Type I error rate when fitting large SEM models.

3.
Robust maximum likelihood (ML) and categorical diagonally weighted least squares (cat-DWLS) estimation have both been proposed for use with categorized and nonnormally distributed data. This study compares results from the 2 methods in terms of parameter estimate and standard error bias, power, and Type I error control, with unadjusted ML and WLS estimation included for comparison. Conditions manipulated include model misspecification, level of asymmetry, level of categorization, sample size, and type and size of the model. Results indicate that cat-DWLS yields the least parameter estimate and standard error bias under the majority of conditions studied. Cat-DWLS parameter estimates and standard errors were generally the least affected by model misspecification among the estimation methods studied. Robust ML also performed well, yielding relatively unbiased parameter estimates and standard errors. However, both cat-DWLS and robust ML resulted in low power under conditions of high data asymmetry, small sample sizes, and mild model misspecification. Under more optimal conditions, power for these estimators was adequate.

4.
5.
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a large number of attributes is required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance level, whereas the Type I error rate for the A-CDM stays closer to the nominal level. With larger sample sizes, the Type I error rates for all three models approach the nominal significance level. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined, even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings.

6.
The asymptotically distribution-free (ADF) test statistic depends on very mild distributional assumptions and is theoretically superior to many other so-called robust tests available in structural equation modeling. The ADF test, however, often leads to model overrejection even at modest sample sizes. To overcome its poor small-sample performance, a family of robust test statistics obtained by modifying the ADF statistics was recently proposed. This study investigates by simulation the performance of the new modified test statistics. The results revealed that although a few of the test statistics adequately controlled Type I error rates in each of the examined conditions, most performed quite poorly. This result underscores the importance of choosing a modified test statistic that performs well for specific examined conditions. A parametric bootstrap method is proposed for identifying such a best-performing modified test statistic. Through further simulation it is shown that the proposed bootstrap approach performs well.

7.
Smoothing is designed to yield smoother equating results that reduce random equating error without introducing much systematic error. The main objective of this study is to propose a new statistic and to compare its performance to that of the Akaike information criterion and likelihood ratio chi-square difference statistics in selecting the smoothing parameter for polynomial loglinear equating under the random groups design. These model selection statistics were compared for four sample sizes (500, 1,000, 2,000, and 3,000) and eight simulated equating conditions, including conditions where equating is not needed and conditions where it is. The results suggest that all model selection statistics tend to improve equating accuracy by reducing total equating error. The new statistic tended to have less overall error than the other two methods.
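The selection step described above (fit candidate smoothing models, then pick one by a model selection statistic) can be sketched with a polynomial loglinear (Poisson) model for score frequencies fit by Newton's method, selected by AIC. This is an illustrative reimplementation of the generic procedure, not the study's new statistic; the score-frequency vector below is made up.

```python
import numpy as np

def fit_loglinear(counts, degree, n_iter=100):
    """Fit a polynomial loglinear (Poisson) model to score frequencies by
    Newton's method with step-halving. Returns fitted counts and the
    log-likelihood (dropping the log-factorial constant, which cancels
    when comparing models on the same data)."""
    counts = np.asarray(counts, dtype=float)
    scores = np.arange(len(counts), dtype=float)
    z = (scores - scores.mean()) / scores.std()   # standardized powers for stability
    X = np.column_stack([z ** d for d in range(degree + 1)])

    def loglik(beta):
        eta = np.clip(X @ beta, -30.0, 30.0)
        return counts @ eta - np.exp(eta).sum()

    beta = np.zeros(degree + 1)
    beta[0] = np.log(counts.mean())
    for _ in range(n_iter):
        mu = np.exp(np.clip(X @ beta, -30.0, 30.0))
        grad = X.T @ (counts - mu)
        if np.max(np.abs(grad)) < 1e-9:
            break
        step = np.linalg.solve(X.T @ (mu[:, None] * X), grad)
        while loglik(beta + step) < loglik(beta):  # step-halving safeguard
            step /= 2.0
        beta = beta + step
    mu = np.exp(np.clip(X @ beta, -30.0, 30.0))
    return mu, loglik(beta)

def select_degree(counts, max_degree=6):
    """Pick the polynomial degree minimizing AIC = -2 logL + 2(degree + 1)."""
    aics = {}
    for d in range(1, max_degree + 1):
        _, ll = fit_loglinear(counts, d)
        aics[d] = -2.0 * ll + 2.0 * (d + 1)
    return min(aics, key=aics.get)

# Illustrative smooth score-frequency distribution:
counts = np.array([5, 12, 30, 60, 90, 110, 90, 60, 30, 12, 5], dtype=float)
best_degree = select_degree(counts, max_degree=4)
mu2, ll2 = fit_loglinear(counts, degree=2)
```

A property worth noting: because the model contains an intercept, the fitted frequencies preserve the total count at convergence, one of the moment-matching features that motivates loglinear presmoothing.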

8.
Mean correction and mean-and-variance correction are the two major principles for developing test statistics when distributional conditions are violated. In structural equation modeling (SEM), mean-rescaled and mean-and-variance-adjusted test statistics have been recommended in different contexts. However, recent studies indicated that their Type I error rates vary from 0% to 100% as the number of variables p increases. Can we still trust the two principles, and what alternative rules can be used to develop test statistics for SEM with "big data"? This article addresses these issues with a large-scale Monte Carlo study. Results indicate that the empirical means and standard deviations of each statistic can differ from their expected values by many standardized units when p is large. Thus, the problems in Type I error control arise because the two statistics do not possess the properties they are supposed to have at realistic sample sizes, not because the mean and mean-and-variance corrections themselves are unsound. The two principles, however, need to be implemented using small-sample methodology instead of asymptotics. Results also indicate that distributions other than chi-square might better describe the behavior of test statistics in SEM with big data.

9.
The purpose of this study was to investigate the power and Type I error rate of the likelihood ratio goodness-of-fit (LR) statistic in detecting differential item functioning (DIF) under Samejima's (1969, 1972) graded response model. A multiple-replication Monte Carlo study was utilized in which DIF was modeled in simulated data sets that were then calibrated with MULTILOG (Thissen, 1991) using hierarchically nested item response models. In addition, the power and Type I error rate of the Mantel (1963) approach for detecting DIF in ordered response categories were investigated using the same simulated data, for comparative purposes. The power of both the Mantel and LR procedures was affected by sample size, as expected. The LR procedure lacked the power to consistently detect DIF when it existed in reference/focal groups with sample sizes as small as 500/500. The Mantel procedure maintained control of its Type I error rate and was more powerful than the LR procedure when the comparison groups' ability distributions were identical and there was a constant DIF pattern. On the other hand, the Mantel procedure lost control of its Type I error rate, whereas the LR procedure did not, when the comparison groups differed in mean ability; and the LR procedure demonstrated a profound power advantage over the Mantel procedure under conditions of balanced DIF in which the comparison groups' ability distributions were identical. The choice and subsequent use of either procedure requires a thorough understanding of its power and Type I error rates under varying conditions of DIF pattern, comparison group ability distributions (or, as a surrogate, observed score distributions), and item characteristics.
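The LR procedure above rests on comparing hierarchically nested calibrations: item parameters constrained equal across groups versus free to differ, with the change in -2 log-likelihood referred to a chi-square distribution. A minimal sketch of that comparison step, with made-up log-likelihood values standing in for output from an IRT calibration program:

```python
from scipy.stats import chi2

def lr_dif_test(loglik_compact, loglik_augmented, df_diff):
    """Likelihood ratio test comparing a compact model (item parameters
    constrained equal across groups) to an augmented model (parameters
    free to differ). A large G2 relative to chi-square(df_diff) flags DIF."""
    g2 = -2.0 * (loglik_compact - loglik_augmented)
    return g2, chi2.sf(g2, df_diff)

# Hypothetical log-likelihoods from two nested calibrations of one item
# (2 extra parameters freed in the augmented model):
g2, p = lr_dif_test(loglik_compact=-1503.2, loglik_augmented=-1498.1, df_diff=2)
```

The degrees of freedom equal the number of item parameters freed across groups, which under a graded response model grows with the number of response categories.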

10.
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test statistics. A modification to the Satorra–Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the 4 test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies 7 sample sizes and 3 distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ2 test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra–Bentler scaled test statistic performed best overall, whereas the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

11.
As noted by Fremer and Olson, analysis of answer changes is often used to investigate testing irregularities because the analysis is readily performed and has proven its value in practice. Researchers such as Belov, Sinharay and Johnson, van der Linden and Jeon, van der Linden and Lewis, and Wollack, Cohen, and Eckerly have suggested several statistics for detection of aberrant answer changes. This article suggests a new statistic that is based on the likelihood ratio test. An advantage of the new statistic is that it follows the standard normal distribution under the null hypothesis of no aberrant answer changes. It is demonstrated in a detailed simulation study that the Type I error rate of the new statistic is very close to the nominal level and the power of the new statistic is satisfactory in comparison to those of several existing statistics for detecting aberrant answer changes. The new statistic and several existing statistics were shown to provide useful information for a real data set. Given the increasing interest in analysis of answer changes, the new statistic promises to be useful to measurement practitioners.

12.
The asymptotically distribution free (ADF) method is often used to estimate parameters or test models without assuming normally distributed variables, in both covariance structure analysis and correlation structure analysis. However, little has been done to study the differences in the behavior of the ADF method in covariance versus correlation structure analysis. The behaviors of three test statistics frequently used to evaluate structural equation models with nonnormally distributed variables were compared: the χ2 test T_AGLS and its small-sample variants T_YB and T_F(AGLS). Results showed that the ADF method in correlation structure analysis with test statistic T_AGLS performs much better at small sample sizes than the corresponding test for covariance structures. In contrast, test statistics T_YB and T_F(AGLS) under the same conditions generally perform better with covariance structures than with correlation structures. It is proposed that excessively large and variable condition numbers of weight matrices are a cause of the poor behavior of ADF test statistics in small samples, and results showed that these condition numbers systematically increase, with substantially greater variance, as sample size decreases. Implications for research and practice are discussed.

13.
A review of research at home and abroad on estimating logistic IRT item parameters with small samples identifies six estimation approaches: modifying the IRT model, supplying prior information, artificial neural networks, nonparametric estimation, standardization based on classical test theory, and data augmentation techniques. Future research should improve upon these existing estimation methods, employ multiple error indices including the standard error, and conduct more rigorous simulation studies at sample sizes under 250 that combine simulated and real data.

14.
This study examined the effect of model size on the chi-square test statistics obtained from ordinal factor analysis models. The performance of six robust chi-square test statistics was compared across various conditions, including number of observed variables (p), number of factors, sample size, model (mis)specification, number of categories, and threshold distribution. Results showed that the unweighted least squares (ULS) robust chi-square statistics generally outperform the diagonally weighted least squares (DWLS) robust chi-square statistics. The ULSM estimator performed best overall. However, when fitting ordinal factor analysis models with a large number of observed variables and a small sample size, the ULSM-based chi-square tests may yield empirical variances noticeably larger than the theoretical values and inflated Type I error rates. On the other hand, when the number of observed variables is very large, the mean- and variance-corrected chi-square test statistics (e.g., based on ULSMV and WLSMV) can produce empirical variances conspicuously smaller than the theoretical values, Type I error rates below the nominal level, and lower power to reject misspecified models. Recommendations for applied researchers and future empirical studies involving large models are provided.

15.
The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data carry uncertainty caused by technological or human factors, and existing statistics (e.g., the number of wrong-to-right ACs) used to detect examinees with aberrant ACs are sensitive to that uncertainty, which may result in a large Type I error rate. In this article, the information about ACs is used only to partition the administered items into two disjoint subtests: items where ACs did not occur and items where ACs did occur. A new statistic is based on the difference in performance between these subtests (measured as the Kullback–Leibler divergence between the corresponding posteriors of the latent trait), where, to avoid the uncertainty, only final responses are used. One of the subtests can be filtered such that the asymptotic distribution of the statistic is chi-square with one degree of freedom. In computer simulations, the presented statistic demonstrated strong robustness to the uncertainty and higher detection rates than two popular statistics based on wrong-to-right ACs.
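The core quantity in the statistic above, the divergence between the two subtest posteriors, can be sketched on a discretized latent-trait grid. The posterior shapes below are illustrative stand-ins (unnormalized Gaussians over theta), and the chi-square(1) critical value is shown only as the reference distribution the article derives for the filtered statistic, not as a claim that this raw divergence follows it.

```python
import numpy as np
from scipy.stats import chi2

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete
    posterior distributions defined on the same latent-trait grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()          # normalize to proper distributions
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Posteriors over a theta grid for the two subtests (illustrative values:
# same examinee appears 1.5 logits more able on the answer-changed items):
theta = np.linspace(-4.0, 4.0, 81)
post_no_ac = np.exp(-0.5 * (theta - 0.0) ** 2)
post_ac = np.exp(-0.5 * (theta - 1.5) ** 2)
d = kl_divergence(post_no_ac, post_ac)
critical = chi2.ppf(0.95, df=1)   # 95th percentile of chi-square(1)
```

A large divergence indicates a performance gap between the two subtests, which is the behavioral signature of aberrant answer changing that the statistic targets.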

16.
A 2-stage robust procedure, along with an R package, rsem, was recently developed for structural equation modeling with nonnormal missing data by Yuan and Zhang (2012). Several test statistics that have been used for complete-data analysis are employed to evaluate model fit in the 2-stage robust method. However, the properties of these statistics under robust procedures for incomplete nonnormal data have never been studied. This study systematically evaluates and compares 5 test statistics: a test statistic derived from normal-distribution-based maximum likelihood, a rescaled chi-square statistic, an adjusted chi-square statistic, a corrected residual-based asymptotically distribution-free chi-square statistic, and a residual-based F statistic. These statistics are evaluated under a linear growth curve model by varying 8 factors: population distribution, missing data mechanism, missing data rate, sample size, number of measurement occasions, covariance between the latent intercept and slope, variance of measurement errors, and downweighting rate of the 2-stage robust method. The performance of the test statistics varies, and the one derived from the 2-stage normal-distribution-based maximum likelihood performs much worse than the other four. Application of the 2-stage robust method and of the test statistics is illustrated through growth curve analysis of mathematical ability development, using data on the Peabody Individual Achievement Test mathematics assessment from the National Longitudinal Survey of Youth 1997 Cohort.

17.
Applied Measurement in Education, 2013, 26(4): 329-349
The logistic regression (LR) procedure for differential item functioning (DIF) detection is a model-based approach designed to identify both uniform and nonuniform DIF. However, this procedure tends to produce inflated Type I errors. This outcome is problematic because it can result in the inefficient use of testing resources, and it may interfere with the study of the underlying causes of DIF. Recently, an effect size measure was developed for the LR DIF procedure and a classification method was proposed. However, the effect size measure and classification method have not been systematically investigated. In this study, we developed a new classification method based on those established for the Simultaneous Item Bias Test. A simulation study also was conducted to determine if the effect size measure affects the Type I error and power rates for the LR DIF procedure across sample sizes, ability distributions, and percentage of DIF items included on a test. The results indicate that the inclusion of the effect size measure can substantially reduce Type I error rates when large sample sizes are used, although there is also a reduction in power.
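The LR DIF procedure itself compares three nested logistic regressions, testing the group main effect (uniform DIF) and the group-by-ability interaction (nonuniform DIF) with 1-df likelihood ratio chi-squares. The sketch below fits each model by direct likelihood maximization rather than any particular package, with simulated data; the effect-size classification step studied in the article is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def _fit_logit(X, y):
    """Maximum-likelihood logistic regression; returns the maximized log-likelihood."""
    def negll(beta):
        eta = X @ beta
        # sum log(1 + exp(eta)) - y'eta, computed stably via logaddexp
        return np.sum(np.logaddexp(0.0, eta)) - y @ eta
    res = minimize(negll, np.zeros(X.shape[1]), method="BFGS")
    return -res.fun

def lr_dif(y, ability, group):
    """Logistic regression DIF: 1-df LR chi-square tests for uniform DIF
    (group main effect) and nonuniform DIF (group x ability interaction),
    obtained from three nested models."""
    one = np.ones(len(y))
    X0 = np.column_stack([one, ability])                          # no DIF
    X1 = np.column_stack([one, ability, group])                   # + uniform DIF
    X2 = np.column_stack([one, ability, group, ability * group])  # + nonuniform DIF
    ll0, ll1, ll2 = (_fit_logit(X, y) for X in (X0, X1, X2))
    g2_u, g2_n = -2.0 * (ll0 - ll1), -2.0 * (ll1 - ll2)
    return {"uniform": (g2_u, chi2.sf(g2_u, 1)),
            "nonuniform": (g2_n, chi2.sf(g2_n, 1))}

# Simulated item with uniform DIF of 1.5 logits favoring the focal group:
rng = np.random.default_rng(1)
n = 400
ability = rng.normal(size=n)
group = np.repeat([0.0, 1.0], n // 2)
p_correct = 1.0 / (1.0 + np.exp(-(ability + 1.5 * group)))
y = (rng.random(n) < p_correct).astype(float)
result = lr_dif(y, ability, group)
```

In practice, observed total score typically stands in for ability as the matching variable, which is one source of the inflated Type I errors the abstract describes.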

18.
This study examined the efficacy of 4 different parceling methods for modeling categorical data with 2, 3, and 4 categories and with normal, moderately nonnormal, and severely nonnormal distributions. The parceling methods investigated were isolated parceling, in which items were parceled with other items sharing the same source of variance, and distributed parceling, in which items were parceled with items influenced by different factors. These parceling strategies were crossed with strategies in which items were parceled with either similarly distributed or differently distributed items, to create 4 different parceling methods. Overall, parceling together items influenced by different factors and with different distributions resulted in better model fit but high levels of parameter estimate bias. Across all parceling methods, parameter estimate bias ranged from 20% to over 130%. Parceling strategies were contrasted with use of the WLSMV estimator for categorical, unparceled data. Results based on this estimator are encouraging, although some bias was found when high levels of nonnormality were present. Chi-square and root mean squared error of approximation values based on WLSMV also produced Type II errors (failures to reject misspecified models) when data were severely nonnormally distributed.

19.
Cluster sampling results in response variable variation both among respondents (i.e., within-cluster or Level 1) and among clusters (i.e., between-cluster or Level 2). Properly modeling within- and between-cluster variation could be of substantive interest in numerous settings, but applied researchers typically test only within-cluster (i.e., individual difference) theories. Specifying a between-cluster model in the absence of theory requires a specification search in multilevel structural equation modeling. This study examined a variety of within-cluster and between-cluster sample sizes, intraclass correlation coefficients, start models, parameter addition and deletion methods, and Type I error control techniques to identify which combination of start model, parameter addition or deletion method, and Type I error control technique best recovered the population between-cluster model. Results indicated that a "saturated" start model, univariate parameter deletion technique, and no Type I error control performed best, but recovered the population between-cluster model in less than 1 in 5 attempts at the largest sample sizes. The accuracy of specification search methods, suggestions for applied researchers, and future research directions are discussed.

20.
Classical accounts of maximum likelihood (ML) estimation of structural equation models for continuous outcomes involve normality assumptions: standard errors (SEs) are obtained using the expected information matrix, and the goodness of fit of the model is tested using the likelihood ratio (LR) statistic. Satorra and Bentler (1994) introduced SEs and mean adjustments or mean-and-variance adjustments to the LR statistic (also involving the expected information matrix) that are robust to nonnormality. In recent years, however, SEs obtained using the observed information matrix and alternative test statistics have become available. We investigate which choice of SE and test statistic yields better results using an extensive simulation study. We found that robust SEs computed using the expected information matrix, coupled with a mean- and variance-adjusted LR test statistic (i.e., MLMV), is the optimal choice, even with normally distributed data, as it yielded the best combination of accurate SEs and Type I error rates.
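The mean-scaled member of this family divides the ML likelihood ratio statistic by a scaling correction factor estimated from the data's fourth-order moments, then refers the result to the model's chi-square reference. A minimal sketch, assuming the ML statistic, scaling factor, and degrees of freedom are supplied by SEM software (the numbers below are made up; the mean-and-variance-adjusted variant additionally adjusts the variance and is not shown):

```python
from scipy.stats import chi2

def satorra_bentler_scaled(t_ml, scaling_factor, df):
    """Mean-scaled (Satorra-Bentler) test: divide the ML likelihood ratio
    statistic by the estimated scaling correction factor c, then refer the
    scaled statistic to chi-square(df). c is computed by SEM software from
    the model and the data's fourth-order moments; here it is an input."""
    t_sb = t_ml / scaling_factor
    return t_sb, chi2.sf(t_sb, df)

# Hypothetical values: T_ML = 58.4, c = 1.32 (c > 1 reflects heavy tails), df = 24
t_sb, p = satorra_bentler_scaled(t_ml=58.4, scaling_factor=1.32, df=24)
```

With nonnormal data the scaling factor typically exceeds 1, so the scaled statistic is smaller than the raw ML statistic and its rejection rate is pulled back toward the nominal level.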
