Similar documents (20 results)
1.
The power of analysis of covariance (ANCOVA) and 2 types of randomized block designs were compared as a function of the correlation between the concomitant variable and the outcome measure, the number of groups, the number of participants, and nominal power. ANCOVA had a small but consistent advantage over a randomized block design with 1 participant in each Block × Treatment combination (RB1). At correlations of .3 or greater, ANCOVA was superior to a randomized block design with n participants per Block × Treatment combination (RBn), with increasing differences as the correlation increased. RBn was superior to the other 2 designs only when the correlation was .2 or less. At those levels, however, the randomized group analysis of variance ignoring the concomitant variable was equally powerful. The findings held regardless of sample size, number of groups, or nominal power.
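The power advantage of ANCOVA over an analysis that ignores the concomitant variable can be reproduced in a few lines of simulation. The sketch below is illustrative only (it is not the authors' code): it assumes numpy and scipy are available, fixes a hypothetical correlation of .5 between covariate and outcome, and implements ANCOVA as an F test comparing the models y ~ x and y ~ x + group.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_trial(n=30, rho=0.5, effect=0.5):
    # Two groups of n; the covariate x correlates rho with the outcome y.
    g = np.repeat([0, 1], n)
    x = rng.normal(size=2 * n)
    y = effect * g + rho * x + np.sqrt(1 - rho**2) * rng.normal(size=2 * n)
    # ANOVA ignoring the covariate (two groups, so a t test is equivalent):
    p_anova = stats.ttest_ind(y[g == 0], y[g == 1]).pvalue
    # ANCOVA as a model comparison: y ~ x  versus  y ~ x + g
    def rss(X):
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        return np.sum((y - X @ beta) ** 2)
    ones = np.ones(2 * n)
    rss_red = rss(np.column_stack([ones, x]))
    rss_full = rss(np.column_stack([ones, x, g]))
    f = (rss_red - rss_full) / (rss_full / (2 * n - 3))
    p_ancova = stats.f.sf(f, 1, 2 * n - 3)
    return p_anova < 0.05, p_ancova < 0.05

reps = 2000
rejections = np.array([one_trial() for _ in range(reps)])
power_anova, power_ancova = rejections.mean(axis=0)
```

With the covariate explaining 25% of the outcome variance, the ANCOVA error term shrinks and its empirical power exceeds that of the unadjusted analysis, in line with the abstract's finding for correlations of .3 or greater.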

2.
The authors sought to identify through Monte Carlo simulations those conditions for which analysis of covariance (ANCOVA) does not maintain adequate Type I error rates and power. The conditions that were manipulated included assumptions of normality and variance homogeneity, sample size, number of treatment groups, and strength of the covariate-dependent variable relationship. Alternative tests studied were Quade's procedure, Puri and Sen's solution, Burnett and Barr's rank difference scores, Conover and Iman's rank transformation test, Hettmansperger's procedure, and the Puri-Sen-Harwell-Serlin test. For balanced designs, the ANCOVA F test was robust and was often the most powerful test through all sample-size designs and distributional configurations. With unbalanced designs, with variance heterogeneity, and when the largest treatment-group variance was matched with the largest group sample size, the nonparametric alternatives generally outperformed the ANCOVA test. When sample size and variance ratio were inversely coupled, all tests became very liberal; no test maintained adequate control over Type I error.

3.
For the two-way factorial design in analysis of variance, the current article explicates and compares three methods for controlling the Type I error rate for all possible simple interaction contrasts following a statistically significant interaction, including a proposed modification to the Bonferroni procedure that increases the power of statistical tests for deconstructing interaction effects when they are of primary substantive interest. Results indicate the general superiority of the modified Bonferroni procedure over Scheffé and Roy-type procedures, where the Bonferroni and Scheffé procedures have been modified to accommodate the logical implications of a false omnibus interaction null hypothesis. An applied example is provided and considerations for applied researchers are offered.

4.
Models to assess mediation in the pretest–posttest control group design are understudied in the behavioral sciences even though it is the design of choice for evaluating experimental manipulations. The article provides analytical comparisons of the four most commonly used models to estimate the mediated effect in this design: analysis of covariance (ANCOVA), difference score, residualized change score, and cross-sectional model. Each of these models is fitted using a latent change score specification and a simulation study assessed bias, Type I error, power, and confidence interval coverage of the four models. All but the ANCOVA model make stringent assumptions about the stability and cross-lagged relations of the mediator and outcome that might not be plausible in real-world applications. When these assumptions do not hold, Type I error and statistical power results suggest that only the ANCOVA model has good performance. The four models are applied to an empirical example.

5.
This study examined the effect of sample size ratio and model misfit on the Type I error rates and power of the Difficulty Parameter Differences procedure using Winsteps. A unidimensional 30-item test with responses from 130,000 examinees was simulated and four independent variables were manipulated: sample size ratio (20/100/250/500/1000); model fit/misfit (1PL and 3PL with c = .15 models); impact (no difference/mean differences/variance differences/mean and variance differences); and percentage of items with uniform and nonuniform DIF (0%/10%/20%). In general, the results indicate the importance of ensuring model fit to achieve greater control of Type I error and adequate statistical power. The manipulated variables produced inflated Type I error rates, which were well controlled when a measure of DIF magnitude was applied. Sample size ratio also had an effect on the power of the procedure. The paper discusses the practical implications of these results.

6.
This study investigated the Type I error rate and power of four copying indices, K-index (Holland, 1996), Scrutiny! (Assessment Systems Corporation, 1993), g2 (Frary, Tideman, & Watts, 1977), and ω (Wollack, 1997) using real test data from 20,000 examinees over a 2-year period. The data were divided into three different test lengths (20, 40, and 80 items) and nine different sample sizes (ranging from 50 to 20,000). Four different amounts of answer copying were simulated (10%, 20%, 30%, and 40% of the items) within each condition. The ω index demonstrated the best Type I error control and power in all conditions and at all α levels. Scrutiny! and the K-index were uniformly conservative, and both had poor power to detect true copiers at the small α levels typically used in answer copying detection, whereas g2 was generally too liberal, particularly at small α levels. Some comments on the proper uses of copying indices are provided.

7.
Latent means methods such as multiple-indicator multiple-cause (MIMIC) and structured means modeling (SMM) allow researchers to determine whether or not a significant difference exists between groups' factor means. Strong invariance is typically recommended when interpreting latent mean differences. The extent of the impact of noninvariant intercepts on conclusions made when implementing both MIMIC and SMM methods was the main purpose of this study. The impact of intercept noninvariance on Type I error rates, power, and two model fit indices when using MIMIC and SMM approaches under various conditions was examined. Type I error and power were adversely affected by intercept noninvariance. Although the fit indices did not detect small misspecifications in the form of noninvariant intercepts, one did perform more optimally.

8.
This study examined and compared various statistical methods for detecting individual differences in change. Considering 3 issues including test forms (specific vs. generalized), estimation procedures (constrained vs. unconstrained), and nonnormality, we evaluated 4 variance tests including the specific Wald variance test, the generalized Wald variance test, the specific likelihood ratio (LR) variance test, and the generalized LR variance test under both constrained and unconstrained estimation for both normal and nonnormal data. For the constrained estimation procedure, both the mixture distribution approach and the alpha correction approach were evaluated for their performance in dealing with the boundary problem. To deal with the nonnormality issue, we used the sandwich standard error (SE) estimator for the Wald tests and the Satorra–Bentler scaling correction for the LR tests. Simulation results revealed that testing a variance parameter and the associated covariances (generalized) had higher power than testing the variance solely (specific), unless the true covariances were zero. In addition, the variance tests under constrained estimation outperformed those under unconstrained estimation in terms of higher empirical power and better control of Type I error rates. Among all the studied tests, for both normal and nonnormal data, the robust generalized LR and Wald variance tests with the constrained estimation procedure were generally more powerful and had better Type I error rates for testing variance components than the other tests. Results from the comparisons between specific and generalized variance tests and between constrained and unconstrained estimation were discussed.

9.
We investigated the statistical properties of the K-index (Holland, 1996) that can be used to detect copying behavior on a test. A simulation study was conducted to investigate the applicability of the K-index for small, medium, and large datasets. Furthermore, the Type I error rate and the detection rate of this index were compared with the copying index, ω (Wollack, 1997). Several approximations were used to calculate the K-index. Results showed that all approximations were able to hold the Type I error rates below the nominal level. Results further showed that using ω resulted in higher detection rates than the K-indices for small and medium sample sizes (100 and 500 simulees).

10.
Two simulation studies investigated Type I error performance of two statistical procedures for detecting differential item functioning (DIF): SIBTEST and Mantel-Haenszel (MH). Because MH and SIBTEST are based on asymptotic distributions requiring "large" numbers of examinees, the first study examined Type I error for small sample sizes. No significant Type I error inflation occurred for either procedure. Because MH has the potential for Type I error inflation for non-Rasch models, the second study used a markedly non-Rasch test and systematically varied the shape and location of the studied item. When differences in distribution across examinee group of the measured ability were present, both procedures displayed inflated Type I error for certain items; MH displayed the greater inflation. Also, both procedures displayed statistically biased estimation of the zero DIF for certain items, though SIBTEST displayed much less than MH. When no latent distributional differences were present, both procedures performed satisfactorily under all conditions.
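The Mantel-Haenszel DIF statistic referenced above is straightforward to compute: examinees are stratified on a matching score, and a continuity-corrected chi-square is built from the 2 × 2 (group × correct) table in each stratum. The following sketch, under assumed Rasch-generated data with a hypothetical 0.6-logit shift on one item for the focal group, is an illustration rather than a reimplementation of either study; it matches on the rest score (total minus the studied item) and assumes numpy and scipy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def simulate(n, difficulties, shift_item=None, shift=0.0):
    # Rasch responses for n examinees with ability ~ N(0, 1).
    theta = rng.normal(size=n)
    b = difficulties.copy()
    if shift_item is not None:
        b[shift_item] += shift
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random((n, len(b))) < p).astype(int)

def mh_chi2(ref, foc, item):
    # Stratify on the rest score (total score excluding the studied item).
    strata_r = ref.sum(axis=1) - ref[:, item]
    strata_f = foc.sum(axis=1) - foc[:, item]
    num_a = num_e = var = 0.0
    for s in np.union1d(strata_r, strata_f):
        r = ref[strata_r == s, item]
        f = foc[strata_f == s, item]
        n_r, n_f = len(r), len(f)
        n = n_r + n_f
        if n < 2 or n_r == 0 or n_f == 0:
            continue  # stratum carries no information
        m1 = r.sum() + f.sum()          # total correct in the stratum
        m0 = n - m1
        num_a += r.sum()
        num_e += n_r * m1 / n
        var += n_r * n_f * m1 * m0 / (n**2 * (n - 1))
    chi2 = max(abs(num_a - num_e) - 0.5, 0) ** 2 / var   # continuity-corrected
    return chi2, stats.chi2.sf(chi2, 1)

b = np.linspace(-1.5, 1.5, 20)
ref = simulate(2000, b)
foc = simulate(2000, b, shift_item=0, shift=0.6)   # item 0 harder for focal group
chi2_dif, p_dif = mh_chi2(ref, foc, 0)
```

With 2,000 examinees per group, a 0.6-logit DIF on the studied item produces a large MH chi-square and a clear rejection.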

11.
The authors used Johnson's transformation with approximate test statistics to test the homogeneity of simple linear regression slopes when both xij and yij may have nonnormal distributions and there is Type I heteroscedasticity, Type II heteroscedasticity, or complete heteroscedasticity. The test statistic t was first transformed by Johnson's method for each group to correct for nonnormality; an approximate test, such as the Welch test or the DeShon-Alexander test, was then applied to test the homogeneity of the regression slopes while correcting for heteroscedasticity. Computer simulations showed that the proposed technique can control the Type I error rate under various circumstances. Finally, the authors provide an example to demonstrate the calculation.

12.
13.

Researchers conducting structural equation modeling analyses rarely, if ever, control for the inflated probability of Type I errors when evaluating the statistical significance of multiple parameters in a model. In this study, the Type I error control, power, and true model rates of familywise and false discovery rate controlling procedures were compared with rates when no multiplicity control was imposed. The results indicate that Type I error rates become severely inflated with no multiplicity control, but also that familywise error controlling procedures were extremely conservative and had very little power for detecting true relations. False discovery rate controlling procedures provided a compromise between no multiplicity control and strict familywise error control and with large sample sizes provided a high probability of making correct inferences regarding all the parameters in the model.
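The familywise-versus-FDR trade-off described above is easy to see concretely. The sketch below, on hypothetical data (90 null p-values and 10 p-values from assumed true effects), contrasts the Bonferroni familywise rule with the Benjamini–Hochberg step-up FDR procedure; BH always rejects at least as many hypotheses as Bonferroni at the same level, which is the source of its power advantage.

```python
import numpy as np

rng = np.random.default_rng(3)

# 90 p-values from true nulls (uniform) and 10 from hypothetical true effects
# (concentrated near zero via Beta(1, 200) draws).
p = np.concatenate([rng.uniform(size=90), rng.beta(1, 200, size=10)])
m, alpha = len(p), 0.05

# Familywise control: Bonferroni rejects p < alpha / m.
bonferroni = p < alpha / m

def benjamini_hochberg(p, q=0.05):
    """Step-up FDR procedure: reject the k smallest p-values, where k is the
    largest index with p_(k) <= k * q / m."""
    order = np.argsort(p)
    m = len(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

bh = benjamini_hochberg(p, q=alpha)
```

Because the BH threshold for the smallest p-value coincides with the Bonferroni cutoff and grows with rank, the BH rejection set contains the Bonferroni rejection set by construction.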

14.
This Monte Carlo simulation study investigated the impact of nonnormality on estimating and testing mediated effects with the parallel process latent growth model and 3 popular methods for testing the mediated effect (i.e., Sobel’s test, the asymmetric confidence limits, and the bias-corrected bootstrap). It was found that nonnormality had little effect on the estimates of the mediated effect, standard errors, empirical Type I error, and power rates in most conditions. In terms of empirical Type I error and power rates, the bias-corrected bootstrap performed best. Sobel’s test produced very conservative Type I error rates when the estimated mediated effect and standard error had a relationship, but when the relationship was weak or did not exist, the Type I error was closer to the nominal .05 value.
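The bias-corrected bootstrap favored in this and several of the studies listed here can be sketched compactly for a simple single-mediator model (not the parallel process growth model of the abstract). Everything in this example is hypothetical: assumed true paths a = b = 0.4, a sample of n = 200, and numpy/scipy available. The bias-correction constant z0 shifts the percentile endpoints according to the share of bootstrap estimates below the sample estimate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Hypothetical single-mediator data X -> M -> Y with true paths a = b = 0.4,
# so the true mediated effect is a*b = 0.16.
n = 200
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)
y = 0.4 * m + rng.normal(size=n)

def indirect(x, m, y):
    """Product-of-coefficients estimate of the mediated effect a*b."""
    ones = np.ones(len(x))
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, m, x]), y, rcond=None)[0][1]
    return a * b

est = indirect(x, m, y)

# Bias-corrected percentile bootstrap interval for a*b.
B = 2000
boot = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, n)
    boot[i] = indirect(x[idx], m[idx], y[idx])
z0 = stats.norm.ppf((boot < est).mean())              # bias-correction constant
lo, hi = stats.norm.cdf(2 * z0 + stats.norm.ppf([0.025, 0.975]))
ci = np.quantile(boot, [lo, hi])
```

When the bootstrap distribution is centered on the estimate, z0 is near zero and the interval reduces to the ordinary percentile bootstrap; the correction matters when the distribution of a*b is skewed, which is exactly the situation where Sobel's normal-theory test is conservative.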

15.
Recent advances in testing mediation have found that certain resampling methods and tests based on the mathematical distribution of 2 normal random variables substantially outperform the traditional z test. However, these studies have primarily focused only on models with a single mediator and 2 component paths. To address this limitation, a simulation was conducted to evaluate these alternative methods in a more complex path model with multiple mediators and indirect paths with 2 and 3 paths. Methods for testing contrasts of 2 effects were also evaluated. The simulation included 1 exogenous independent variable, 3 mediators, and 2 outcomes, and varied sample size, number of paths in the mediated effects, test used to evaluate effects, effect sizes for each path, and the value of the contrast. Confidence intervals were used to evaluate the power and Type I error rate of each method, and were examined for coverage and bias. The bias-corrected bootstrap had the least biased confidence intervals, greatest power to detect nonzero effects and contrasts, and the most accurate overall Type I error. All tests had less power to detect 3-path effects and more inaccurate Type I error compared to 2-path effects. Confidence intervals were biased for mediated effects, as found in previous studies. Results for contrasts did not vary greatly by test, although resampling approaches had somewhat greater power and might be preferable because of ease of use and flexibility.

16.
Type I error rate and power for the t test, Wilcoxon-Mann-Whitney (U) test, van der Waerden Normal Scores (NS) test, and Welch-Aspin-Satterthwaite (W) test were compared for two independent random samples drawn from nonnormal distributions. Data with varying degrees of skewness (S) and kurtosis (K) were generated using Fleishman's (1978) power function. Five sample size combinations were used with both equal and unequal variances. For nonnormal data with equal variances, the power of the U test exceeded the power of the t test regardless of sample size. When the sample sizes were equal but the variances were unequal, the t test proved to be the most powerful test. When variances and sample sizes were unequal, the W test became the test of choice because it was the only test that maintained its nominal Type I error rate.
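The core result, that only the Welch test holds its nominal Type I error rate when variances and sample sizes are both unequal, is quick to verify by simulation. The sketch below uses normal rather than Fleishman-generated data (a simplifying assumption, not the study's design) and pairs the smaller sample with the larger variance, the configuration known to make the pooled t test liberal; it assumes numpy and scipy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Null is true (equal means); the smaller group has the larger variance.
def one_rep(n1=10, n2=40, sd1=3.0, sd2=1.0, alpha=0.05):
    a = rng.normal(0, sd1, n1)
    b = rng.normal(0, sd2, n2)
    p_t = stats.ttest_ind(a, b, equal_var=True).pvalue      # pooled-variance t
    p_w = stats.ttest_ind(a, b, equal_var=False).pvalue     # Welch
    p_u = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
    return p_t < alpha, p_w < alpha, p_u < alpha

reps = 4000
t_rate, welch_rate, u_rate = np.mean([one_rep() for _ in range(reps)], axis=0)
```

The pooled t test's empirical Type I error rate climbs well above .05 because the pooled variance understates the standard error of the mean difference, while the Welch test stays near nominal.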

17.
The standardized log-likelihood of a response vector (lz) is a popular IRT-based person-fit test statistic for identifying model-misfitting response patterns. Traditional use of lz is overly conservative in detecting aberrance due to its incorrect assumption regarding its theoretical null distribution. This study proposes a method for improving the accuracy of person-fit analysis using lz which takes into account test unreliability when estimating the ability and constructs the distribution for each lz through resampling methods. The Type I error and power (or detection rate) of the proposed method were examined at different test lengths, ability levels, and nominal α levels along with other methods, and power to detect three types of aberrance—cheating, lack of motivation, and speeding—was considered. Results indicate that the proposed method is a viable and promising approach. It has Type I error rates close to the nominal value for most ability levels and reasonably good power.
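For readers unfamiliar with lz, the classic statistic (Drasgow, Levine, & Williams, 1985) standardizes the response-pattern log-likelihood by its model-implied mean and variance. The sketch below shows only that baseline statistic, not the resampling refinement the abstract proposes, on a hypothetical 20-item 2PL test with an assumed common discrimination of 1.2.

```python
import numpy as np

def lz(u, p):
    """Standardized log-likelihood person-fit statistic l_z:
    (l0 - E[l0]) / sqrt(Var[l0]) under the fitted IRT model."""
    u = np.asarray(u, dtype=float)
    p = np.asarray(p, dtype=float)
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expect = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expect) / np.sqrt(var)

# Hypothetical 20-item test: 2PL probabilities for an examinee at theta = 0.
b = np.linspace(-2, 2, 20)                  # item difficulties
p = 1 / (1 + np.exp(-1.2 * (0 - b)))        # discrimination fixed at 1.2

consistent = (p > 0.5).astype(int)          # responses that track the model
aberrant = 1 - consistent                   # e.g., a spuriously reversed pattern

lz_good, lz_bad = lz(consistent, p), lz(aberrant, p)
```

Large negative lz values flag misfitting patterns; the reversed pattern above (missing easy items while passing hard ones, as a cheater or unmotivated examinee might) scores far below zero, while the model-consistent pattern does not.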

18.
This paper presents the results of a simulation study to compare the performance of the Mann-Whitney U test, Student's t test, and the alternate (separate variance) t test for two mutually independent random samples from normal distributions, with both one-tailed and two-tailed alternatives. The estimated probability of a Type I error was controlled (in the sense of being reasonably close to the attainable level) by all three tests when the variances were equal, regardless of the sample sizes. However, it was controlled only by the alternate t test for unequal variances with unequal sample sizes. With equal sample sizes, the probability was controlled by all three tests regardless of the variances. When it was controlled, we also compared the power of these tests and found very little difference. This means that very little power will be lost if the Mann-Whitney U test is used instead of tests that require the assumption of normal distributions.

19.
This article extends the Bonett (2003a) approach to testing the equality of alpha coefficients from two independent samples to the case of m ≥ 2 independent samples. The extended Fisher-Bonett test and its competitor, the Hakstian-Whalen (1976) test, are illustrated with numerical examples of both hypothesis testing and power calculation. Computer simulations are used to compare the performance of the two tests and the Feldt (1969) test (for m = 2) in terms of power and Type I error control. It is shown that the Fisher-Bonett test is just as effective as its competitors in controlling Type I error, is comparable to them in power, and is equally robust against heterogeneity of error variance.
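For the two-sample case that the Bonett (2003a) approach starts from, one common formulation works on the log-complement scale, using the approximation Var[ln(1 − α̂)] ≈ 2k / ((k − 1)(n − 2)). The sketch below is a hedged illustration of that two-sample version on simulated data (an assumed 8-item scale with common-factor loadings of .7 in both groups), not the extended m-sample Fisher-Bonett test; the variance approximation and the data-generating choices are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(X):
    """Cronbach's alpha for an n-persons x k-items score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def bonett_z(a1, n1, a2, n2, k):
    # Approximate two-sample z test on the log-complement scale, assuming
    # Var[ln(1 - alpha_hat)] ~ 2k / ((k - 1)(n - 2)).
    v = lambda n: 2 * k / ((k - 1) * (n - 2))
    z = (np.log(1 - a1) - np.log(1 - a2)) / np.sqrt(v(n1) + v(n2))
    return z, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(9)

def sample(n, k=8, loading=0.7):
    # One common factor with equal loadings, so both groups share the
    # same true reliability (the null hypothesis holds).
    f = rng.normal(size=(n, 1))
    return loading * f + np.sqrt(1 - loading**2) * rng.normal(size=(n, k))

X1, X2 = sample(300), sample(300)
a1, a2 = cronbach_alpha(X1), cronbach_alpha(X2)
z, p = bonett_z(a1, 300, a2, 300, 8)
```

Because both groups have the same population reliability here, the test statistic behaves like a standard normal draw and rejects only at roughly the nominal rate.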

20.
This article used the Wald test to evaluate the item‐level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G‐DINA model. Results show that when the sample size is small and a larger number of attributes are required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance levels, while the Type I error rate of the A‐CDM is closer to the nominal significance levels. However, with larger sample sizes, the Type I error rates for the three models are closer to the nominal significance levels. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings.
