Similar Documents
20 similar documents were retrieved.
1.
This study deals with the statistical properties of a randomization test applied to an ABAB design in cases where the desirable random assignment of the points of change in phase is not possible. To obtain information about each possible data division, the authors carried out a conditional Monte Carlo simulation with 100,000 samples for each systematically chosen triplet. The authors studied robustness and power under several experimental conditions: different autocorrelation levels and different effect sizes, as well as different phase lengths determined by the points of change. Type I error rates were distorted by the presence of autocorrelation for the majority of data divisions. The authors obtained satisfactory Type II error rates only for large treatment effects. The relation between the lengths of the four phases appeared to be an important factor for the robustness and power of the randomization test.
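A minimal sketch of the core idea behind such a test (simplified here to a single AB phase change rather than the full ABAB triplet, with hypothetical data and an arbitrary minimum phase length): the test statistic is recomputed for every admissible point of phase change, and the p value is the proportion of divisions yielding a statistic at least as extreme as the one for the observed division.

```python
import numpy as np

def ab_randomization_test(y, observed_start, min_phase=3):
    """Randomization test for a single AB phase design.

    y: full data series; observed_start: index at which phase B actually began.
    The statistic is mean(B) - mean(A), recomputed for every admissible start
    point (each phase must contain at least `min_phase` observations).
    """
    y = np.asarray(y, dtype=float)
    divisions = range(min_phase, len(y) - min_phase + 1)
    stat = {s: y[s:].mean() - y[:s].mean() for s in divisions}
    observed = stat[observed_start]
    # Two-sided p value: share of admissible divisions at least as extreme.
    p = np.mean([abs(v) >= abs(observed) for v in stat.values()])
    return observed, p

# Hypothetical 20-point series with a level shift after observation 10.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 10), rng.normal(1.5, 1, 10)])
print(ab_randomization_test(y, observed_start=10))
```

The ABAB case enumerates admissible triplets of change points in the same way, which is why the relation among the four phase lengths matters for robustness and power.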

2.
Statistical power was estimated for 3 randomization tests used with multiple-baseline designs. In 1 test, participants were randomly assigned to baseline conditions; in the 2nd, intervention points were randomly assigned; and in the 3rd, the authors used both forms of random assignment. Power was studied for several series lengths (N = 10, 20, 30), several effect sizes (d = 0, 0.5, 1.0, 1.5, 2.0), and several levels of autocorrelation among the errors (ρ1 = 0, .1, .2, .3, .4, and .5). Power was found to be similar among the 3 tests. Power was low for effect sizes of 0.5 and 1.0 but was often adequate (> .80) for effect sizes of 1.5 and 2.0.

3.
The authors present a method that ensures control over the Type I error rate for those who visually analyze the data from response-guided multiple-baseline designs. The method can be seen as a modification of visual analysis methods to incorporate a mechanism to control Type I errors or as a modification of randomization test methods to allow response-guided experimentation and visual analysis. The approach uses random assignment of participants to intervention times and a data analyst who is blind to which participants enter treatment at which points in time. The authors provide an example to illustrate the method and discuss the conditions necessary to ensure Type I error control.

4.
Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in the context of a MANOVA, with the typical default for dealing with missing data: listwise deletion. When data are missing at random, the new methods maintained the nominal Type I error rate and had power comparable to the complete data condition. When 40% of the data were missing completely at random, the Type I error rates for the new methods were inflated, but not for lower percentages of missing data.

5.
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling errors, in turn, results in unanticipated instability in the testing program and an increase in Type I errors in significance tests. This is especially true when the standard error of equating is underestimated. The problem is caused by the typical district and state practice of using nonprobability cluster-sampling procedures, such as convenience, purposeful, and quota sampling, then calculating statistics and standard errors as if the samples were simple random samples.
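A standard textbook approximation (not a formula taken from this article) makes the size of the problem concrete: for single-stage cluster sampling the design effect is DEFF = 1 + (m - 1) × ICC for average cluster size m and intraclass correlation ICC, and treating such a sample as a simple random sample understates standard errors by a factor of roughly sqrt(DEFF). The numbers below are hypothetical.

```python
import math

def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect for single-stage cluster sampling."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

# Hypothetical district sample: 40 classrooms of 25 students each, ICC = .15.
n, m, icc = 1000, 25, 0.15
deff = design_effect(m, icc)
print(f"DEFF = {deff:.2f}")                                     # 4.60
print(f"effective sample size = {n / deff:.0f}")                # ~217
print(f"SEs understated by a factor of {math.sqrt(deff):.2f}")  # ~2.14
```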

6.
The authors compared the Type I error rate and the power to detect differences in slopes and additive treatment effects of analysis of covariance (ANCOVA) and randomized block (RB) designs with a Monte Carlo simulation. For testing differences in slopes, 3 methods were compared: the test of slopes from ANCOVA, the omnibus Block × Treatment interaction, and the linear component of the Block × Treatment interaction of RB. In the test for adjusted means, 2 variations of both ANCOVA and RB were used. The power of the omnibus test of the interaction decreased dramatically as the number of blocks used increased and was always considerably smaller than the specific test of differences in slopes found in ANCOVA. Tests for means when there were concomitant differences in slopes showed that only ANCOVA uniformly controlled Type I error under all configurations of design variables. The most powerful option in almost all simulations for tests of both slopes and means was ANCOVA.
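As one concrete illustration (hypothetical data, not the authors' simulation code), the ANCOVA test of slope differences amounts to the test of the treatment-by-covariate interaction in a linear model, and the adjusted-means comparison comes from the homogeneous-slopes model; statsmodels is assumed here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "group": np.repeat(["control", "treatment"], n // 2),
    "x": rng.normal(50, 10, n),  # covariate
})
df["y"] = 5 + 0.4 * df["x"] + 3 * (df["group"] == "treatment") + rng.normal(0, 4, n)

# Homogeneous-slopes ANCOVA vs. a model allowing group-specific slopes.
ancova = smf.ols("y ~ C(group) + x", data=df).fit()
slopes = smf.ols("y ~ C(group) * x", data=df).fit()

# F test of the interaction = test of equal slopes.
print(anova_lm(ancova, slopes))
# Adjusted treatment effect if slopes are judged homogeneous:
print(ancova.params["C(group)[T.treatment]"])
```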

7.
When data for multiple outcomes are collected in a multilevel design, researchers can select a univariate or multivariate analysis to examine group-mean differences. When correlated outcomes are incomplete, a multivariate multilevel model (MVMM) may provide greater power than univariate multilevel models (MLMs). For a two-group multilevel design with two correlated outcomes, a simulation study was conducted to compare the performance of MVMM to MLMs. The results showed that MVMM and MLM performed similarly when data were complete or missing completely at random. However, when outcome data were missing at random, MVMM continued to provide unbiased estimates, whereas MLM produced grossly biased estimates and severely inflated Type I error rates. As such, this study provides further support for using MVMM rather than univariate analyses, particularly when outcome data are incomplete.

8.
This article reports on a Monte Carlo simulation study, evaluating two approaches for testing the intervention effect in replicated randomized AB designs: two-level hierarchical linear modeling (HLM) and using the additive method to combine randomization test p values (RTcombiP). Four factors were manipulated: mean intervention effect, number of cases included in a study, number of measurement occasions for each case, and between-case variance. Under the simulated conditions, Type I error rate was under control at the nominal 5% level for both HLM and RTcombiP. Furthermore, for both procedures, a larger number of combined cases resulted in higher statistical power, with many realistic conditions reaching statistical power of 80% or higher. Smaller values for the between-case variance resulted in higher power for HLM. A larger number of data points resulted in higher power for RTcombiP.
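A minimal sketch of the additive combination rule itself (Edgington's method, assuming independent per-case randomization tests; the p values below are hypothetical): the per-case p values are summed, and the combined p value is the probability that a sum of independent Uniform(0, 1) variables is at most the observed sum, i.e., the Irwin-Hall distribution function.

```python
from math import comb, factorial, floor

def additive_combined_p(p_values):
    """Edgington's additive method: P(sum of k iid U(0,1) <= observed sum)."""
    k = len(p_values)
    s = sum(p_values)
    total = sum((-1) ** j * comb(k, j) * (s - j) ** k for j in range(floor(s) + 1))
    return min(1.0, total / factorial(k))

# Hypothetical randomization-test p values from four replicated AB cases.
print(additive_combined_p([0.12, 0.25, 0.08, 0.30]))  # about .013
```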

9.
When structural equation modeling (SEM) analyses are conducted, significance tests for all important model relationships (parameters including factor loadings, covariances, etc.) are typically conducted at a specified nominal Type I error rate (α). Despite the fact that many significance tests are often conducted in SEM, rarely is multiplicity control applied. Cribbie (2000, 2007) demonstrated that without some form of adjustment, the familywise Type I error rate can become severely inflated. Cribbie also confirmed that the popular Bonferroni method was overly conservative due to the correlations among the parameters in the model. The purpose of this study was to compare the Type I error rates and per-parameter power of traditional multiplicity strategies with those of adjusted Bonferroni procedures that incorporate not only the number of tests in a family, but also the degree of correlation between parameters. The adjusted Bonferroni procedures were found to produce per-parameter power rates higher than the original Bonferroni procedure without inflating the familywise error rate.
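For orientation only, one widely used way to build correlation into a Bonferroni-style adjustment (not necessarily the specific adjusted procedures evaluated in this study) is to estimate an "effective number of independent tests" from the eigenvalues of the parameter correlation matrix and divide α by that number instead of by the raw test count; the correlation matrix below is hypothetical.

```python
import numpy as np

def effective_number_of_tests(corr: np.ndarray) -> float:
    """Nyholt-style effective number of tests from a correlation matrix."""
    eigvals = np.linalg.eigvalsh(corr)
    m = corr.shape[0]
    return 1 + (m - 1) * (1 - np.var(eigvals, ddof=1) / m)

# Hypothetical correlations among 4 parameter estimates.
corr = np.array([
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.3],
    [0.5, 0.5, 1.0, 0.4],
    [0.4, 0.3, 0.4, 1.0],
])
m_eff = effective_number_of_tests(corr)
alpha = 0.05
print(f"raw Bonferroni alpha: {alpha / corr.shape[0]:.4f}")
print(f"adjusted alpha (M_eff = {m_eff:.2f}): {alpha / m_eff:.4f}")
```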

10.
Recent advances in testing mediation have found that certain resampling methods and tests based on the distribution of the product of 2 normal random variables substantially outperform the traditional z test. However, these studies have primarily focused only on models with a single mediator and 2 component paths. To address this limitation, a simulation was conducted to evaluate these alternative methods in a more complex path model with multiple mediators and indirect paths with 2 and 3 paths. Methods for testing contrasts of 2 effects were also evaluated. The simulation included 1 exogenous independent variable, 3 mediators, and 2 outcomes, and varied sample size, number of paths in the mediated effects, test used to evaluate effects, effect sizes for each path, and the value of the contrast. Confidence intervals were used to evaluate the power and Type I error rate of each method, and were examined for coverage and bias. The bias-corrected bootstrap had the least biased confidence intervals, greatest power to detect nonzero effects and contrasts, and the most accurate overall Type I error. All tests had less power to detect 3-path effects and more inaccurate Type I error compared to 2-path effects. Confidence intervals were biased for mediated effects, as found in previous studies. Results for contrasts did not vary greatly by test, although resampling approaches had somewhat greater power and might be preferable because of ease of use and flexibility.
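A minimal sketch of the bias-corrected bootstrap for a single two-path indirect effect (a simple X → M → Y model with hypothetical data), to make the best-performing procedure concrete; the article's models with multiple mediators and three-path effects extend the same resampling logic.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)              # path a = 0.4
y = 0.35 * m + 0.1 * x + rng.normal(size=n)   # path b = 0.35

def indirect_effect(x, m, y):
    """a*b estimate from two OLS fits: m ~ x and y ~ m + x."""
    a = np.polyfit(x, m, 1)[0]
    X = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return a * b

est = indirect_effect(x, m, y)
boot = np.array([
    indirect_effect(x[idx], m[idx], y[idx])
    for idx in (rng.integers(0, n, n) for _ in range(2000))
])

# Bias correction: z0 shifts the percentile endpoints according to how
# far the bootstrap distribution is off-center relative to the estimate.
z0 = norm.ppf((boot < est).mean())
lo, hi = norm.cdf(2 * z0 + norm.ppf([0.025, 0.975]))
print(est, np.quantile(boot, [lo, hi]))
```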

11.
For the two-way factorial design in analysis of variance, the current article explicates and compares three methods for controlling the Type I error rate for all possible simple interaction contrasts following a statistically significant interaction, including a proposed modification to the Bonferroni procedure that increases the power of statistical tests for deconstructing interaction effects when they are of primary substantive interest. Results indicate the general superiority of the modified Bonferroni procedure over Scheffé and Roy-type procedures, where the Bonferroni and Scheffé procedures have been modified to accommodate the logical implications of a false omnibus interaction null hypothesis. An applied example is provided and considerations for applied researchers are offered.

12.
Testing factorial invariance has recently gained more attention in different social science disciplines. Nevertheless, when examining factorial invariance, it is generally assumed that the observations are independent of each other, which might not always be true. In this study, we examined the impact of testing factorial invariance in multilevel data, especially when the dependency issue is not taken into account. We considered a set of design factors, including number of clusters, cluster size, and intraclass correlation (ICC) at different levels. The simulation results showed that the test of factorial invariance became more liberal (i.e., had an inflated Type I error rate) in rejecting the null hypothesis that invariance holds between groups when the dependency was not considered in the analysis. Additionally, the magnitude of the inflation in the Type I error rate was a function of both ICC and cluster size. Implications of the findings and limitations are discussed.

13.
The present study examined the effects of manipulation of two graphing conventions on judgements of time‐series data by novice raters. These conventions involved the presence of phase change lines between baseline and intervention data and whether data points across phase changes were connected. The 1990 study of Matyas and Greenwood was also partially replicated and participant error rates were examined when responses were on a non‐binary scale (no effect, uncertain, clear effect) in contrast to the binary scale used in the original study. Thirty postgraduate special education students rated intervention effects on 36 graphs. There was no substantive evidence that graphing conventions affected judgements. Type I errors, defined as a response of ‘clear effect’ when no intervention effect was present, were very low (0–7%) and Type II errors were correspondingly high (0–100%), particularly with low intervention effects and high random error. Thus, judges were very conservative when using a non‐binary response scale, in contrast with the results of Matyas and Greenwood. Several directions for further research are proposed.

14.
This paper presents the results of a simulation study to compare the performance of the Mann-Whitney U test, Student's t test, and the alternate (separate variance) t test for two mutually independent random samples from normal distributions, with both one-tailed and two-tailed alternatives. The estimated probability of a Type I error was controlled (in the sense of being reasonably close to the attainable level) by all three tests when the variances were equal, regardless of the sample sizes. However, it was controlled only by the alternate t test for unequal variances with unequal sample sizes. With equal sample sizes, the probability was controlled by all three tests regardless of the variances. When it was controlled, we also compared the power of these tests and found very little difference. This means that very little power will be lost if the Mann-Whitney U test is used instead of tests that require the assumption of normal distributions.
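A small Monte Carlo sketch in the same spirit (not the paper's code; the sample sizes, standard deviations, and replication count are arbitrary): Type I error is estimated as the rejection rate at α = .05 when both samples are normal with equal means but unequal variances paired with unequal sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
reps, alpha = 5000, 0.05
n1, n2, sd1, sd2 = 10, 30, 3.0, 1.0   # small sample gets the large variance

rejections = {"student_t": 0, "welch_t": 0, "mann_whitney": 0}
for _ in range(reps):
    a = rng.normal(0, sd1, n1)
    b = rng.normal(0, sd2, n2)
    rejections["student_t"] += stats.ttest_ind(a, b, equal_var=True).pvalue < alpha
    rejections["welch_t"] += stats.ttest_ind(a, b, equal_var=False).pvalue < alpha
    rejections["mann_whitney"] += stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha

print({test: count / reps for test, count in rejections.items()})
```

Under this configuration only the separate variance (Welch) test should stay near the nominal .05 rate, mirroring the paper's conclusion for unequal variances with unequal sample sizes.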

15.
Models to assess mediation in the pretest–posttest control group design are understudied in the behavioral sciences even though it is the design of choice for evaluating experimental manipulations. The article provides analytical comparisons of the four most commonly used models to estimate the mediated effect in this design: analysis of covariance (ANCOVA), difference score, residualized change score, and cross-sectional model. Each of these models is fitted using a latent change score specification and a simulation study assessed bias, Type I error, power, and confidence interval coverage of the four models. All but the ANCOVA model make stringent assumptions about the stability and cross-lagged relations of the mediator and outcome that might not be plausible in real-world applications. When these assumptions do not hold, Type I error and statistical power results suggest that only the ANCOVA model has good performance. The four models are applied to an empirical example.

16.
In this study, the effectiveness of detection of differential item functioning (DIF) and testlet DIF using SIBTEST and Poly-SIBTEST was examined in tests composed of testlets. An example using data from a reading comprehension test showed that results from SIBTEST and Poly-SIBTEST were not completely consistent in the detection of DIF and testlet DIF. Results from a simulation study indicated that SIBTEST appeared to maintain Type I error control for most conditions, except in some instances in which the magnitude of simulated DIF tended to increase. This same pattern was present for the Poly-SIBTEST results, although Poly-SIBTEST demonstrated markedly less control of Type I errors. Type I error control with Poly-SIBTEST was lower for those conditions for which the ability was unmatched to test difficulty. The power results for SIBTEST were not adversely affected when the size and percentage of simulated DIF increased. Although Poly-SIBTEST failed to control Type I errors in over 85% of the conditions simulated, in those conditions for which Type I error control was maintained, Poly-SIBTEST demonstrated higher power than SIBTEST.

17.
The authors sought to identify through Monte Carlo simulations those conditions for which analysis of covariance (ANCOVA) does not maintain adequate Type I error rates and power. The conditions that were manipulated included assumptions of normality and variance homogeneity, sample size, number of treatment groups, and strength of the covariate-dependent variable relationship. Alternative tests studied were Quade's procedure, Puri and Sen's solution, Burnett and Barr's rank difference scores, Conover and Iman's rank transformation test, Hettmansperger's procedure, and the Puri-Sen-Harwell-Serlin test. For balanced designs, the ANCOVA F test was robust and was often the most powerful test through all sample-size designs and distributional configurations. With unbalanced designs, with variance heterogeneity, and when the largest treatment-group variance was matched with the largest group sample size, the nonparametric alternatives generally outperformed the ANCOVA test. When sample size and variance ratio were inversely coupled, all tests became very liberal; no test maintained adequate control over Type I error.

18.
This Monte Carlo simulation study investigated the impact of nonnormality on estimating and testing mediated effects with the parallel process latent growth model and 3 popular methods for testing the mediated effect (i.e., Sobel’s test, the asymmetric confidence limits, and the bias-corrected bootstrap). It was found that nonnormality had little effect on the estimates of the mediated effect, standard errors, empirical Type I error, and power rates in most conditions. In terms of empirical Type I error and power rates, the bias-corrected bootstrap performed best. Sobel’s test produced very conservative Type I error rates when the estimated mediated effect and standard error had a relationship, but when the relationship was weak or did not exist, the Type I error was closer to the nominal .05 value.
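For reference, Sobel's test (the first of the three methods compared) forms a z statistic from the two path estimates and their standard errors; a minimal sketch with hypothetical values:

```python
from math import sqrt
from scipy.stats import norm

def sobel_test(a, se_a, b, se_b):
    """First-order Sobel z test for the mediated effect a*b."""
    se_ab = sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    return z, 2 * norm.sf(abs(z))  # z statistic and two-sided p value

# Hypothetical path estimates and standard errors.
print(sobel_test(a=0.40, se_a=0.10, b=0.35, se_b=0.12))
```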

19.
Type I error rate and power for the t test, Wilcoxon-Mann-Whitney (U) test, van der Waerden Normal Scores (NS) test, and Welch-Aspin-Satterthwaite (W) test were compared for two independent random samples drawn from nonnormal distributions. Data with varying degrees of skewness (S) and kurtosis (K) were generated using Fleishman's (1978) power function. Five sample size combinations were used with both equal and unequal variances. For nonnormal data with equal variances, the power of the U test exceeded the power of the t test regardless of sample size. When the sample sizes were equal but the variances were unequal, the t test proved to be the most powerful test. When variances and sample sizes were unequal, the W test became the test of choice because it was the only test that maintained its nominal Type I error rate.
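Fleishman's power function generates a nonnormal variable as a cubic polynomial in a standard normal deviate, Y = a + bZ + cZ² + dZ³ with a = -c, where the coefficients are chosen to hit target skewness and excess kurtosis. A minimal sketch (the target moments below are hypothetical, and the coefficients are obtained numerically rather than taken from Fleishman's published table):

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import skew, kurtosis

def fleishman_coefficients(target_skew, target_excess_kurtosis):
    """Solve Fleishman's (1978) moment equations for b, c, d (with a = -c)."""
    def equations(p):
        b, c, d = p
        return (
            b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1,
            2*c*(b**2 + 24*b*d + 105*d**2 + 2) - target_skew,
            24*(b*d + c**2*(1 + b**2 + 28*b*d)
                + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - target_excess_kurtosis,
        )
    b, c, d = fsolve(equations, (0.9, 0.4, 0.0))
    return -c, b, c, d

a, b, c, d = fleishman_coefficients(1.75, 3.75)
rng = np.random.default_rng(3)
z = rng.standard_normal(100_000)
y = a + b*z + c*z**2 + d*z**3
print(skew(y), kurtosis(y))  # should be near the targets 1.75 and 3.75
```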

20.
Two simulation studies investigated Type I error performance of two statistical procedures for detecting differential item functioning (DIF): SIBTEST and Mantel-Haenszel (MH). Because MH and SIBTEST are based on asymptotic distributions requiring "large" numbers of examinees, the first study examined Type I error for small sample sizes. No significant Type I error inflation occurred for either procedure. Because MH has the potential for Type I error inflation for non-Rasch models, the second study used a markedly non-Rasch test and systematically varied the shape and location of the studied item. When the distribution of the measured ability differed across examinee groups, both procedures displayed inflated Type I error for certain items; MH displayed the greater inflation. Also, both procedures displayed statistically biased estimation of the zero DIF for certain items, though SIBTEST displayed much less than MH. When no latent distributional differences were present, both procedures performed satisfactorily under all conditions.
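To make the MH side concrete, a minimal sketch (hypothetical counts, not the study's simulated data) of the Mantel-Haenszel DIF statistic computed from 2 × 2 tables formed within total-score strata on the matching criterion:

```python
import numpy as np
from scipy.stats import chi2

def mantel_haenszel_dif(tables):
    """MH common odds ratio and continuity-corrected chi-square for DIF.

    Each table is [[A, B], [C, D]]: rows = reference/focal group, columns =
    correct/incorrect on the studied item, one table per score stratum.
    """
    tables = np.asarray(tables, dtype=float)
    A, B = tables[:, 0, 0], tables[:, 0, 1]
    C, D = tables[:, 1, 0], tables[:, 1, 1]
    T = A + B + C + D
    exp_A = (A + B) * (A + C) / T
    var_A = (A + B) * (C + D) * (A + C) * (B + D) / (T**2 * (T - 1))
    chi_sq = (abs((A - exp_A).sum()) - 0.5) ** 2 / var_A.sum()
    odds_ratio = (A * D / T).sum() / (B * C / T).sum()
    return odds_ratio, chi_sq, chi2.sf(chi_sq, df=1)

# Hypothetical tables for three score strata (low / medium / high).
strata = [
    [[20, 30], [12, 38]],
    [[35, 15], [28, 22]],
    [[45,  5], [40, 10]],
]
print(mantel_haenszel_dif(strata))
```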
