Similar documents (20 results)
1.
The power of analysis of covariance (ANCOVA) and 2 types of randomized block designs were compared as a function of the correlation between the concomitant variable and the outcome measure, the number of groups, the number of participants, and nominal power. ANCOVA had a small but consistent advantage over a randomized block design with 1 participant in each Block × Treatment combination (RB1). At correlations of .3 or greater, ANCOVA was superior to a randomized block design with n participants per Block × Treatment combination (RBn), with increasing differences as the correlation increased. RBn was superior to the other 2 designs only when the correlation was .2 or less. At those levels, however, the randomized group analysis of variance ignoring the concomitant variable was equally powerful. The findings held regardless of sample size, number of groups, or nominal power.
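The power advantage of ANCOVA over an analysis that ignores the concomitant variable can be reproduced in a few lines of simulation. The sketch below is illustrative only (it is not the authors' code): it assumes numpy and scipy are available, fixes a hypothetical correlation of .5 between covariate and outcome, and implements ANCOVA as an F test comparing the models y ~ x and y ~ x + group.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_trial(n=30, rho=0.5, effect=0.5):
    # Two groups of n; the covariate x correlates rho with the outcome y.
    g = np.repeat([0, 1], n)
    x = rng.normal(size=2 * n)
    y = effect * g + rho * x + np.sqrt(1 - rho**2) * rng.normal(size=2 * n)
    # ANOVA ignoring the covariate (two groups, so a t test is equivalent):
    p_anova = stats.ttest_ind(y[g == 0], y[g == 1]).pvalue
    # ANCOVA as a model comparison: y ~ x  versus  y ~ x + g
    def rss(X):
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        return np.sum((y - X @ beta) ** 2)
    ones = np.ones(2 * n)
    rss_red = rss(np.column_stack([ones, x]))
    rss_full = rss(np.column_stack([ones, x, g]))
    f = (rss_red - rss_full) / (rss_full / (2 * n - 3))
    p_ancova = stats.f.sf(f, 1, 2 * n - 3)
    return p_anova < 0.05, p_ancova < 0.05

reps = 2000
rejections = np.array([one_trial() for _ in range(reps)])
power_anova, power_ancova = rejections.mean(axis=0)
```

With the covariate explaining 25% of the outcome variance, the ANCOVA error term shrinks and its empirical power exceeds that of the unadjusted analysis, in line with the abstract's finding for correlations of .3 or greater.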

2.
The authors sought to identify through Monte Carlo simulations those conditions for which analysis of covariance (ANCOVA) does not maintain adequate Type I error rates and power. The conditions that were manipulated included assumptions of normality and variance homogeneity, sample size, number of treatment groups, and strength of the covariate-dependent variable relationship. Alternative tests studied were Quade's procedure, Puri and Sen's solution, Burnett and Barr's rank difference scores, Conover and Iman's rank transformation test, Hettmansperger's procedure, and the Puri-Sen-Harwell-Serlin test. For balanced designs, the ANCOVA F test was robust and was often the most powerful test through all sample-size designs and distributional configurations. With unbalanced designs, with variance heterogeneity, and when the largest treatment-group variance was matched with the largest group sample size, the nonparametric alternatives generally outperformed the ANCOVA test. When sample size and variance ratio were inversely coupled, all tests became very liberal; no test maintained adequate control over Type I error.

3.
For the two-way factorial design in analysis of variance, the current article explicates and compares three methods for controlling the Type I error rate for all possible simple interaction contrasts following a statistically significant interaction, including a proposed modification to the Bonferroni procedure that increases the power of statistical tests for deconstructing interaction effects when they are of primary substantive interest. Results indicate the general superiority of the modified Bonferroni procedure over Scheffé and Roy-type procedures, where the Bonferroni and Scheffé procedures have been modified to accommodate the logical implications of a false omnibus interaction null hypothesis. An applied example is provided and considerations for applied researchers are offered.

4.
Models to assess mediation in the pretest–posttest control group design are understudied in the behavioral sciences even though it is the design of choice for evaluating experimental manipulations. The article provides analytical comparisons of the four most commonly used models to estimate the mediated effect in this design: analysis of covariance (ANCOVA), difference score, residualized change score, and cross-sectional model. Each of these models is fitted using a latent change score specification and a simulation study assessed bias, Type I error, power, and confidence interval coverage of the four models. All but the ANCOVA model make stringent assumptions about the stability and cross-lagged relations of the mediator and outcome that might not be plausible in real-world applications. When these assumptions do not hold, Type I error and statistical power results suggest that only the ANCOVA model has good performance. The four models are applied to an empirical example.

5.
This study examined the effect of sample size ratio and model misfit on the Type I error rates and power of the Difficulty Parameter Differences procedure using Winsteps. A unidimensional 30-item test with responses from 130,000 examinees was simulated and four independent variables were manipulated: sample size ratio (20/100/250/500/1000); model fit/misfit (1PL and 3PL with c = .15 models); impact (no difference/mean differences/variance differences/mean and variance differences); and percentage of items with uniform and nonuniform DIF (0%/10%/20%). In general, the results indicate the importance of ensuring model fit to achieve greater control of Type I error and adequate statistical power. The manipulated variables produced inflated Type I error rates, which were well controlled when a measure of DIF magnitude was applied. Sample size ratio also had an effect on the power of the procedure. The paper discusses the practical implications of these results.

6.
This study investigated the Type I error rate and power of four copying indices, K-index (Holland, 1996), Scrutiny! (Assessment Systems Corporation, 1993), g2 (Frary, Tideman, & Watts, 1977), and ω (Wollack, 1997) using real test data from 20,000 examinees over a 2-year period. The data were divided into three different test lengths (20, 40, and 80 items) and nine different sample sizes (ranging from 50 to 20,000). Four different amounts of answer copying were simulated (10%, 20%, 30%, and 40% of the items) within each condition. The ω index demonstrated the best Type I error control and power in all conditions and at all α levels. Scrutiny! and the K-index were uniformly conservative, and both had poor power to detect true copiers at the small α levels typically used in answer copying detection, whereas g2 was generally too liberal, particularly at small α levels. Some comments on the proper uses of copying indices are provided.

7.
Latent means methods such as multiple-indicator multiple-cause (MIMIC) and structured means modeling (SMM) allow researchers to determine whether or not a significant difference exists between groups' factor means. Strong invariance is typically recommended when interpreting latent mean differences. The extent of the impact of noninvariant intercepts on conclusions made when implementing both MIMIC and SMM methods was the main purpose of this study. The impact of intercept noninvariance on Type I error rates, power, and two model fit indices when using MIMIC and SMM approaches under various conditions was examined. Type I error and power were adversely affected by intercept noninvariance. Although the fit indices did not detect small misspecifications in the form of noninvariant intercepts, one did perform more optimally.

8.
This study examined and compared various statistical methods for detecting individual differences in change. Considering 3 issues including test forms (specific vs. generalized), estimation procedures (constrained vs. unconstrained), and nonnormality, we evaluated 4 variance tests including the specific Wald variance test, the generalized Wald variance test, the specific likelihood ratio (LR) variance test, and the generalized LR variance test under both constrained and unconstrained estimation for both normal and nonnormal data. For the constrained estimation procedure, both the mixture distribution approach and the alpha correction approach were evaluated for their performance in dealing with the boundary problem. To deal with the nonnormality issue, we used the sandwich standard error (SE) estimator for the Wald tests and the Satorra–Bentler scaling correction for the LR tests. Simulation results revealed that testing a variance parameter and the associated covariances (generalized) had higher power than testing the variance solely (specific), unless the true covariances were zero. In addition, the variance tests under constrained estimation outperformed those under unconstrained estimation in terms of higher empirical power and better control of Type I error rates. Among all the studied tests, for both normal and nonnormal data, the robust generalized LR and Wald variance tests with the constrained estimation procedure were generally more powerful and had better Type I error rates for testing variance components than the other tests. Results from the comparisons between specific and generalized variance tests and between constrained and unconstrained estimation were discussed.

9.
We investigated the statistical properties of the K-index (Holland, 1996) that can be used to detect copying behavior on a test. A simulation study was conducted to investigate the applicability of the K-index for small, medium, and large datasets. Furthermore, the Type I error rate and the detection rate of this index were compared with the copying index, ω (Wollack, 1997). Several approximations were used to calculate the K-index. Results showed that all approximations were able to hold the Type I error rates below the nominal level. Results further showed that using ω resulted in higher detection rates than the K-indices for small and medium sample sizes (100 and 500 simulees).

10.
Two simulation studies investigated Type I error performance of two statistical procedures for detecting differential item functioning (DIF): SIBTEST and Mantel-Haenszel (MH). Because MH and SIBTEST are based on asymptotic distributions requiring "large" numbers of examinees, the first study examined Type I error for small sample sizes. No significant Type I error inflation occurred for either procedure. Because MH has the potential for Type I error inflation for non-Rasch models, the second study used a markedly non-Rasch test and systematically varied the shape and location of the studied item. When differences in distribution across examinee group of the measured ability were present, both procedures displayed inflated Type I error for certain items; MH displayed the greater inflation. Also, both procedures displayed statistically biased estimation of the zero DIF for certain items, though SIBTEST displayed much less than MH. When no latent distributional differences were present, both procedures performed satisfactorily under all conditions.
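The Mantel-Haenszel DIF statistic referenced above is straightforward to compute: examinees are stratified on a matching score, and a continuity-corrected chi-square is built from the 2 × 2 (group × correct) table in each stratum. The following sketch, under assumed Rasch-generated data with a hypothetical 0.6-logit shift on one item for the focal group, is an illustration rather than a reimplementation of either study; it matches on the rest score (total minus the studied item) and assumes numpy and scipy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def simulate(n, difficulties, shift_item=None, shift=0.0):
    # Rasch responses for n examinees with ability ~ N(0, 1).
    theta = rng.normal(size=n)
    b = difficulties.copy()
    if shift_item is not None:
        b[shift_item] += shift
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random((n, len(b))) < p).astype(int)

def mh_chi2(ref, foc, item):
    # Stratify on the rest score (total score excluding the studied item).
    strata_r = ref.sum(axis=1) - ref[:, item]
    strata_f = foc.sum(axis=1) - foc[:, item]
    num_a = num_e = var = 0.0
    for s in np.union1d(strata_r, strata_f):
        r = ref[strata_r == s, item]
        f = foc[strata_f == s, item]
        n_r, n_f = len(r), len(f)
        n = n_r + n_f
        if n < 2 or n_r == 0 or n_f == 0:
            continue  # stratum carries no information
        m1 = r.sum() + f.sum()          # total correct in the stratum
        m0 = n - m1
        num_a += r.sum()
        num_e += n_r * m1 / n
        var += n_r * n_f * m1 * m0 / (n**2 * (n - 1))
    chi2 = max(abs(num_a - num_e) - 0.5, 0) ** 2 / var   # continuity-corrected
    return chi2, stats.chi2.sf(chi2, 1)

b = np.linspace(-1.5, 1.5, 20)
ref = simulate(2000, b)
foc = simulate(2000, b, shift_item=0, shift=0.6)   # item 0 harder for focal group
chi2_dif, p_dif = mh_chi2(ref, foc, 0)
```

With 2,000 examinees per group, a 0.6-logit DIF on the studied item produces a large MH chi-square and a clear rejection.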

11.
The authors used Johnson's transformation with approximate test statistics to test the homogeneity of simple linear regression slopes when both xij and yij may have nonnormal distributions and there is Type I heteroscedasticity, Type II heteroscedasticity, or complete heteroscedasticity. The test statistic t was first transformed by Johnson's method for each group to correct for nonnormality; an approximate test, such as the Welch test or the DeShon-Alexander test, was then applied to test the homogeneity of the regression slopes while correcting for heteroscedasticity. Computer simulations showed that the proposed technique can control the Type I error rate under various circumstances. Finally, the authors provide an example to demonstrate the calculation.

12.
13.

Researchers conducting structural equation modeling analyses rarely, if ever, control for the inflated probability of Type I errors when evaluating the statistical significance of multiple parameters in a model. In this study, the Type I error control, power, and true model rates of familywise and false discovery rate controlling procedures were compared with rates when no multiplicity control was imposed. The results indicate that Type I error rates become severely inflated with no multiplicity control, but also that familywise error controlling procedures were extremely conservative and had very little power for detecting true relations. False discovery rate controlling procedures provided a compromise between no multiplicity control and strict familywise error control and with large sample sizes provided a high probability of making correct inferences regarding all the parameters in the model.
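The familywise-versus-FDR trade-off described above is easy to see concretely. The sketch below, on hypothetical data (90 null p-values and 10 p-values from assumed true effects), contrasts the Bonferroni familywise rule with the Benjamini–Hochberg step-up FDR procedure; BH always rejects at least as many hypotheses as Bonferroni at the same level, which is the source of its power advantage.

```python
import numpy as np

rng = np.random.default_rng(3)

# 90 p-values from true nulls (uniform) and 10 from hypothetical true effects
# (concentrated near zero via Beta(1, 200) draws).
p = np.concatenate([rng.uniform(size=90), rng.beta(1, 200, size=10)])
m, alpha = len(p), 0.05

# Familywise control: Bonferroni rejects p < alpha / m.
bonferroni = p < alpha / m

def benjamini_hochberg(p, q=0.05):
    """Step-up FDR procedure: reject the k smallest p-values, where k is the
    largest index with p_(k) <= k * q / m."""
    order = np.argsort(p)
    m = len(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

bh = benjamini_hochberg(p, q=alpha)
```

Because the BH threshold for the smallest p-value coincides with the Bonferroni cutoff and grows with rank, the BH rejection set contains the Bonferroni rejection set by construction.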

14.
This Monte Carlo simulation study investigated the impact of nonnormality on estimating and testing mediated effects with the parallel process latent growth model and 3 popular methods for testing the mediated effect (i.e., Sobel’s test, the asymmetric confidence limits, and the bias-corrected bootstrap). It was found that nonnormality had little effect on the estimates of the mediated effect, standard errors, empirical Type I error, and power rates in most conditions. In terms of empirical Type I error and power rates, the bias-corrected bootstrap performed best. Sobel’s test produced very conservative Type I error rates when the estimated mediated effect and standard error had a relationship, but when the relationship was weak or did not exist, the Type I error was closer to the nominal .05 value.
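The bias-corrected bootstrap favored in this and several of the studies listed here can be sketched compactly for a simple single-mediator model (not the parallel process growth model of the abstract). Everything in this example is hypothetical: assumed true paths a = b = 0.4, a sample of n = 200, and numpy/scipy available. The bias-correction constant z0 shifts the percentile endpoints according to the share of bootstrap estimates below the sample estimate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Hypothetical single-mediator data X -> M -> Y with true paths a = b = 0.4,
# so the true mediated effect is a*b = 0.16.
n = 200
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)
y = 0.4 * m + rng.normal(size=n)

def indirect(x, m, y):
    """Product-of-coefficients estimate of the mediated effect a*b."""
    ones = np.ones(len(x))
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, m, x]), y, rcond=None)[0][1]
    return a * b

est = indirect(x, m, y)

# Bias-corrected percentile bootstrap interval for a*b.
B = 2000
boot = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, n)
    boot[i] = indirect(x[idx], m[idx], y[idx])
z0 = stats.norm.ppf((boot < est).mean())              # bias-correction constant
lo, hi = stats.norm.cdf(2 * z0 + stats.norm.ppf([0.025, 0.975]))
ci = np.quantile(boot, [lo, hi])
```

When the bootstrap distribution is centered on the estimate, z0 is near zero and the interval reduces to the ordinary percentile bootstrap; the correction matters when the distribution of a*b is skewed, which is exactly the situation where Sobel's normal-theory test is conservative.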

15.
Recent advances in testing mediation have found that certain resampling methods and tests based on the mathematical distribution of 2 normal random variables substantially outperform the traditional z test. However, these studies have primarily focused only on models with a single mediator and 2 component paths. To address this limitation, a simulation was conducted to evaluate these alternative methods in a more complex path model with multiple mediators and indirect paths with 2 and 3 paths. Methods for testing contrasts of 2 effects were also evaluated. The simulation included 1 exogenous independent variable, 3 mediators, and 2 outcomes, and varied sample size, number of paths in the mediated effects, test used to evaluate effects, effect sizes for each path, and the value of the contrast. Confidence intervals were used to evaluate the power and Type I error rate of each method, and were examined for coverage and bias. The bias-corrected bootstrap had the least biased confidence intervals, greatest power to detect nonzero effects and contrasts, and the most accurate overall Type I error. All tests had less power to detect 3-path effects and more inaccurate Type I error compared to 2-path effects. Confidence intervals were biased for mediated effects, as found in previous studies. Results for contrasts did not vary greatly by test, although resampling approaches had somewhat greater power and might be preferable because of ease of use and flexibility.

16.
Type I error rate and power for the t test, Wilcoxon-Mann-Whitney (U) test, van der Waerden Normal Scores (NS) test, and Welch-Aspin-Satterthwaite (W) test were compared for two independent random samples drawn from nonnormal distributions. Data with varying degrees of skewness (S) and kurtosis (K) were generated using Fleishman's (1978) power function. Five sample size combinations were used with both equal and unequal variances. For nonnormal data with equal variances, the power of the U test exceeded the power of the t test regardless of sample size. When the sample sizes were equal but the variances were unequal, the t test proved to be the most powerful test. When variances and sample sizes were unequal, the W test became the test of choice because it was the only test that maintained its nominal Type I error rate.
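The core result, that only the Welch test holds its nominal Type I error rate when variances and sample sizes are both unequal, is quick to verify by simulation. The sketch below uses normal rather than Fleishman-generated data (a simplifying assumption, not the study's design) and pairs the smaller sample with the larger variance, the configuration known to make the pooled t test liberal; it assumes numpy and scipy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Null is true (equal means); the smaller group has the larger variance.
def one_rep(n1=10, n2=40, sd1=3.0, sd2=1.0, alpha=0.05):
    a = rng.normal(0, sd1, n1)
    b = rng.normal(0, sd2, n2)
    p_t = stats.ttest_ind(a, b, equal_var=True).pvalue      # pooled-variance t
    p_w = stats.ttest_ind(a, b, equal_var=False).pvalue     # Welch
    p_u = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
    return p_t < alpha, p_w < alpha, p_u < alpha

reps = 4000
t_rate, welch_rate, u_rate = np.mean([one_rep() for _ in range(reps)], axis=0)
```

The pooled t test's empirical Type I error rate climbs well above .05 because the pooled variance understates the standard error of the mean difference, while the Welch test stays near nominal.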

17.
The standardized log-likelihood of a response vector (lz) is a popular IRT-based person-fit test statistic for identifying model-misfitting response patterns. Traditional use of lz is overly conservative in detecting aberrance due to its incorrect assumption regarding its theoretical null distribution. This study proposes a method for improving the accuracy of person-fit analysis using lz which takes into account test unreliability when estimating the ability and constructs the distribution for each lz through resampling methods. The Type I error and power (or detection rate) of the proposed method were examined at different test lengths, ability levels, and nominal α levels along with other methods, and power to detect three types of aberrance—cheating, lack of motivation, and speeding—was considered. Results indicate that the proposed method is a viable and promising approach. It has Type I error rates close to the nominal value for most ability levels and reasonably good power.
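For readers unfamiliar with lz, the classic statistic (Drasgow, Levine, & Williams, 1985) standardizes the response-pattern log-likelihood by its model-implied mean and variance. The sketch below shows only that baseline statistic, not the resampling refinement the abstract proposes, on a hypothetical 20-item 2PL test with an assumed common discrimination of 1.2.

```python
import numpy as np

def lz(u, p):
    """Standardized log-likelihood person-fit statistic l_z:
    (l0 - E[l0]) / sqrt(Var[l0]) under the fitted IRT model."""
    u = np.asarray(u, dtype=float)
    p = np.asarray(p, dtype=float)
    l0 = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expect = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expect) / np.sqrt(var)

# Hypothetical 20-item test: 2PL probabilities for an examinee at theta = 0.
b = np.linspace(-2, 2, 20)                  # item difficulties
p = 1 / (1 + np.exp(-1.2 * (0 - b)))        # discrimination fixed at 1.2

consistent = (p > 0.5).astype(int)          # responses that track the model
aberrant = 1 - consistent                   # e.g., a spuriously reversed pattern

lz_good, lz_bad = lz(consistent, p), lz(aberrant, p)
```

Large negative lz values flag misfitting patterns; the reversed pattern above (missing easy items while passing hard ones, as a cheater or unmotivated examinee might) scores far below zero, while the model-consistent pattern does not.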

18.
This paper presents the results of a simulation study to compare the performance of the Mann-Whitney U test, Student's t test, and the alternate (separate variance) t test for two mutually independent random samples from normal distributions, with both one-tailed and two-tailed alternatives. The estimated probability of a Type I error was controlled (in the sense of being reasonably close to the attainable level) by all three tests when the variances were equal, regardless of the sample sizes. However, it was controlled only by the alternate t test for unequal variances with unequal sample sizes. With equal sample sizes, the probability was controlled by all three tests regardless of the variances. When it was controlled, we also compared the power of these tests and found very little difference. This means that very little power will be lost if the Mann-Whitney U test is used instead of tests that require the assumption of normal distributions.

19.
This article extends the Bonett (2003a) approach to testing the equality of alpha coefficients from two independent samples to the case of m ≥ 2 independent samples. The extended Fisher-Bonett test and its competitor, the Hakstian-Whalen (1976) test, are illustrated with numerical examples of both hypothesis testing and power calculation. Computer simulations are used to compare the performance of the two tests and the Feldt (1969) test (for m = 2) in terms of power and Type I error control. It is shown that the Fisher-Bonett test is just as effective as its competitors in controlling Type I error, is comparable to them in power, and is equally robust against heterogeneity of error variance.
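For the two-sample case that the Bonett (2003a) approach starts from, one common formulation works on the log-complement scale, using the approximation Var[ln(1 − α̂)] ≈ 2k / ((k − 1)(n − 2)). The sketch below is a hedged illustration of that two-sample version on simulated data (an assumed 8-item scale with common-factor loadings of .7 in both groups), not the extended m-sample Fisher-Bonett test; the variance approximation and the data-generating choices are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(X):
    """Cronbach's alpha for an n-persons x k-items score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def bonett_z(a1, n1, a2, n2, k):
    # Approximate two-sample z test on the log-complement scale, assuming
    # Var[ln(1 - alpha_hat)] ~ 2k / ((k - 1)(n - 2)).
    v = lambda n: 2 * k / ((k - 1) * (n - 2))
    z = (np.log(1 - a1) - np.log(1 - a2)) / np.sqrt(v(n1) + v(n2))
    return z, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(9)

def sample(n, k=8, loading=0.7):
    # One common factor with equal loadings, so both groups share the
    # same true reliability (the null hypothesis holds).
    f = rng.normal(size=(n, 1))
    return loading * f + np.sqrt(1 - loading**2) * rng.normal(size=(n, k))

X1, X2 = sample(300), sample(300)
a1, a2 = cronbach_alpha(X1), cronbach_alpha(X2)
z, p = bonett_z(a1, 300, a2, 300, 8)
```

Because both groups have the same population reliability here, the test statistic behaves like a standard normal draw and rejects only at roughly the nominal rate.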

20.
This article used the Wald test to evaluate the item‐level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G‐DINA model. Results show that when the sample size is small and a larger number of attributes are required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance levels, while the Type I error rate of the A‐CDM is closer to the nominal significance levels. However, with larger sample sizes, the Type I error rates for the three models are closer to the nominal significance levels. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings.
