Similar Articles
20 similar articles found.
1.
Abstract

Recent publications have drawn attention to the idea of utilizing prior information about the correlation structure to improve statistical power in cluster randomized experiments. Because power in cluster randomized designs is a function of many different parameters, it has been difficult for applied researchers to discern a simple rule explaining when prior correlation information will substantially improve power. This article provides bounds on the maximum possible improvement in power as a function of a single parameter, the number of clusters at the highest level of a multilevel experiment. The maximum improvement in power is less than 0.05 unless the number of clusters at the highest level is less than 20. Thus, the utility of using prior correlation information is limited to experiments with very small cluster-level sample sizes. Situations where small cluster-level sample sizes could still result in experiments with good statistical power are discussed, as is the relative utility of prior information about intracluster correlations as compared with covariate information that can explain cluster-level variability in the outcome.
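
The mechanism behind this bound can be illustrated with a short calculation: treating the intracluster correlation as known roughly amounts to replacing a t reference distribution, whose degrees of freedom depend on the number of clusters, with a normal reference, and that gap closes quickly as the number of clusters grows. The sketch below is a hypothetical illustration under a standard balanced two-level design, not the exact bound derived in the article; the design-effect formula, sample sizes, ICC, and effect size are assumed values.

```python
# Sketch: how much can treating the intracluster correlation (ICC) as known
# improve power in a balanced two-level cluster randomized trial?
# Illustrative only -- a standard design-effect formula and a t-vs-z
# comparison, not the exact bounds derived in the article.
import numpy as np
from scipy import stats

def crt_power(J, n, icc, delta, alpha=0.05, known_icc=False):
    """Approximate power for a balanced two-level CRT with J clusters total,
    n students per cluster, and standardized effect size delta."""
    se = np.sqrt(4 * (icc + (1 - icc) / n) / J)      # SE of the standardized impact
    ncp = delta / se                                  # noncentrality parameter
    if known_icc:                                     # ICC known: normal reference
        return 1 - stats.norm.cdf(stats.norm.ppf(1 - alpha / 2) - ncp)
    crit = stats.t.ppf(1 - alpha / 2, df=J - 2)       # ICC estimated: t reference
    return 1 - stats.nct.cdf(crit, df=J - 2, nc=ncp)

for J in (10, 20, 40, 80):
    gain = crt_power(J, 25, 0.2, 0.4, known_icc=True) - crt_power(J, 25, 0.2, 0.4)
    print(f"J = {J:3d}: power gain from known ICC = {gain:.3f}")
```

With these assumed values the gain falls off quickly as the number of clusters grows, which is the pattern the article's bound formalizes.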

2.
Abstract

This paper and the accompanying tool are intended to complement existing power analysis resources by offering a tool based on the framework of Minimum Detectable Effect Sizes (MDES) formulae that can be used to determine sample size requirements and to estimate minimum detectable effect sizes for a range of individual- and group-random assignment designs and for common quasi-experimental designs. The paper and accompanying tool cover computation of minimum detectable effect sizes under the following study designs: individual random assignment designs, hierarchical random assignment designs (2-4 levels), block random assignment designs (2-4 levels), regression discontinuity designs (6 types), and short interrupted time-series designs. In each case, the discussion and accompanying tool consider the key factors associated with statistical power and minimum detectable effect sizes, including the level at which treatment occurs and the statistical models (e.g., fixed effect and random effect) used in the analysis. The tool also includes a module that estimates, for one- and two-level random assignment designs, the minimum sample sizes required for studies to attain user-defined minimum detectable effect sizes.
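
As a companion to the description above, here is a minimal sketch of the widely used MDES formula for a balanced two-level cluster randomized design with treatment assigned at the cluster level. The function name, defaults, and example values are illustrative assumptions; the accompanying tool covers many more designs and options.

```python
# Sketch of the standard MDES formula for a balanced two-level cluster
# randomized design with treatment at the cluster level. Parameter names,
# defaults, and the example values below are illustrative assumptions.
import math
from scipy import stats

def mdes_two_level(J, n, icc, P=0.5, R2_between=0.0, R2_within=0.0,
                   n_cluster_covariates=0, alpha=0.05, power=0.80):
    """J clusters of n individuals; P = proportion of clusters treated;
    R2_* = variance explained by covariates at each level."""
    df = J - n_cluster_covariates - 2
    multiplier = stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)
    variance = (icc * (1 - R2_between) / (P * (1 - P) * J)
                + (1 - icc) * (1 - R2_within) / (P * (1 - P) * J * n))
    return multiplier * math.sqrt(variance)

# 40 schools, 60 students each, ICC = 0.15, one school-level covariate
# explaining half of the between-school variance.
print(round(mdes_two_level(J=40, n=60, icc=0.15, R2_between=0.5,
                           n_cluster_covariates=1), 3))
```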

3.
ABSTRACT

Demand for scientific knowledge of what works in educational policy and practice has driven interest in quantitative investigations of educational outcomes, and randomized controlled trials (RCTs) have proliferated under these conditions. In educational settings, even when individuals are randomized, both experimental and control students are often grouped into particular classrooms and schools and share common learning experiences. Analyses that account for these clusters are common. A less common design involves one clustered experimental arm and one unclustered experimental arm, sometimes called a partially clustered design. Analysts do not always use methods that yield valid statistical inferences for such partially clustered designs. Additionally, published methods for handling partially clustered designs may not be flexible enough to handle real-world complications, including treatment non-compliance. In this paper, we illustrate how models that accommodate partial clustering may be used in educational research. We explore the performance of these models using a series of Monte Carlo simulations informed by data taken from a large-scale RCT studying the impacts of a programme designed to decrease summer learning loss. We find that clustering and non-compliance can have substantial impacts on statistical inferences about intent-to-treat effects, and demonstrate methods that show promise for addressing these complications.

4.
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model. The power is related to the item response function (IRF) for the studied item, the latent trait distributions, and the sample sizes for the reference and focal groups. Simulation studies show that the theoretical values calculated from the formulas derived in the article are close to what is observed in the simulated data when the assumptions are satisfied. The robustness of the power formulas is studied with simulations when the assumptions are violated.
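
To make the Wald-test case concrete, the following sketch computes power from the asymptotic noncentral chi-square distribution of the test statistic for a uniform DIF effect. The group coefficient and its standard error are placeholder assumptions; in the article the standard error is derived from the item response function, the latent trait distributions, and the reference and focal group sample sizes.

```python
# Sketch: power of a Wald test for a uniform DIF effect in logistic
# regression. The group coefficient gamma and its standard error are
# placeholder assumptions, not values taken from the article.
from scipy import stats

def wald_dif_power(gamma, se_gamma, alpha=0.05, df=1):
    ncp = (gamma / se_gamma) ** 2              # noncentrality under H1
    crit = stats.chi2.ppf(1 - alpha, df)       # critical value under H0
    return 1 - stats.ncx2.cdf(crit, df, ncp)   # P(reject H0 | H1)

# Example: 0.4 logits of uniform DIF with an assumed SE of 0.15.
print(round(wald_dif_power(0.4, 0.15), 3))
```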

5.
ABSTRACT

Using a “naïve” specification, this paper estimates the relationship between 36 high school characteristics and 24 student outcomes, controlling for students' pre-high school characteristics. The goal of this exploration is not to generate causal estimates, but rather to: (a) compare the size of the relationships to determine which inputs seem most promising and to identify which student outcomes appear most susceptible to being affected; (b) obtain likely upper-bound effect sizes that are useful information for power analyses used to establish minimum sample sizes for more robust designs capable of revealing causal impacts; and (c) illustrate how small effects over many outcomes (which are cumulatively important) can be easily missed. I find that most of the 36 inputs appear to have affected more outcomes than one would expect by chance, but that the apparent effects were generally small. Further, I find a higher frequency of large and significant apparent effects on educational achievement and attainment outcomes than on labor market and other outcomes for young adults.

6.
The design of research studies utilizing binary multilevel models must necessarily incorporate knowledge of multiple factors, including estimation method, variance component size, or number of predictors, in addition to sample sizes. This Monte Carlo study examined the performance of random effect binary outcome multilevel models under varying methods of estimation, level-1 and level-2 sample size, outcome prevalence, variance component sizes, and number of predictors using SAS software. Mean estimates of statistical power were influenced primarily by sample sizes at both levels. In addition, confidence interval coverage and width and the likelihood of nonpositive definite random effect covariance matrices were impacted by variance component size and estimation method. The interactions of these and other factors with various model performance outcomes are explored.
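
A single replication of the kind of data-generating process such a simulation varies might look like the sketch below. The cluster counts, prevalence (set through the intercept), level-2 variance, and predictor effect are illustrative assumptions, and the study itself fits the models in SAS under several estimation methods rather than in Python.

```python
# Sketch: one replication of a random-intercept binary-outcome data set of
# the kind the simulation study varies. All parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_clusters, n_per_cluster = 50, 20
tau2, gamma = 0.5, 0.4                                   # level-2 variance, slope

u = rng.normal(0.0, np.sqrt(tau2), n_clusters)           # cluster random intercepts
x = rng.normal(size=(n_clusters, n_per_cluster))         # level-1 predictor
eta = -1.0 + gamma * x + u[:, None]                      # linear predictor
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))          # binary outcomes

print("outcome prevalence:", round(float(y.mean()), 3))
print("latent-scale ICC:", round(tau2 / (tau2 + np.pi ** 2 / 3), 3))
```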

7.
The study, using a Monte Carlo technique, was designed to investigate the effect of differences in covariate means among treatment groups on the significance level and the power of the F-test in the analysis of covariance. The results show that differences among covariate group means have little effect on the significance level if the covariate is highly correlated with the criterion variable. However, if the correlation is .4 or less, larger sample sizes are required. The effect on power is more pronounced for smaller experiments. The larger the differences among covariate group means, the lower the actual power becomes compared to the approximate theoretical power.
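
A stripped-down version of such a Monte Carlo check is sketched below: it estimates the empirical Type I error rate of the ANCOVA treatment test when the covariate means differ between two groups and there is no true treatment effect. The group size, covariate-criterion correlation, and covariate mean difference are assumed values, and with two groups the t test on the group coefficient is equivalent to the ANCOVA F test.

```python
# Sketch: empirical Type I error of the ANCOVA treatment test when covariate
# means differ between groups and there is no true treatment effect.
# Group size, correlation, mean difference, and replication count are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

def ancova_rejection_rate(n=30, rho=0.4, mean_diff=1.0, reps=1000, alpha=0.05):
    rejections = 0
    for _ in range(reps):
        g = np.repeat([0, 1], n)
        x = rng.normal(mean_diff * g, 1.0)                           # shifted covariate
        y = rho * x + rng.normal(0.0, np.sqrt(1 - rho ** 2), 2 * n)  # no group effect
        data = pd.DataFrame({"y": y, "x": x, "g": g})
        fit = smf.ols("y ~ x + C(g)", data=data).fit()
        rejections += fit.pvalues["C(g)[T.1]"] < alpha
    return rejections / reps

print(ancova_rejection_rate())
```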

8.
Researchers are often interested in whether the effects of an intervention differ conditional on individual- or group-level moderator variables such as children's characteristics (e.g., gender), teacher's background (e.g., years of teaching), and school's characteristics (e.g., urbanicity); that is, researchers seek to examine for whom and under what circumstances an intervention works. Furthermore, researchers are interested in understanding and interpreting variability in treatment effects through moderation analysis as an approach to exploring the sources of treatment effect variability. This study develops formulas for power analyses to detect moderator effects when designing three-level cluster randomized trials (CRTs). We develop the statistical formulas for calculating statistical power, minimum detectable effect size difference, and 95% confidence intervals for cluster or cross-level moderation, nonrandomly varying or random slopes, binary or continuous moderators, and designs with or without covariates. We demonstrate how the calculations can be used in the planning phase of three-level CRTs using the software PowerUp!-Moderator.

9.
Abstract

Although the research methodology literature includes empirical benchmarks for effect sizes and intraclass correlations to help researchers determine adequate sample sizes through power analysis, it does not include similar benchmarks that would assist proper planning for attrition. To help fill this void, this paper describes how researchers can incorporate student attrition in power analyses and provides empirical benchmarks for the amount of attrition one might expect when conducting a school-based study that follows students over multiple years. The paper incorporates parameters for student attrition in common minimum detectable effect size calculations, presents attrition benchmarks based on student mobility rates in nationally representative longitudinal surveys, and presents benchmarks based on a synthesis of published evaluation studies. The paper includes a demonstration of how researchers can use the attrition benchmarks to take student attrition into account in a power analysis.
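
One simple way to fold an attrition benchmark into a standard MDES calculation is to deflate the expected number of students per cluster at follow-up, as in the sketch below. The design parameters and the 25% attrition rate are assumed values, and this deflation only captures the loss of precision, not any bias from non-random attrition.

```python
# Sketch: folding an expected attrition rate into a two-level MDES
# calculation by deflating the follow-up sample per cluster. All values
# are assumptions for illustration.
import math
from scipy import stats

def mdes(J, n, icc, P=0.5, alpha=0.05, power=0.80):
    df = J - 2
    M = stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)
    return M * math.sqrt(icc / (P * (1 - P) * J) + (1 - icc) / (P * (1 - P) * J * n))

J, n, icc, attrition = 60, 50, 0.15, 0.25
print("MDES at baseline n:   ", round(mdes(J, n, icc), 3))
print("MDES after attrition: ", round(mdes(J, round(n * (1 - attrition)), icc), 3))
```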

10.
Power and stability of Type I error rates are investigated for the Box-Scheffé test of homogeneity of variance with varying subsample sizes under conditions of normality and nonnormality. The test is shown to be robust to violation of the normality assumption when sampling is from a leptokurtic population. Subsample sizes which produce maximum power are given for small, intermediate, and large sample situations. Suggestions for selecting subsample sizes which will produce maximum power for a given n are provided. A formula for estimating power in the equal-n case is shown to give results agreeing with empirical results.

11.
Abstract

This study examines the reporting of power analyses in the group randomized trials funded by the Institute of Education Sciences from 2002 to 2006. A detailed power analysis provides critical information that allows reviewers to (a) replicate the power analysis and (b) assess whether the parameters used in the power analysis are reasonable. Without a detailed power analysis, reviewers may have difficulty evaluating the accuracy of the power analysis, and underpowered studies may inadvertently pass through the review process with a recommendation for funding. This study reveals that sample sizes are reported with high consistency; however, other important design parameters, including intraclass correlations, covariate-outcome correlations, and the percentage of variance explained by blocking, are not reported with regularity. An analysis of reporting trends over time reveals that the reporting of intraclass correlations and covariate-outcome correlations dramatically increased over time. The reporting of blocking information was still extremely limited, even in the more recent studies.

12.

This article examines a business-education mentoring programme supported by Roots & Wings, an initiative run by Business in the Community (BitC), in order to identify key factors in its functioning. The programme supports employers in establishing voluntary mentoring schemes in which their employees mentor local school children; we use, as an exemplar, a national project run by BT's Personal Communication Division. The article develops an analysis relevant to the operation of such a business-education mentoring programme by adapting a theoretical framework (Forrest, 1992) for the analysis of strategic alliances between companies in the business community. Using the same stages (pre-alliance matching and negotiation, alliance agreement, and alliance implementation) and adapting, or devising, relevant categories, the article develops a multilevel (strategic/organisational/individual) analysis of the programme.

13.

The APA Task Force on Statistical Inference recently recommended reporting effect sizes alongside the results of statistical significance tests. The purpose of this article is to investigate effect size usage in gifted education research and to follow up on a similar investigation published by Plucker (1997). A content analysis of effect size reporting was conducted on articles published in the Journal for the Education of the Gifted, Roeper Review, and Gifted Child Quarterly from 1995 to 2000. Results of the present study were similar to the findings of Plucker (1997): no statistical difference in reporting was found across journals or across years, and a moderate difference was found between effect size reporting in univariate versus multivariate statistics. The benefits to gifted education research of understanding the relationship among sample size, effect size, and statistical power are discussed.

14.

While there is growing interest in studying principals' perceptions of their work lives in terms of dilemmas, relatively few studies have gone beyond this to investigate how leaders manage and cope with such 'intractable' situations and the consequential effects and outcomes. Accordingly, this article provides an in-depth qualitative case study of the dilemmas faced by a principal who is involved in the restructuring of his school. It then analyses the ways in which he manages and copes with these intractable situations, and the effects and outcomes that result. The article begins by outlining a framework used in the analysis. It addresses some considerations of method before describing relevant school and system contexts. Finally, the in-depth case study is presented using the structure associated with the framework described earlier. Among the key findings is that dilemmas present opportunities as well as challenges for visionary, proactive and creative school leaders.

15.
A calculation of the probability of rejecting H0 when it should be rejected (power) was completed for each of the 66 applicable articles in Volumes 6 and 7 (1969, 1970) of the Journal of Research in Science Teaching. These power calculations utilized the effect size definitions and tables developed by Cohen (1969). The mean power of each article to detect small, medium, and large effect sizes was determined from its major statistical tests. These mean powers were then compiled and analyzed. The powers calculated for the different effect sizes were disturbingly low (small, 0.22; medium, 0.71; large, 0.87), though not generally as low as Cohen (1962) found in an analysis of another behavioral journal. Recommendations for improving confidence in research in science teaching are provided and center on significant increases in sample sizes and an understanding of power and its relation to α, effect size, and sample size.
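
The kind of calculation the review performs can be reproduced with standard software. The sketch below computes the power of a two-sample t test at Cohen's small, medium, and large effect sizes for an assumed per-group sample size of 30, which is illustrative rather than a figure taken from the reviewed articles.

```python
# Sketch: power of a two-sample t test at Cohen's (1969) small, medium, and
# large effect sizes. The per-group sample size of 30 is an assumption.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    p = analysis.power(effect_size=d, nobs1=30, alpha=0.05, alternative="two-sided")
    print(f"{label:6s} (d = {d}): power = {p:.2f}")
```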

16.
Abstract

This article provides practical guidance for researchers who are designing studies that randomize groups to measure the impacts of educational interventions. The article (a) provides new empirical information about the values of parameters that influence the precision of impact estimates (intraclass correlations and R² values) and includes outcomes other than standardized test scores and data with a three-level rather than a two-level structure, and (b) discusses the error (both generalizability and estimation error) that exists in estimates of key design parameters and the implications this error has for design decisions. Data for the paper come primarily from two studies: the Chicago Literacy Initiative: Making Better Early Readers Study (CLIMBERS) and the School Breakfast Pilot Project (SBPP). The analysis sample from CLIMBERS comprised 430 four-year-old children from 47 preschool classrooms in 23 Chicago public schools. The analysis sample from the SBPP study comprised 1,151 third graders from 233 classrooms in 111 schools from 6 school districts. Student achievement data from the Reading First Impact Study are also used to supplement the discussion.
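
For readers who want to compute comparable design parameters from their own pilot data, the sketch below applies the classical one-way ANOVA estimator of the intraclass correlation to simulated school data. The number of schools, students per school, and the true ICC of 0.15 are assumptions used only to make the example self-contained.

```python
# Sketch: the classical one-way ANOVA estimator of the intraclass
# correlation, applied to simulated school data. All values are assumptions.
import numpy as np

rng = np.random.default_rng(1)
J, n, true_icc = 20, 30, 0.15

school_effects = rng.normal(0.0, np.sqrt(true_icc), J)
scores = school_effects[:, None] + rng.normal(0.0, np.sqrt(1 - true_icc), (J, n))

school_means = scores.mean(axis=1)
msb = n * school_means.var(ddof=1)                                   # between mean square
msw = ((scores - school_means[:, None]) ** 2).sum() / (J * (n - 1))  # within mean square
icc_hat = (msb - msw) / (msb + (n - 1) * msw)
print("estimated ICC:", round(float(icc_hat), 3))
```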

17.
ABSTRACT

Experimental evaluations in the educational system usually involve a hierarchical structure (students are nested within classrooms that are nested within schools, etc.). Concerns about contamination, where research subjects receive certain features of an intervention intended for subjects in a different experimental group, have often led researchers to randomize units at a higher level of the educational hierarchy. Existing work on two-level designs suggests that situations where contamination should lead to randomization at a higher level are likely to be rare. This article extends these results to the case of three-level designs. In order to understand the implications of the mathematical results, existing information about the size of intracluster correlation coefficients (ICCs) in educational studies with three levels and about the extent of treatment effect heterogeneity across schools is discussed. Better empirical estimates of ICCs, treatment effect heterogeneity, and plausible contamination values are necessary to make full use of the results in this article. However, it seems likely that situations where contamination should lead to randomization at a higher level in three-level designs are rare.

18.
Background: Although most children experience at least one adversity, it is the experience of multiple adversities that produces a context of disadvantage that increases the risk of various negative outcomes in adulthood. Previous measures of cumulative childhood adversity consider a limited number of adversities, overlook potential differences across experiences of adversity, and fail to measure the effects of multiple co-occurring childhood adversities. These limitations have led to inconsistent and incomplete conclusions regarding the impact of multiple adverse childhood experiences on adult mental health.
Objective: This study assesses how the operationalization and modeling of exposure to cumulative childhood adversity (CCA) influences estimates of the association between CCA and adult psychological distress, and develops an improved measure of CCA.
Methods: We use data from the Panel Study of Income Dynamics, a nationally representative sample of households in the United States, and its supplement, the Childhood Retrospective Circumstances Study (N = 4219). We compare four measures of CCA that consider various distinct aspects of adverse experiences (additive, severity, type, and patterns of experience using latent class analysis).
Results: All measures of CCA were associated with increases in adult psychological distress, but effects depend on the measurement of CCA. Results suggest the sum score overestimates the overall impact of CCA. Latent class analysis captures the co-occurrence of adversities across severity and type, providing an improved measure of CCA.
Conclusions: The heterogeneity across adversities impacts estimates of adult psychological distress. Measuring CCA as patterns of co-occurring adverse experiences is a promising approach.

19.
This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under the assumption of a single-level latent class model. The outcomes of interest are measures of bias in the Bayesian Information Criterion (BIC) and the entropy R² statistic relative to accounting for the multilevel structure of the data. The results indicate that the size of the intraclass correlation as well as the between- and within-cluster sizes are the most prominent factors in determining the amount of bias in these outcome measures, with increasing intraclass correlations combined with small between-cluster sizes resulting in increased bias. Bias is particularly noticeable in the BIC. In addition, there is evidence that class separation interacts with the size of the intraclass correlations and cluster sizes in producing bias in these measures.

20.
Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical procedures that counteract this problem by adjusting p values for effect estimates upward. Although MTPs are increasingly used in impact evaluations in education and other areas, an important consequence of their use is a change in statistical power that can be substantial. Unfortunately, researchers frequently ignore the power implications of MTPs when designing studies. Consequently, in some cases, sample sizes may be too small and studies may be underpowered to detect effects of the desired size. In other cases, sample sizes may be larger than needed, or studies may be powered to detect smaller effects than anticipated. This paper presents methods for estimating statistical power under multiple definitions of statistical power and presents empirical findings on how power is affected by the use of MTPs.
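
The two ingredients the paper combines, p-value adjustment and its effect on the power of an individual test, can be sketched as follows. The p values, effect size, sample size, and the choice of Holm and Bonferroni adjustments are illustrative assumptions rather than the paper's own procedures.

```python
# Sketch: (1) adjusting p values with a standard multiple testing procedure,
# and (2) the drop in the power of an individual test when the effective
# significance level shrinks. All numeric values are assumptions.
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.power import TTestIndPower

# (1) Holm adjustment of p values from five outcome-specific tests.
raw_p = [0.004, 0.012, 0.030, 0.041, 0.200]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
print("adjusted p:", [round(p, 3) for p in adj_p], "reject:", list(reject))

# (2) Individual power before and after a Bonferroni correction for 5 tests.
power = TTestIndPower()
for alpha in (0.05, 0.05 / 5):
    print(f"alpha = {alpha:.3f}: power = "
          f"{power.power(effect_size=0.25, nobs1=200, alpha=alpha):.2f}")
```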

