Similar Articles
20 similar articles found.
1.
《Exceptionality》2013,21(4):209-221
This article first outlines the underlying logic of null hypothesis testing and the philosophical and practical problems associated with using it to evaluate special education research. It then presents three alternative metrics (a binomial effect size display, a relative risk ratio, and an odds ratio) that can better aid researchers and practitioners in identifying important treatment effects. Each metric is illustrated using data from recently evaluated special education interventions. The article justifies interpreting a research result as significant when the practical importance of the sample differences is evident and when chance fluctuations due to sampling can be shown to be an unlikely explanation for the differences.
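All three metrics can be computed directly from a treatment/control 2×2 outcome table. The sketch below is an illustration of the standard formulas, not the article's own calculations; the counts and helper names are invented.

```python
# Hedged sketch (counts and function names invented, not from the article):
# the three alternative metrics for a treatment/control 2x2 outcome table.

def effect_size_metrics(treat_success, treat_fail, control_success, control_fail):
    """Success rates, relative risk, and odds ratio from a 2x2 table."""
    p_t = treat_success / (treat_success + treat_fail)
    p_c = control_success / (control_success + control_fail)
    relative_risk = p_t / p_c
    odds_ratio = (treat_success * control_fail) / (treat_fail * control_success)
    return p_t, p_c, relative_risk, odds_ratio

def besd(r):
    """Binomial effect size display: success rates implied by correlation r."""
    return 0.5 - r / 2, 0.5 + r / 2

p_t, p_c, rr, odds = effect_size_metrics(30, 20, 15, 35)
print(p_t, p_c, rr, odds)  # 0.6 0.3 2.0 3.5
```

The BESD recasts a correlation of .20, which may look trivial, as the difference between 40% and 60% success rates, which is the kind of reframing the article argues communicates practical importance.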

2.
Kelley and Lai (2011) recently proposed the use of accuracy in parameter estimation (AIPE) for sample size planning in structural equation modeling, suggesting the sample size that reaches the desired width for the confidence interval of the root mean square error of approximation (RMSEA). This study proposes a graphical extension of the AIPE approach, abbreviated as GAIPE, on RMSEA to facilitate sample size planning in structural equation modeling. GAIPE simultaneously displays the expected width of a confidence interval of RMSEA, the necessary sample size to reach the desired width, and the RMSEA values covered in the confidence interval. Power analysis for hypothesis tests related to RMSEA can also be integrated into the GAIPE framework to allow for a concurrent consideration of accuracy in estimation and statistical power when planning sample sizes. A package written in R has been developed to implement GAIPE. Examples and instructions for using the GAIPE package are presented to help readers make use of this flexible framework. With the capacity to incorporate information on accuracy in RMSEA estimation, values of RMSEA, and power for hypothesis testing on RMSEA in a single graphical representation, the GAIPE extension offers an informative and practical approach for sample size planning in structural equation modeling.
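The core AIPE computation can be sketched independently of the authors' R package. The Python approximation below is my construction, not GAIPE itself: for a candidate sample size, it plugs the expected chi-square statistic under an assumed population RMSEA into the usual noncentral chi-square inversion to approximate the expected 90% confidence interval for RMSEA.

```python
# Hedged sketch of the AIPE idea behind GAIPE (the authors provide an R
# package; this standalone approximation is mine, not theirs).
import math
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def expected_rmsea_ci(n, df, rmsea, level=0.90):
    lam = rmsea**2 * df * (n - 1)   # population noncentrality parameter
    t = df + lam                    # expected value of the test statistic
    alpha = (1 - level) / 2

    def solve(target):
        # Find the noncentrality lam_b with ncx2.cdf(t, df, lam_b) == target.
        if chi2.cdf(t, df) <= target:
            return 0.0              # bound truncated at RMSEA = 0
        return brentq(lambda l: ncx2.cdf(t, df, l) - target, 1e-10, 10 * t + 100)

    lo = math.sqrt(solve(1 - alpha) / (df * (n - 1)))
    hi = math.sqrt(solve(alpha) / (df * (n - 1)))
    return lo, hi

for n in (100, 200, 400, 800):
    lo, hi = expected_rmsea_ci(n, df=20, rmsea=0.05)
    print(n, round(hi - lo, 3))     # expected CI width shrinks as n grows
```

Plotting width against n for several RMSEA values reproduces, in spirit, the kind of display GAIPE provides.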

3.
4.
DIMTEST is a widely used and studied method for testing the hypothesis of test unidimensionality as represented by local item independence. However, DIMTEST does not report the amount of multidimensionality that exists in data when rejecting its null. To provide more information regarding the degree to which data depart from unidimensionality, a DIMTEST-based Effect Size Measure (DESM) was formulated. In addition to detailing the development of the DESM estimate, the current study describes the theoretical formulation of a DESM parameter. To evaluate the efficacy of the DESM estimator according to test length, sample size, and correlations between dimensions, Monte Carlo simulations were conducted. The results of the simulation study indicated that the DESM estimator converged to its parameter as test length increased, and, as desired, its expected value did not increase with sample size (unlike the DIMTEST statistic in the case of multidimensionality). Also as desired, the standard error of DESM decreased as sample size increased.

5.
Conventional null hypothesis testing (NHT) is a very important tool if the ultimate goal is to find a difference or to reject a model. However, the purpose of structural equation modeling (SEM) is to identify a model and use it to account for the relationships among substantive variables. Under the setup of NHT, a nonsignificant test statistic does not necessarily imply that the model is correctly specified or that the size of misspecification is properly controlled. To overcome this problem, this article proposes replacing NHT with equivalence testing, the goal of which is to endorse a model under a null hypothesis rather than to reject it. Differences and similarities between equivalence testing and NHT are discussed, and new “T-size” terminology is introduced to convey the goodness of the current model under equivalence testing. Adjusted cutoff values of the root mean square error of approximation (RMSEA) and comparative fit index (CFI) corresponding to those conventionally used in the literature are obtained to facilitate the understanding of T-size RMSEA and CFI. The single most notable property of equivalence testing is that it allows a researcher to confidently claim that the size of misspecification in the current model is below the T-size RMSEA or CFI, which gives SEM a desirable property as a scientific methodology. R code for conducting equivalence testing is provided in an appendix.

6.
The primary purpose of the present study was to test the hypothesis that two general developmentally based levels of hypothesis‐testing skills exist. The first hypothesized level presumably involves skills associated with testing hypotheses about observable causal agents; the second presumably involves skills associated with testing hypotheses involving unobservable entities. To test this hypothesis, a hypothesis‐testing skills test was developed and administered to a large sample of college students both at the start and at the end of a biology course in which several hypotheses at each level were generated and tested. The predicted positive relationship between level of hypothesis‐testing skill and performance on a transfer problem involving the test of a hypothesis involving unobservable entities was found. The predicted positive relationship between level of hypothesis‐testing skill and course performance was also found. Both theoretical and practical implications of the findings are discussed. © 2000 John Wiley & Sons, Inc. J Res Sci Teach 37: 81–101, 2000

7.
When using the popular structural equation modeling (SEM) methodology, the issues of sample size, method of parameter estimation, assessment of model fit, and capitalization on chance are of great importance in evaluating the results of an empirical study. We focus first on the implications of the large-sample theory underlying applications of the methodology. The utility for applied contexts of the asymptotically distribution-free parameter estimation and model testing method is discussed next. We then argue for wider use of a recently developed, nonconventional model-fit assessment strategy in SEM. We conclude by discussing the issue of capitalization on chance, primarily in situations in which exploratory and confirmatory analyses are conducted on the same data set.

8.
In planning a research study, investigators are frequently uncertain about the minimal number of subjects needed to adequately test a hypothesis of interest. The present paper discusses the sample size problem and four factors that affect its solution: significance level, statistical power, analysis procedure, and effect size. The interrelationship between these factors is discussed and demonstrated by calculating minimal sample size requirements for a variety of research conditions.
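The interplay among the four factors can be seen in the textbook normal-approximation formula for a two-sided, two-sample comparison. This sketch is my illustration (using Cohen's conventional standardized effect sizes), not the paper's own calculations, and needs only the standard library.

```python
# Hedged illustration of how significance level, power, and effect size
# jointly determine sample size, via the standard normal approximation.
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-sample t test."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):     # Cohen's small/medium/large effects
    print(d, n_per_group(d))  # smaller effects demand far larger samples
```

Halving the effect size roughly quadruples the required sample, which is the interrelationship the paper demonstrates across research conditions.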

9.
Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical procedures that counteract this problem by adjusting p values for effect estimates upward. Although MTPs are increasingly used in impact evaluations in education and other areas, an important consequence of their use is a change in statistical power that can be substantial. Unfortunately, researchers frequently ignore the power implications of MTPs when designing studies. Consequently, in some cases, sample sizes may be too small and studies underpowered to detect effects of the desired size. In other cases, sample sizes may be larger than needed, or studies may be powered to detect smaller effects than anticipated. This paper presents methods for estimating statistical power under multiple definitions of statistical power and reports empirical findings on how power is affected by the use of MTPs.
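As a concrete instance of an MTP, here is a sketch of the Holm step-down adjustment, a common choice; the paper treats MTPs generally rather than this procedure specifically. Inflating the p values this way is exactly what reduces power relative to unadjusted testing.

```python
# Hedged sketch of one common MTP: the Holm step-down p-value adjustment.
def holm_adjust(pvalues):
    """Return Holm-adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1),
        # then a running maximum enforces monotonicity.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

print([round(p, 3) for p in holm_adjust([0.01, 0.04, 0.03, 0.005])])
# [0.03, 0.06, 0.06, 0.02]
```

At α = .05, two of the four raw p values no longer count as significant after adjustment, which illustrates why a design powered without regard to the MTP can end up underpowered.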

10.
This article discusses the sample size requirements for the interaction, row, and column effects, respectively, by forming a linear contrast in a 2×2 factorial design for fixed-effects heterogeneous analysis of variance. The proposed method uses the Welch t test and its corresponding degrees of freedom to calculate the final sample size in a two-step procedure. The simulation results show that the proposed sample size allocation ratio can minimize the sampling cost while the designated power is still achieved. The article concludes with a discussion reiterating the importance of sample size planning, especially for testing the interaction effect.
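The Welch machinery the two-step procedure builds on can be sketched as follows; the summary statistics here are invented for illustration and are not from the article.

```python
# Hedged sketch (invented numbers): the Welch t statistic and its
# Welch-Satterthwaite degrees of freedom for two groups with unequal variances.
def welch(mean1, var1, n1, mean2, var2, n2):
    se2 = var1 / n1 + var2 / n2                       # squared standard error
    t = (mean1 - mean2) / se2 ** 0.5
    df = se2 ** 2 / ((var1 / n1) ** 2 / (n1 - 1)
                     + (var2 / n2) ** 2 / (n2 - 1))   # Satterthwaite df
    return t, df

t, df = welch(10.0, 4.0, 20, 8.0, 16.0, 40)
print(round(t, 3), round(df, 1))  # 2.582 58.0
```

A related standard result behind the cost argument: for a fixed total sample, allocating group sizes in proportion to the group standard deviations minimizes the standard error of the mean difference, which is why an allocation ratio rather than equal groups can reduce sampling cost.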

11.
The latent growth curve modeling (LGCM) approach has been increasingly utilized to investigate longitudinal mediation. However, little is known about the accuracy of the estimates and statistical power when mediation is evaluated in the LGCM framework. A simulation study was conducted to address these issues under various conditions, including sample size, effect size of the mediated effect, number of measurement occasions, and R² of the measured variables. In general, the results showed that relatively large samples were needed to accurately estimate the mediated effects and to have adequate statistical power when testing mediation in the LGCM framework. Guidelines for designing studies to examine longitudinal mediation and ways to improve the accuracy of the estimates and statistical power are discussed.
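A greatly simplified analogue conveys the simulation logic: estimate power for the mediated effect a·b by Monte Carlo. The sketch below uses a single-level X → M → Y model with the Sobel test, not the paper's LGCM design, and its paths, sample sizes, and replication counts are invented.

```python
# Hedged, much-simplified analogue of the paper's simulation: Monte Carlo
# power for the mediated effect a*b in a single-level mediation model.
import math
import random

def simple_ols(x, y):
    """Slope estimate and its standard error for y = b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    resid = [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return b, se

def mediation_power(n, a=0.3, b=0.3, reps=500, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        m = [a * xi + rng.gauss(0, 1) for xi in x]
        y = [b * mi + rng.gauss(0, 1) for mi in m]
        ah, se_a = simple_ols(x, m)
        bh, se_b = simple_ols(m, y)
        # Sobel z test for the indirect effect a*b.
        z = (ah * bh) / math.sqrt(ah**2 * se_b**2 + bh**2 * se_a**2)
        hits += abs(z) > 1.96
    return hits / reps

print(mediation_power(50), mediation_power(200))  # power rises sharply with n
```

Even in this toy version, moderate indirect effects need several hundred cases for adequate power, consistent with the paper's conclusion that relatively large samples are required.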

12.
This simulation study focused on the power for detecting group differences in linear growth trajectory parameters within the framework of structural equation modeling (SEM) and compared the latent growth modeling (LGM) approach to the more traditional repeated-measures analysis of variance (ANOVA) approach. Several patterns of group differences in linear growth trajectories were considered. SEM growth modeling consistently showed higher statistical power for detecting group differences in the linear growth slope than repeated-measures ANOVA. For small group differences in the growth trajectories, large sample size (e.g., N > 500) would be required for adequate statistical power. For medium or large group differences, moderate or small sample size would be sufficient for adequate power. Some future research directions are discussed.

13.
A sample of preservice biology teachers (biology majors) enrolled in a teaching methods course formulated and attempted to test six hypotheses to answer a causal question about why water rose in a jar inverted over a burning candle placed in a pan of water. The students submitted a lab report in which arguments and evidence for testing each hypothesis were presented in an if/then/therefore hypothetico‐predictive form. Analysis of written arguments revealed considerable success when students were able to manipulate observable hypothesized causes. However, when the hypothesized causes were unobservable, such that they could be only indirectly tested, performance dropped, as shown by use of three types of faulty arguments: (a) arguments that had missing or confused elements, (b) arguments whose predictions did not follow from hypotheses and planned tests, and (c) arguments that failed to consider alternative hypotheses. Science is an enterprise in which unobservable theoretical entities and processes (e.g., atoms, genes, osmosis, and photosynthesis) are often used to explain observable phenomena. Consequently, if it is assumed that effective teaching requires prior understanding, then it follows that these future teachers have yet to develop adequate hypothesis‐testing skills and sufficient awareness of the nature of science to teach science in the inquiry mode advocated by reform guidelines. © 2002 Wiley Periodicals, Inc. J Res Sci Teach 39: 237–252, 2002

14.

The temporal discrimination hypothesis (TDH) of delayed matching-to-sample (DMTS) stresses the animal’s ability to discriminate which choice-stimulus alternative has appeared most recently as sample. Thus, the emphasis is placed on discriminative processes, temporal in nature, rather than on the traditional trace or buffer storage mechanisms of short-term memory. Some of the predictions of the TDH were tested within the context of the DMTS task. Experiment I showed that the difficulty of sample-stimulus sequences could be predicted by the TDH. Experiment II showed DMTS performance to be an increasing function of the number of sample stimuli employed, a result predicted by the TDH, but not by a traditional proactive interference interpretation. The results demonstrate the importance of temporal discriminative processes in DMTS. The possibility for a simpler theoretical approach to memory, in general, is discussed.


15.
Evaluation is an inherent part of education for an increasingly diverse student population. Confidence in one's test-taking skills, and in the associated testing environment, needs to be examined from a perspective that combines the concept of Bandurian self-efficacy with the concept of stereotype threat reactions in a diverse student sample. This exploratory study examined factors underlying testing reactions and performance on a cognitive ability test in four different testing conditions (high or low stereotype threat crossed with high or low test face validity). The stereotype threat manipulation seemed to lower African-American and Hispanic participants' test scores. However, the hypothesis that there would be an interaction with face validity was only partially supported. Participants' highest scores resulted from low stereotype threat and high face validity, as predicted. However, the lowest scores were not in the high stereotype threat/low face validity condition as expected. Instead, most groups tended to score lower when the test was perceived to be more face valid. The stereotype threat manipulation affected Whites as well as non-Whites, although differently. Specifically, high stereotype threat increased Whites' cognitive ability test scores in the low face validity condition but decreased them in the high face validity condition. Implications for testing and classroom environment design are discussed.

16.
DETECT, the acronym for Dimensionality Evaluation To Enumerate Contributing Traits, is an innovative and relatively new nonparametric dimensionality assessment procedure used to identify mutually exclusive, dimensionally homogeneous clusters of items using a genetic algorithm (Zhang & Stout, 1999). Because the clusters of items are mutually exclusive, this procedure is most useful when the data display approximate simple structure. In many testing situations, however, data display a complex multidimensional structure. The purpose of the current study was to evaluate DETECT item classification accuracy and consistency when the data display different degrees of complex structure, using both simulated and real data. Three variables were manipulated in the simulation study: the percentage of items displaying complex structure (10%, 30%, and 50%), the correlation between dimensions (.00, .30, .60, .75, and .90), and the sample size (500, 1,000, and 1,500). The results from the simulation study reveal that DETECT can accurately and consistently cluster items according to their true underlying dimension when as many as 30% of the items display complex structure, provided the correlation between dimensions is less than or equal to .75 and the sample size is at least 1,000 examinees. If 50% of the items display complex structure, then the correlation between dimensions should be less than or equal to .60 and the sample size should be at least 1,000 examinees. When the correlation between dimensions is .90, DETECT does not work well with any complex dimensional structure or sample size. Implications for practice and directions for future research are discussed.

17.
Post-formal operations as a stage of cognitive development beyond Piaget's formal operations stage are discussed. It is argued that such thinking abilities are of major importance for an adequate understanding of quantum-mechanical and relativistic issues as they occur in modern science, especially physics. Some pedagogical consequences of the 'fifth stage' of cognitive development are discussed, and proposals are made about how post-operational thinking abilities might be developed in students.

18.
After discussing the dearth of testing of actual performance in real situations, the article raises a variety of arguments to support the need for assessment in the Practical mode and describes the development and nature of an inquiry-oriented laboratory examination. One test problem is presented in detail, including materials, instructions to examinees, instructions for administration and scoring, and sample answers. Data regarding validity and reliability are provided, together with findings pertaining to the relationship between the various skills assessed by the examination. Moderation procedures are suggested for determining individual scores in an examination in which different students perform different test problems. The author contends that the type of examination described reflects the inquiry objectives of the BSCS philosophy and provides a valid and reliable measure of problem-solving ability in a practical laboratory setting.

19.
This article describes an example that is useful when teaching hypothesis testing for highlighting the interrelationships that exist among the level of significance, the sample size, and the statistical power of a test. The example also allows students to see how what they learn in the classroom directly affects the content of some of the commercials they watch on television.

20.
Authors who write introductory business statistics texts do not agree on when to use a t distribution and when to use a Z distribution, both in the construction of confidence intervals and in hypothesis testing. In a survey of textbooks written in the last 15 years, we found the decision rules to be contradictory and, at times, the explanations unclear. This paper attempts to clarify the decision rules and recommends that one consistent rule be chosen to minimize confusion for students, instructors, and practitioners. Using the t distribution whenever σ is unknown, regardless of sample size, seems to provide the best solution both theoretically and practically.
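The practical payoff of the "always use t when σ is unknown" rule can be checked by simulation. The sketch below is my illustration, not from the paper; the constants 1.960 and 2.262 are the standard two-sided 95% critical values for the normal and the t distribution with 9 degrees of freedom.

```python
# Hedged simulation: with sigma unknown and n = 10, a z-based interval
# undercovers its nominal 95% level, while the t-based interval holds it.
import math
import random

def coverage(crit, n=10, reps=4000, seed=7):
    """Empirical coverage of a mean +/- crit * s/sqrt(n) interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        hits += abs(m) <= crit * s / math.sqrt(n)  # covers true mean 0?
    return hits / reps

print(coverage(1.960), coverage(2.262))  # z undercovers; t is close to .95
```

Since the t critical value converges to the z value as n grows, following the t rule at every sample size costs nothing in large samples, which is why one consistent rule is defensible.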
