Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Researchers interested in exploring substantive group differences are increasingly attending to bundles of items (or testlets): the aim is to understand how gender differences, for instance, are explained by differential performances on different types or bundles of items, hence differential bundle functioning (DBF). Some previous work has modelled hierarchies in data in this context or considered item responses within persons, but here we model the bundles themselves as explanatory variables at the item level potentially explaining significant intra-class correlation due to gender differences in item difficulty, and thus explaining variation at the second item level. In this study, we analyse DBF using single- and two-level models (the latter modelling random item effects, which models responses at Level 1 and items at Level 2) in a high-stakes National Mathematics test. The models show comparable regression coefficients but the statistical significances of the two-level models are smaller due to the larger values of the estimated standard errors. We discuss the contrasting relevance of this effect for test developers and gender researchers.

2.
There are numerous statistical procedures for detecting items that function differently across subgroups of examinees that take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions that affect investigations of differential item functioning (DIF) such as choice of method, sample size, effect size criteria, conditioning variable, purification, DIF amplification, DIF cancellation, and research designs for evaluating DIF. Our review highlights the necessity of matching the DIF procedure to the nature of the data analysed, the need to include effect size criteria, the need to consider the direction and balance of items flagged for DIF, and the need to use replication to reduce Type I errors whenever possible. Directions for future research and practice in using DIF to enhance the validity of test scores are provided.

3.
This article proposes two multidimensional IRT model-based methods of selecting item bundles (clusters of not necessarily adjacent items chosen according to some organizational principle) suspected of displaying DIF amplification. The approach embodied in these two methods is inspired by Shealy and Stout's (1993a, 1993b) multidimensional model for DIF. Each bundle selected by these methods constitutes a DIF amplification hypothesis. When SIBTEST (Shealy & Stout, 1993b) confirms DIF amplification in selected bundles, differential bundle functioning (DBF) is said to occur. Three real data examples illustrate the two methods for suspect bundle selection. The effectiveness of the methods is argued on statistical grounds. A distinction between benign and adverse DIF is made. The decision whether flagged DIF items or DBF bundles display benign or adverse DIF/DBF must depend in part on nonstatistical construct validity arguments. Conducting DBF analyses using these methods should help in the identification of the causes of DIF/DBF.

4.
Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item‐level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has recently been proposed, whereby measurement invariance is examined within each step underlying the polytomous response variable. The examination of DSF can provide valuable information concerning the nature of the DIF effect (i.e., is the DIF an item‐level effect or an effect isolated to specific score levels), the location of the DIF effect (i.e., precisely which score levels are manifesting the DIF effect), and the potential causes of a DIF effect (i.e., what properties of the item stem or task are potentially biasing). This article presents a didactic overview of the DSF framework and provides specific guidance and recommendations on how DSF can be used to enhance the examination of DIF in polytomous items. An example with real testing data is presented to illustrate the comprehensive information provided by a DSF analysis.

5.
This study investigated differential item functioning (DIF), differential bundle functioning (DBF), and differential test functioning (DTF) across gender of the reading comprehension section of the Graduate School Entrance English Exam in China. The datasets included 10,000 test-takers’ item-level responses to 6 five-item testlets. Both DIF and DBF were examined by using poly-simultaneous item bias test and item-response-theory-likelihood-ratio test, and DTF was investigated with multi-group confirmatory factor analyses (MG-CFA). The results indicated that although none of the 30 items exhibited statistically and practically significant DIF across gender at the item level, 2 testlets were consistently identified as having significant DBF at the testlet level by the two procedures. Nonetheless, DBF does not manifest itself at the overall test score level to produce DTF based on MG-CFA. This suggests that the relationship between item-level DIF and test-level DTF is a complicated issue with the mediating effect of testlets in testlet-based language assessment.

6.
This article reviews the arguments for reporting effect size estimates as part of the statistical results in empirical studies. Following this review, formulas are presented for the calculation of major mean‐difference and association‐based effect size measures for t tests, one‐way ANOVA, zero‐order correlation, simple regression, multiple regression, and chi‐square. The emphasis is on presenting formulas that make the calculation of effect size measures as easy as possible. In most cases, the formula components are readily available and easily recognizable on the output from most major statistical software. Examples of effect size reporting with guidelines for design and analytic variations are provided. © 2006 Wiley Periodicals, Inc. Psychol Schs 43: 653–672, 2006.
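
As a rough illustration of the kind of mean-difference effect size this abstract refers to, the sketch below computes Cohen's d in two standard ways: from raw scores with a pooled standard deviation, and from an independent-samples t statistic. These are textbook formulas and are not necessarily the ones presented in the article; the function names `cohens_d` and `d_from_t` and the sample data are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d from raw scores, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def d_from_t(t, n_a, n_b):
    """Cohen's d recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n_a + 1 / n_b)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only
    print(cohens_d([2, 4, 6], [1, 3, 5]))
    print(d_from_t(2.0, 8, 8))
```

The second function reflects the abstract's point that the components are usually available on software output: a reported t value and the two group sizes are enough.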

7.
Recent research has proposed a criterion to evaluate the reportability of subscores. This criterion is a value‐added ratio (VAR), where values greater than 1 suggest that the true subscore is better approximated by the observed subscore than by the total score. This research extends the existing literature by quantifying statistical significance and effect size for using VAR to provide practical guidelines for subscore interpretation and reporting. Findings indicate that subscores with VAR ≥ 1.1 are a minimum requirement for a meaningful contribution to a user's score interpretation; subscores with .9 < VAR < 1.1 are redundant with the total score and subscores with VAR ≤ .9 would be misleading to report. Additionally, we discuss what to do when subscores do not add value, yet must be reported, as well as when VAR ≥ 1.1 may be undesirable.
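
The three VAR ranges stated in this abstract can be written as a simple decision rule. The sketch below is only an illustration of those thresholds: the function name `classify_var` and the category labels are hypothetical, and computing VAR itself is outside its scope.

```python
def classify_var(var: float) -> str:
    """Map a value-added ratio (VAR) to the reporting category
    described in the abstract. Labels are illustrative assumptions."""
    if var >= 1.1:
        # Minimum requirement for a meaningful contribution
        return "meaningful"
    if var <= 0.9:
        # Reporting the subscore would be misleading
        return "misleading"
    # 0.9 < VAR < 1.1: redundant with the total score
    return "redundant"


if __name__ == "__main__":
    for value in (1.25, 1.0, 0.8):
        print(value, classify_var(value))
```

Note that the abstract also flags cases where VAR ≥ 1.1 may be undesirable, so a rule like this would be a starting point rather than a complete reporting policy.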

8.
This paper demonstrates and discusses the use of think aloud protocols (TAPs) as an approach for examining and confirming sources of differential item functioning (DIF). The TAPs are used to investigate to what extent surface characteristics of the items that are identified by expert reviews as sources of DIF are supported by empirical evidence from examinee thinking processes in the English and French versions of a Canadian national assessment. In this research, the TAPs confirmed sources of DIF identified by expert reviews for 10 out of 20 DIF items. The moderate agreement between TAPs and expert reviews indicates that evidence from expert reviews cannot be considered sufficient in deciding whether DIF items are biased and such judgments need to include evidence from examinee thinking processes.

9.
The present article attempts to reinterpret the findings of the most recent studies investigating the effect of using games for teaching purposes. A methodological approach combining a meta-analysis of quantitative data with an analysis of qualitative data was adopted in order to present the broadest picture of the current research on the educational use of games. To this end, we conducted a meta-analysis of 180 effect size comparisons out of 154 empirical studies on the effect of both digital and non-digital games on academic achievement conducted during the period from 2004 to 2019 in order to determine the overall effect size of using games for teaching various subjects. The overall sample of the studies included a total of 12,800 participants. Some moderator analyses were also carried out to determine the efficiency of educational games in terms of student levels, durations of implementation of game activities, school subjects in which games were used, class sizes, kinds of games and achievement tests used. The findings suggest that educational games have a positive effect on academic achievement and that this effect is at a medium level (g = 0.695). The highest effect sizes were observed in foreign language courses (g = 0.87), small class sizes (fewer than 50; g = 0.87), and non-digital games (g = 0.90). Moreover, we conducted a meta-thematic analysis based on document analysis of qualitative studies in order to further consolidate the findings of the meta-analysis. The meta-thematic dimension of our study reveals cognitive contributions as well as drawbacks of game-based teaching, and provides suggestions for conducting educational games in a better way.

10.
Feeding imprinting, considered a survival‐enabling process, is not well understood. Infants born very preterm, who first feed passively, are an effective model for studying feeding imprinting. Retrospective analysis of neonatal intensive care unit (NICU) records of 255 infants (mean gestational age = 29.98 ± 1.64) enabled exploring the notion that direct breastfeeding (DBF) during NICU stay leads to consumption of more mother's milk and earlier NICU discharge. Results showed that DBF before the first bottle feeding is related to a shorter transition into oral feeding, a younger age of full oral feeding accomplishment and earlier discharge. Furthermore, the number of DBF meals before first bottle feeding predicts more maternal milk consumption and improved NICU outcomes. Improved performance in response to initial exposure to DBF at the age of budding feeding abilities supports a feeding imprinting hypothesis.

11.
Nambury S. Raju (1937–2005) developed two model‐based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures (Raju, 1988) and Raju's DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach provides an intuitive and flexible methodology for assessing DIF. The purpose of this tutorial is to explain DFIT and show how this methodology can be utilized in a variety of DIF applications.

12.
13.
In education research, statistical significance and effect size are 2 sides of 1 coin; they complement each other but they do not substitute for each other. Good research practice requires that, to make sound research decisions, both sides should be considered. In a simulation study, the sampling variability of 2 popular effect-size measures (d and R²) was examined. The variability showed that what is statistically significant may not be practically meaningful, and what appears to be practically meaningful could have been the result of sampling error, thus not trustworthy. Some practical guidelines are suggested for combining the 2 sources of information in research practice.

14.
Performance on figure copying tasks is empirically linked to school readiness, learning, cognition, and neuropsychological functioning. These nonverbal tasks are frequently used to evaluate children from diverse backgrounds to minimize bias due to factors such as language, ethnicity, culture, or socioeconomic status on test performance. The current study examined possible differential item functioning across African American and Caucasian groups, ages 4 to 7 years, in Bender Motor Gestalt Test, Second Edition (BG‐II) visual‐motor scores. Results indicated that in general the BG‐II can be considered invariant across these ethnic groups in this age range.

15.
Product design and development (PDD) is a current topic of academic and industrial research. Emphasis on innovation and entrepreneurship, as well as design thinking and creativity, has recently been pulled together into the teaching and research on PDD. This paper looks into a multidisciplinary setting made up of three similar but independent PDD masters courses taught at three higher education institutions, having the same assessment, syllabus, assignments and outcomes. As expected, the foci of students’ projects are different. The outcomes of this experience were confronted with an ex-post literature review, which generated thorough guidelines that supported an innovative proposal for PDD education, to be implemented in an interdisciplinary Summer School. Significant generalisable contributions for educating modern engineers, designers and business entrepreneurs are expected, instead of just teaching methods of engineering, design and entrepreneurship at the case universities. The limitation of the inductive reasoning used is that ‘truth’ is suggested but not assured.

16.
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across the studies, the differences in the estimated factor loadings between the two subgroups, resulting in a meta-analytic summary of the MGCFA effect sizes (MGCFA-ES). The performance of this new approach was examined using a Monte Carlo simulation, where we created 108 conditions by four factors: (1) three levels of item difficulty, (2) four magnitudes of DIF, (3) three levels of sample size, and (4) three types of correlation matrix (tetrachoric, adjusted Pearson, and Pearson). Results indicate that when MGCFA is fitted to tetrachoric correlation matrices, the meta-analytic summary of the MGCFA-ES performed best in terms of bias and mean square error values, 95% confidence interval coverages, empirical standard errors, Type I error rates, and statistical power; and reasonably well with adjusted Pearson correlation matrices. In addition, when tetrachoric correlation matrices are used, a meta-analytic summary of the MGCFA-ES performed well, particularly, under the condition that a high difficulty item with a large DIF was administered to a large sample size. Our result offers an option for synthesizing the magnitude of DIF on a flagged item across studies in practice.

17.
18.
The Autism Spectrum Quotient (AQ) scale was designed to detect the level and distribution of autistic-like traits across both the general population and those diagnosed with an Autism Spectrum Condition. Scores in the large normative samples were consistent with previous research in showing a continuous distribution of traits, implying that the Autism Spectrum might be viewed as one pole of an endless continuum rather than a distinct categorical condition. The preliminary research reported in this article looked at AQ levels in 75 random referrals to educational psychologists and identified a significantly elevated mean score with a large effect size (d = 1.46). The possible implications for practice and for further research are discussed.

19.
Although science education intends to help students learn to think, research in this area does not usually use psychological research on how people think. The purpose of this article is to describe one type of research, commonly called information-processing psychology. Its goal is understanding how people think while doing complex tasks. It uses detailed data, usually from individual subjects, and develops precise yet powerful models of human performance, often by using a computer. After describing information-processing research, we illustrate it with two studies. The first shows how computer models are used to explain thinking. A computer program models the knowledge needed to understand and use a physics textbook. The second study shows how information-processing approaches can be used systematically but more simply. This study clarifies why students find it so difficult to master the “factor-label” method for converting chemical units. The article concludes with a discussion of guidelines and suggestions for using information-processing ideas.

20.
This study used a randomized field trial design to evaluate the efficacy of a research-based model for scaling up an intervention focused on preschool mathematics. Although the successes of research-based educational practices have been documented, equally well known is the paucity of successful efforts to bring these practices to scale. The same research corpus provides guidelines to scale up successful interventions. We designed an intervention model based on that research, including mathematics curricula with an emphasis on teaching for understanding following developmental guidelines, or learning trajectories, and using technology at multiple levels. We then implemented that model and evaluated the implementation with a limited scale-up study. Within a design involving 25 classrooms serving children at risk for later school failure, we examined the impact of the model, using measures of fidelity of implementation, classroom observations of mathematics environment and teaching, and child outcomes. High levels of fidelity of implementation resulted in consistently higher scores on the observation instrument in the intervention classes than in the control classes, and in significantly and substantially greater gains in mathematics achievement for the intervention children than for the control children (effect size = .62).
