Similar articles
20 similar articles found (search time: 15 ms)
1.
The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning in their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessments for and providing instruction to ELLs.
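For readers unfamiliar with the method, one common way to write a Rasch model with an item-by-group DIF facet is shown below (a generic sketch; the study's exact parameterization is not given in the abstract):

$$\log\frac{P(X_{pi}=1)}{P(X_{pi}=0)} = \theta_p - b_i - \gamma_{ig}$$

where $\theta_p$ is the ability of person $p$, $b_i$ is the difficulty of item $i$, and $\gamma_{ig}$ is the interaction term for item $i$ in group $g$ (here, ELL vs. non-ELL). An item is flagged for DIF when the estimated $\gamma_{ig}$ differs significantly between the two groups.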

2.
Word problems are challenging for English language learners (ELLs) at risk for math disabilities because of the constant need to develop precise math language and comprehension knowledge. As a result, ELLs may need not only math support but also reading and linguistic support. The purpose of this study was to assess the effectiveness of a word problem–solving strategy called Estratégica Dinámica de Matemáticas (EDM), designed to provide math support in the native language based on students' math comprehension levels. A changing criterion multiple baseline design was used to instruct six second-grade Latino ELLs at risk for math disability. Compared with the baseline phase, EDM increased word problem solving for all participants, and all students' performance levels were maintained and generalized during follow-up sessions. This study has implications for a native language intervention that focuses on strategy training to facilitate word problem–solving performance.

3.
The purpose of the present study is to examine the language characteristics of several states' large-scale assessments of mathematics and science and to investigate whether the language demands of the items are associated with the degree of differential item functioning (DIF) for English language learner (ELL) students. A total of 542 items from 11 assessments at Grades 4, 5, 7, and 8 from three states were rated for linguistic complexity using a linguistic coding scheme developed for the study. The linguistic ratings were compared to each item's DIF statistics. The results yielded a stronger association between the linguistic ratings and DIF statistics for ELL students in the "relatively easy" items than in the "not easy" items. General academic vocabulary and the amount of language in an item had the strongest association with the degree of DIF, especially for ELL students with low English language proficiency. Furthermore, the items were grouped into four bundles to examine closely the relationship between varying degrees of language demands and ELL students' performance. Differential bundle functioning (DBF) results indicated that the exhibited DBF became more substantial as the language demands increased. By disentangling linguistic difficulty from content difficulty, the results of the study provide strong evidence of the impact of linguistic complexity on ELL students' performance on tests. The study discusses the implications for test validation and for instruction for ELL students.

4.
Diversity and heterogeneity among language groups have been well documented. Yet most fairness research that focuses on measurement comparability treats linguistic minority students, such as English language learners (ELLs) or Francophone students living in minority contexts in Canada, as a single group. Our focus in this research is to examine the degree to which measurement comparability, as indicated by differential item functioning (DIF), is consistent across sub-groups of linguistic minority Francophone students in Canada. The findings suggest that linguistic minority Francophone students who speak French at home and those who do not should not be grouped together when investigating measurement comparability or examining performance gaps: the separate analyses for the two groups agreed on only 7–10% of DIF identifications. The findings highlight methodological problems with investigating fairness for diverse linguistic groups that are treated as a single group.

5.
Heterogeneity within English language learner (ELL) groups has been documented. Previous research on differential item functioning (DIF) analyses suggests that accurate DIF detection rates are reduced greatly when groups are heterogeneous. In this simulation study, we investigated the effects of heterogeneity within linguistic (ELL) groups on the accuracy of DIF detection. Heterogeneity within such groups may arise for a myriad of reasons, including differing lengths of residence in English-speaking countries, degrees of exposure to English-speaking environments, and amounts of English instruction. Our findings revealed that at high levels of within-group heterogeneity, DIF detection is at the level of chance, implying that a large proportion of DIF items might remain undetected when assessing heterogeneous populations, potentially leading to biased tests. Based on our findings, we urge test development organizations to consider heterogeneity within ELL and other heterogeneous focal groups in their routine DIF analyses.
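The dilution mechanism behind this finding can be illustrated with a toy sketch (our own simplified simulation, not the study's design; all parameter values are made up):

```python
import numpy as np

# Toy sketch: within-group heterogeneity dilutes an item's apparent DIF,
# making it harder to detect with a pooled focal group.
rng = np.random.default_rng(1)

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

n, a, b, dif = 5000, 1.0, 0.0, 0.8
theta = rng.normal(0.0, 1.0, n)      # same ability distribution in both groups

# Homogeneous focal group: every member experiences the full difficulty shift.
p_homog = p_2pl(theta, a, b + dif)
# Heterogeneous focal group: only half the members experience the shift
# (e.g., recent arrivals); the rest respond like the reference group.
mix = rng.random(n) < 0.5
p_heter = np.where(mix, p_2pl(theta, a, b + dif), p_2pl(theta, a, b))

p_ref = p_2pl(theta, a, b)
print(f"mean P gap, homogeneous focal group:   {np.mean(p_ref - p_homog):.3f}")
print(f"mean P gap, heterogeneous focal group: {np.mean(p_ref - p_heter):.3f}")
# The heterogeneous gap is roughly half as large, so a DIF statistic computed
# on the pooled focal group is correspondingly attenuated.
```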

6.
Identifying the Causes of DIF in Translated Verbal Items
Translated tests are being used increasingly for assessing the knowledge and skills of individuals who speak different languages, yet there is little research exploring why translated items sometimes function differently across languages. If the sources of differential item functioning (DIF) across languages could be predicted, there would be important implications for test development, scoring, and equating. This study focuses on two questions: "Is DIF related to item type?" and "What are the causes of DIF?" The data were taken from the Israeli Psychometric Entrance Test in Hebrew (source) and Russian (translated). The results indicated that 34% of the items functioned differentially across languages. The analogy items were the most problematic, with 65% showing DIF, mostly in favor of the Russian-speaking examinees. The sentence completion items were also a problem (45% DIF). The main causes of DIF were changes in word difficulty, changes in item format, differences in cultural relevance, and changes in content.

7.
Several studies have shown that the linguistic complexity of items in achievement tests may cause performance disadvantages for second language learners. However, the relative contributions of specific features of linguistic complexity to this disadvantage are largely unclear. Based on the theoretical concept of academic language, we used data from a statewide test in mathematics for third graders in Berlin, Germany, to determine the interrelationships among several academic language features of test items and their relative effects on differential item functioning (DIF) against second language learners. Academic language features were significantly correlated with each other and with DIF. While we found text length, general academic vocabulary, and the number of noun phrases to be unique predictors of DIF, substantial proportions of the variance in DIF were explained by confounded combinations of several academic language features. Specialised mathematical vocabulary was related neither to DIF nor to the other academic language features.

8.
We contend that generalizability (G) theory allows the design of psychometric approaches to testing English-language learners (ELLs) that are consistent with current thinking in linguistics. We used G theory to estimate the amount of measurement error due to code (language or dialect). Fourth- and fifth-grade ELLs, native speakers of Haitian-Creole from two speech communities, were given the same set of mathematics items in standard English and standard Haitian-Creole (Sample 1) or in the standard and local dialects of Haitian-Creole (Samples 2 and 3). The largest measurement error observed was produced by the interaction of student, item, and code. Our results indicate that the reliability and dependability of ELL achievement measures are affected by two facts operating in combination: each test item poses a unique set of linguistic challenges, and each student has a unique set of linguistic strengths and weaknesses. This sensitivity to language appears to operate at the level of dialect. Also, students from different speech communities within the same broad linguistic group may differ considerably in the number of items needed to obtain dependable measures of their academic achievement. Whether students are tested in English or in their first language, dialect variation needs to be considered if language as a source of measurement error is to be effectively addressed.
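The decomposition behind this analysis can be sketched with a fully crossed student × item × code random-effects design (a generic G-theory layout; the study's actual design may include additional facets):

$$\sigma^2(X_{pic}) = \sigma^2_p + \sigma^2_i + \sigma^2_c + \sigma^2_{pi} + \sigma^2_{pc} + \sigma^2_{ic} + \sigma^2_{pic,e}$$

where $p$ indexes students, $i$ items, and $c$ codes (language or dialect). The key finding above corresponds to the residual student × item × code component, $\sigma^2_{pic,e}$, being the largest variance component.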

9.
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted using manifest methods that group examinees by observed characteristics (gender and race/ethnicity). Homogeneity of item responses is assumed, meaning that all examinees are taken to respond to test items using a similar approach; this assumption may not hold for all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources with heterogeneous linguistic minority groups. We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items with larger effect sizes when LCs within language groups, rather than the overall (majority/minority) language groups, were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted, enhancing DIF detection accuracy and score-based inferences when analyzing DIF with heterogeneous populations.

10.
Applied Measurement in Education, 2013, 26(4), 341–351
The relation between characteristics of test takers and characteristics of items was examined in a quasi-experimental study. High-school sophomores and juniors were administered a mathematics exam that was of consequence to the sophomores but not to the juniors. The juniors had more mathematics course work as a group but less motivation to perform well. Items were characterized by item difficulty (from p values), the degree to which they were mentally taxing (how much mental effort was necessary to reach a correct answer), and item position (as an index of the test taker's level of fatigue). A differential item functioning (DIF) analysis was conducted to look at differences between sophomores and juniors on an item-by-item basis. All three item characteristic measures were related to the DIF index, with the mental taxation measure showing the strongest relation. Results are interpreted in relation to the expectancy value model of motivation as formulated by Pintrich (1988, 1989).

11.
A 1998 study by Bielinski and Davison reported a sex difference by item difficulty interaction in which easy items tended to be easier for females than for males, and hard items tended to be harder for females than for males. To extend their research to nationally representative samples of students, this study used math achievement data from the 1992 NAEP, the TIMSS, and the NELS:88, covering students in grades 4, 8, 10, and 12. The interaction was assessed by correlating the item difficulty difference (b_male − b_female) with the item difficulty computed on the combined male/female sample. Using only the multiple-choice mathematics items, the predicted negative correlation was found for all eight populations and was significant in five. An argument is made that this phenomenon may help explain the greater variability in math achievement among males as compared to females and the emergence of higher performance of males in late adolescence.
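As a minimal numerical sketch of the correlational analysis described above (simulated data and made-up effect sizes, not the study's):

```python
import numpy as np

# Minimal sketch of the interaction analysis with simulated (not real) data.
rng = np.random.default_rng(0)
b_combined = rng.normal(0.0, 1.0, size=40)   # item difficulties, combined sample
# Build in the Bielinski-Davison pattern: easy items (low b) are easier for
# females (b_male - b_female > 0); hard items are harder for females.
b_diff = -0.15 * b_combined + rng.normal(0.0, 0.1, size=40)  # b_male - b_female

# The interaction predicts a negative correlation between the difficulty
# difference and the pooled item difficulty.
r = np.corrcoef(b_combined, b_diff)[0, 1]
print(f"corr(pooled difficulty, b_male - b_female) = {r:.2f}")
```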

12.
This Monte Carlo study examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure. Data were generated using a 3-parameter logistic item response theory model according to the balanced incomplete block (BIB) design used in the National Assessment of Educational Progress (NAEP). The length of each block of items and the number of DIF items in the matching variable were varied, as were the difficulty, discrimination, and presence of DIF in the studied item. Block, booklet, pooled booklet, and extra-information analyses were compared to a complete data analysis using the transformed log-odds on the delta scale. The pooled booklet approach is recommended when items are administered to examinees according to a BIB design. The findings have implications for DIF analyses of other complex samples of items, such as computer-administered testing or other complex assessment designs.
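For context, the transformed log-odds on the delta scale referred to here is the standard ETS MH D-DIF statistic, built from the Mantel-Haenszel common odds ratio:

$$\hat{\alpha}_{MH} = \frac{\sum_k R_{Rk}\,W_{Fk}/N_k}{\sum_k W_{Rk}\,R_{Fk}/N_k}, \qquad \text{MH D-DIF} = -2.35\,\ln\hat{\alpha}_{MH}$$

where, within each level $k$ of the matching variable, $R_{Rk}$ and $W_{Rk}$ count right and wrong responses in the reference group, $R_{Fk}$ and $W_{Fk}$ the same in the focal group, and $N_k$ is the level's total sample size; negative values indicate DIF against the focal group, and the factor −2.35 places the index on the delta scale.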

13.
This study presents a new approach to synthesizing differential item functioning (DIF) effect sizes: first, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). We then synthesize, across studies, the differences in the estimated factor loadings between the two subgroups, yielding a meta-analytic summary of the MGCFA effect sizes (MGCFA-ES). The performance of this approach was examined in a Monte Carlo simulation with 108 conditions crossing four factors: (1) three levels of item difficulty, (2) four magnitudes of DIF, (3) three levels of sample size, and (4) three types of correlation matrix (tetrachoric, adjusted Pearson, and Pearson). Results indicate that when MGCFA is fitted to tetrachoric correlation matrices, the meta-analytic summary of the MGCFA-ES performed best in terms of bias and mean square error values, 95% confidence interval coverage, empirical standard errors, Type I error rates, and statistical power, and it performed reasonably well with adjusted Pearson correlation matrices. With tetrachoric correlation matrices, the meta-analytic summary of the MGCFA-ES performed particularly well when a highly difficult item with large DIF was administered to a large sample. Our results offer a practical option for synthesizing the magnitude of DIF on a flagged item across studies.
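A minimal sketch of the synthesis step, using generic fixed-effect inverse-variance weighting on hypothetical loading differences (the paper's exact estimator and weights may differ):

```python
import numpy as np

# Hypothetical per-study MGCFA effect sizes: the difference in the flagged
# item's standardized loading between reference and focal groups, with
# estimated sampling variances (all values invented for illustration).
es = np.array([0.12, 0.08, 0.15, 0.10])    # loading differences per study
var = np.array([0.004, 0.006, 0.003, 0.005])

# Fixed-effect inverse-variance weighted mean and its standard error.
w = 1.0 / var
pooled = np.sum(w * es) / np.sum(w)
se = np.sqrt(1.0 / np.sum(w))
print(f"pooled MGCFA-ES = {pooled:.3f} (SE {se:.3f}), "
      f"95% CI [{pooled - 1.96*se:.3f}, {pooled + 1.96*se:.3f}]")
```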

14.
As students enter the upper elementary grades, word problems become a main component of mathematics instruction, increasing in complexity as students advance through the curriculum. For students identified as emergent bilinguals with mathematics difficulty (MD), the linguistic complexity inherent in word problems may serve as a barrier to word-problem proficiency. The current study investigated the potential relation between academic English proficiency and word-problem outcomes for emergent bilinguals with MD. Analysis of data from 241 third-grade students indicated that students who participated in an evidence-based word-problem intervention outperformed students who did not receive the intervention. Moreover, students' academic English-language proficiency scores in the domains of reading and writing positively correlated with higher scores on a measure of word-problem solving.

15.
Applied Measurement in Education, 2013, 26(4), 291–312
This study compares three procedures for the detection of differential item functioning (DIF) under item response theory (IRT): (a) Lord's chi-square, (b) Raju's area measures, and (c) the likelihood ratio test. Relations among the three procedures and some practical considerations, such as linking metrics and scale purification, are discussed. Data from two forms of a university mathematics placement test were analyzed to examine the congruence among the three procedures. Results indicated close agreement among the three DIF detection procedures.
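As a point of reference, Raju's signed area between two item characteristic curves reduces, in the simplest case (a 2PL model with equal discriminations and no guessing), to the difference in difficulty parameters:

$$\text{Signed Area} = \int_{-\infty}^{\infty}\left[P_F(\theta) - P_R(\theta)\right]d\theta = b_R - b_F$$

with closed-form but lengthier expressions available when discriminations differ. Lord's chi-square and the likelihood ratio test instead compare the estimated item parameters and model likelihoods directly.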

16.
Three types of effect sizes for DIF are described in this exposition: the log of the odds ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are affected in different ways by item difficulty, item discrimination, and the item's lower asymptote. For example, for a fixed discrimination, the difference in probabilities decreases as the difference between the item difficulty and the mean ability increases; under the same conditions, the log of the odds ratio remains constant if the lower asymptote is zero. A non-zero lower asymptote decreases the absolute value of the probability difference symmetrically for easy and hard items, but it decreases the absolute value of the log-odds difference much more for difficult items. Thus, one cannot set a criterion for defining a large effect size in one metric and find a corresponding criterion in another metric that is equivalent across all items or ability distributions. In choosing an effect size, these differences must be understood and considered.
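The interplay described above is easy to see numerically. The sketch below uses a 3PL model with made-up parameter values (our illustration, not the exposition's):

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = 0.0          # examinee at the mean of the ability distribution
a, dif = 1.0, 0.5    # discrimination; focal-group difficulty shift (DIF)

for c in (0.0, 0.2):                # lower asymptote: zero vs. non-zero
    for b in (-2.0, 0.0, 2.0):      # easy, medium, hard item
        p_ref = p3pl(theta, a, b, c)
        p_foc = p3pl(theta, a, b + dif, c)
        p_diff = p_ref - p_foc                            # probability metric
        log_odds = (np.log(p_ref / (1 - p_ref))
                    - np.log(p_foc / (1 - p_foc)))        # log-odds metric
        print(f"c={c:.1f} b={b:+.1f}: P-diff={p_diff:.3f}, "
              f"log-odds diff={log_odds:.3f}")
```

Running this shows the patterns the review describes: with c = 0 the log-odds difference stays constant (a times dif) across difficulty levels while the probability difference shrinks for extreme items, and with c = 0.2 the log-odds difference collapses for the hard item but barely changes for the easy one.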

17.
Connectives (e.g., although, meanwhile) carry abstract meanings and often signal key relationships between text ideas. This study explored whether understanding of connectives represents a unique domain of vocabulary knowledge that provides special leverage for reading comprehension, and whether the contribution of knowledge of connectives to reading comprehension differs for students from distinct language backgrounds. Understanding of connectives, word reading efficiency, and breadth of vocabulary knowledge were assessed in 75 English language learners (ELLs) and 75 English-only (EO) fifth graders. Hierarchical multiple regression revealed that understanding of connectives explained a sizeable and significant portion of unique variance in comprehension beyond that explained by breadth of vocabulary knowledge, controlling for word reading efficiency. The magnitude of this relationship was larger for EO students than for ELLs. Findings indicate that connectives play an important role in comprehension but that the strength of their influence varies with readers' linguistic background.

18.
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended it to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: the signed weighted P-difference and the unsigned weighted P-difference. The performance of the effect size measures was investigated under various simulation conditions, including different sample sizes and DIF magnitudes. As another way of studying DIF, the χ² difference test was included to compare the results of statistical significance (statistical tests) with those of practical significance (effect size measures). The adequacy of existing effect size criteria used in unidimensional tests was also evaluated. Both effect size measures worked well in estimating true effect sizes, identifying DIF types, and classifying effect size categories. Finally, a real data analysis was conducted to support the simulation results.
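In their general form, the two indices can be written as follows (shown here schematically for a single ability dimension; the study defines them over a multidimensional ability space):

$$\text{SWP} = \sum_{\theta} w(\theta)\left[P_R(\theta) - P_F(\theta)\right], \qquad \text{UWP} = \sum_{\theta} w(\theta)\left|P_R(\theta) - P_F(\theta)\right|$$

where $w(\theta)$ is a weight function (often the focal group's ability distribution) and $P_R$, $P_F$ are the model-implied probabilities of a correct response for the reference and focal groups. The signed version lets DIF in opposite directions cancel; the unsigned version does not.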

19.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) between Black and matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 disclosed GRE forms (988 verbal items) and 11 disclosed SAT forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both the GRE and the SAT. The most important finding is that, for hard items, Black examinees perform differentially better than matched-ability White examinees for each of the four item types on both the GRE and the SAT. The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and the differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

20.
This article demonstrates the utility of restricted item response models for examining item difficulty ordering and slope uniformity for an item set that reflects varying cognitive processes. Twelve pairs of algebra word problems (24 items in total) were developed to systematically reflect the types of cognitive processing required for successful performance; the items reflected distance-rate-time (DRT), interest, and area problems. Hypotheses concerning difficulty ordering and slope uniformity were tested by constraining item difficulty and discrimination parameters in hierarchical item response models. The first set of model comparisons tested the equality of the discrimination and difficulty parameters for each pair of items. The second set examined slope uniformity within the complex DRT problems. The third set examined whether the familiarity of the story context affected item difficulty for two types of complex DRT problems. The last set tested the hypothesized difficulty ordering of the items.
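Each of these comparisons rests on the standard nested-model likelihood ratio test, stated generically here:

$$G^2 = -2\left[\ln L_{\text{restricted}} - \ln L_{\text{free}}\right] \sim \chi^2_{df}$$

where the restricted model constrains the targeted difficulty or discrimination parameters to be equal, the free model leaves them unconstrained, and $df$ equals the number of constraints imposed; a significant $G^2$ rejects the hypothesized equality (e.g., that the paired items are equally difficult).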
