In this study, the effectiveness of SIBTEST and Poly-SIBTEST in detecting differential item functioning (DIF) and testlet DIF was examined in tests composed of testlets. An example using data from a reading comprehension test showed that results from SIBTEST and Poly-SIBTEST were not completely consistent in the detection of DIF and testlet DIF. Results from a simulation study indicated that SIBTEST appeared to maintain type I error control for most conditions, except in some instances in which the magnitude of simulated DIF tended to increase. This same pattern was present for the Poly-SIBTEST results, although Poly-SIBTEST demonstrated markedly less control of type I errors. Type I error control with Poly-SIBTEST was lower for those conditions in which ability was unmatched to test difficulty. The power results for SIBTEST were not adversely affected when the size and percent of simulated DIF increased. Although Poly-SIBTEST failed to control type I errors in over 85% of the conditions simulated, in those conditions for which type I error control was maintained, Poly-SIBTEST demonstrated higher power than SIBTEST.

In international large-scale assessments of educational outcomes, student achievement is often represented by unidimensional constructs. This approach allows for drawing general conclusions about country rankings with respect to the given achievement measure, but it typically does not provide the specific diagnostic information that is necessary for systematic comparisons and improvements of educational systems. Useful information could be obtained by exploring the differences in national profiles of student achievement between low-achieving and high-achieving countries. In this study, we aimed to identify the relative weaknesses and strengths of eighth graders’ physics achievement in Bosnia and Herzegovina in comparison to the achievement of their peers from Slovenia. For this purpose, we ran a secondary analysis of Trends in International Mathematics and Science Study (TIMSS) 2007 data. The student sample consisted of 4,220 students from Bosnia and Herzegovina and 4,043 students from Slovenia. After analysing the cognitive demands of TIMSS 2007 physics items, the corresponding differential item functioning (DIF)/differential group functioning contrasts were estimated. Approximately 40% of items exhibited large DIF contrasts, indicating significant differences between cultures of physics education in Bosnia and Herzegovina and Slovenia. The relative strength of students from Bosnia and Herzegovina was mainly associated with the topic area ‘Electricity and magnetism’. Classes of items which required the knowledge of experimental method, counterintuitive thinking, proportional reasoning and/or the use of complex knowledge structures proved to be differentially easier for students from Slovenia. In the light of the presented results, the common practice of ranking countries with respect to universally established cognitive categories seems to be potentially misleading.

Background: The accurate assessment of childhood maltreatment (CM) is important in medical and mental health settings given its association with adverse psychological and physical outcomes. Reliable and valid assessment of CM is also of critical importance to research. Due to the potential for measurement bias when comparing CM across racial and ethnic groups, invariant measurement is an important psychometric property of such screening tools. Objective: In this study, differential item functioning (DIF) by race and ethnicity was tested. Uniform DIF refers to the influence of bias on scores across all levels of childhood maltreatment, and non-uniform DIF refers to bias in favor of one group. Method: Participants were N = 1,319 women and men (mean age = 36.77, SD = 10.37) who completed the Childhood Trauma Questionnaire-Short Form; 42.7% were women and 57.3% were men; 58.9% were White-American, 22.1% Black-American, and 8.0% other; 26.3% were Hispanic. Results: Using empirical thresholds, non-uniform DIF was identified in five items by race, and in no items by ethnicity. Conclusions: Uniform DIF is less problematic given that mathematical corrections can be made to adjust scores for DIF. However, non-uniform DIF can usually only be corrected by removing the DIF items from the scale. Further methodological research is needed to minimize measurement bias and to assess racially diverse populations effectively.
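
The distinction between uniform and non-uniform DIF can be illustrated with a small sketch. It assumes the common logistic-regression framing of DIF, which the abstract does not say it used: the log-odds of endorsing an item are modeled as b0 + b1*theta + b2*group + b3*theta*group, where theta is the trait level and group is 0 (reference) or 1 (focal). In that framing, b2 produces a group gap that is constant across trait levels (uniform DIF), while b3 produces a gap that changes with the trait level (non-uniform DIF). All coefficient values and function names below are hypothetical.

```python
import math


def endorse_prob(theta, group, b0=-0.5, b1=1.2, b2=0.0, b3=0.0):
    """P(endorse) with logit = b0 + b1*theta + b2*group + b3*theta*group.

    Hypothetical coefficients: b2 is a uniform-DIF shift, b3 a non-uniform-DIF slope.
    """
    logit = b0 + b1 * theta + b2 * group + b3 * theta * group
    return 1.0 / (1.0 + math.exp(-logit))


def logit_gap(theta, b2, b3):
    """Log-odds difference between focal (group=1) and reference (group=0)."""
    p_focal = endorse_prob(theta, 1, b2=b2, b3=b3)
    p_ref = endorse_prob(theta, 0, b2=b2, b3=b3)
    return math.log(p_focal / (1 - p_focal)) - math.log(p_ref / (1 - p_ref))


# Evaluate the group gap at a low, middle, and high trait level.
thetas = [-2.0, 0.0, 2.0]
uniform_gaps = [logit_gap(t, b2=0.6, b3=0.0) for t in thetas]      # same gap everywhere
nonuniform_gaps = [logit_gap(t, b2=0.0, b3=0.5) for t in thetas]   # gap depends on theta

print("uniform:", [round(g, 3) for g in uniform_gaps])
print("non-uniform:", [round(g, 3) for g in nonuniform_gaps])
```

With these made-up values the uniform gap stays the same at every trait level, whereas the non-uniform gap changes sign as the trait level moves from low to high. That sign change is why a single additive score correction cannot remove non-uniform DIF in this framing.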

Several studies have shown that the linguistic complexity of items in achievement tests may cause performance disadvantages for second language learners. However, the relative contributions of specific features of linguistic complexity to this disadvantage are largely unclear. Based on the theoretical concept of academic language, we used data from a state-wide test in mathematics for third graders in Berlin, Germany, to determine the interrelationships among several academic language features of test items and their relative effects on differential item functioning (DIF) against second language learners. Academic language features were significantly correlated with each other and with DIF. While we found text length, general academic vocabulary, and number of noun phrases to be unique predictors of DIF, substantial proportions of the variance in DIF were explained by confounded combinations of several academic language features. Specialised mathematical vocabulary was related neither to DIF nor to the other academic language features.

Reading is a key competence for knowledge acquisition and learning processes. One important source of reading motivation is interest. Even though students' text-based interest often differs by gender, it remains unclear which text factors underlie these differences and whether text-based interest relates to reading comprehension among boys and girls. In a sample of 514 elementary students (47.2% girls), this study examined whether text topic, protagonists' gender, and text difficulty affect boys' and girls' text-based interest and whether interest and reading comprehension are intertwined. Based on a repeated within-subject design using fourteen narrative texts, the results indicated that boys' interest was higher in texts with male-attributed topics, in texts with male protagonists, and in more difficult texts. In contrast, girls' interest was only affected by text difficulty. Text-based interest and reading comprehension were significantly related, albeit more strongly for boys than for girls. The findings are discussed regarding future implications for research and educational practice.

The aim of this study was to apply Rasch modeling to an examination of the psychometric properties of the Pearson Test of English Academic (PTE Academic). Scores of 140 test-takers, drawn from the PTE Academic database, were analyzed. The mean age of the participants was 26.45 (SD = 5.82), ranging from 17 to 46. Conformity of the participants' performance on the 86 items of PTE Academic Form 1 of the field test was evaluated using the partial credit model. The person reliability coefficient was .96, and item reliability was .99. The results showed that no significant differential item functioning was found across subgroups of gender and spoken-language context, indicating that the item data approximated the Rasch model. The findings of this study validated the test stability of PTE Academic as a useful measurement tool for English language learners' academic English assessment.
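
The partial credit model named in the abstract can be sketched as follows. This is a minimal illustration of the standard model formula, not the analysis pipeline used in the study. The function name `pcm_probs` and the step difficulties are hypothetical. For a person with ability theta and an item with step difficulties delta_1..delta_m, the probability of scoring k is proportional to exp of the sum of (theta - delta_j) for j = 1..k, with the empty sum for k = 0 equal to 0.

```python
import math


def pcm_probs(theta, deltas):
    """Category probabilities for scores 0..m under the partial credit model.

    theta  : person ability (logits)
    deltas : step difficulties delta_1..delta_m for one item (hypothetical values)
    """
    # Cumulative sums of (theta - delta_j); the score-0 category has an empty sum of 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# A three-step item (scores 0..3) for a person slightly above average ability.
probs = pcm_probs(0.5, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])

# With a single step the model reduces to the dichotomous Rasch model.
rasch = pcm_probs(1.0, [0.0])
print(round(rasch[1], 3))
```

With one step difficulty the second probability equals 1 / (1 + exp(-(theta - delta))), the dichotomous Rasch model, which is why partial credit analyses are commonly described as Rasch modeling.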

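By clarifying the categories of students who sit the supplementary art examination and rationally recalibrating the positioning of art education, this paper explores a competency-oriented supplementary examination scheme so that art-focused senior high schools can select more outstanding candidates. The main recommendations are to adjust the content of the supplementary art examination, change its format, and improve its evaluation.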

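The listening test of the college entrance examination (gaokao) in English requires candidates to listen to recordings and then answer the questions on the test paper. In recent years, the gaokao English listening test has persisted in reform and innovation, extending both its formats and its methods. The test should further reform its format and optimize its item-type structure, respond to the demands of the senior high school English curriculum reform and of the test's own future development, and embody the gaokao reform principle of fairness and impartiality as well as the requirements of contemporary English language testing.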

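Xu Hong (徐泓). 《考试研究》 (Examinations Research), 2022, (1): 21-29
Context is one of the elements of a test item, and its quality directly affects the quality of the item. Based on a literature analysis of test item contexts, a preliminary framework for analyzing the context quality of chemistry items in the senior high school entrance examination (zhongkao) was constructed. Through expert consultation and validity testing, this became an evaluation framework comprising 5 dimensions and 15 levels. The framework was applied to Question 16 of the 2021 Anhui Province zhongkao chemistry paper as an example, and three suggestions are offered for teaching and assessment.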

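English Testing and Evaluation in the Republican Period   Total citations: 1, self-citations: 0, citations by others: 1
Drawing on historical sources, this paper examines English testing in the Republican period (民国时期), focusing on missionary schools. The author first introduces the general state of English teaching in the period, analyzes the theoretical background of English testing at the time, reconstructs its forms, discusses its uses, and finally offers an evaluation. The conclusion notes that although the Republican period belongs to the pre-scientific era in the history of language testing, the active and careful efforts that English teachers at missionary schools made in language testing still offer lessons for today.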

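Piloting multiple administrations per year ("一年多考") for the gaokao English examination is a remarkable step forward, but fluctuations in difficulty between administrations often cause great trouble when raw scores are used directly for admissions decisions. This paper discusses three methods of stabilizing test difficulty: the standard practice of the international testing industry, an expert-judgment method that borrows ideas from standard setting, and a small-scale pretest on a representative sample that uses validity evidence in reverse. It is hoped that these methods will give front-line testing practitioners more options.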

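This paper discusses the psychological misconceptions students hold when reading in English and describes how psychological motivation can be used effectively in English reading.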

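Starting from the requirements of the 2009 gaokao Chinese examination syllabus, this paper investigates how Chinese characters were tested in the gaokao Chinese papers of the past three years, with a focus on 2009. Taking as its standard the 7,000 characters of the 《现代汉语通用字表》 (List of Commonly Used Characters in Modern Chinese), which include 2,500 frequently used characters and 1,000 secondary frequently used characters, it analyzes and compiles detailed data on the three item types that test characters: dedicated character items, dictation from memory, and composition. It then compares character form, pronunciation, and testing frequency in order to explore the direction and trends of character testing in the gaokao Chinese examination.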

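The degree to which the principle of authenticity is embodied is an important criterion for judging the quality of a language test. Only a test that fully embodies the authenticity principle can accurately assess candidates' actual communicative language ability. This paper compares and analyzes the authenticity of the national gaokao English writing tasks in three respects (test text, test task, and test situation) in order to understand how far the authenticity principle has been embodied in recent gaokao English writing tests and what washback effect it has on English teaching.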

There have been significant changes in the racial/ethnic and linguistic background of students attending public schools in the United States. The number of public‐school students who are English language learners (ELLs) participating in programs of language assistance has more than doubled over the past two decades. In 1993–1994, 5.1% of public‐school students in the United States were ELLs, or an estimated 2.1 million students. As of 2014–2015, 9.4% of students were ELLs, or an estimated 4.6 million students. It is estimated that by 2030, upward of 40% of school children will speak English as a second language. Meeting the needs of students who are not proficient in English is challenging for school professionals and even more so if they are identified for special services. Researchers have found that ELL students live in situations with numerous high‐risk factors, including poverty, inadequate schools, poor and violent neighborhoods, and limited access to adequate health care, mental health services, and schools. As a group, these students are more likely to underperform academically, have a lower grade point average, and drop out of school compared to non‐ELL Latino students.

Researchers interested in exploring substantive group differences are increasingly attending to bundles of items (or testlets): the aim is to understand how gender differences, for instance, are explained by differential performances on different types or bundles of items, hence differential bundle functioning (DBF). Some previous work has modelled hierarchies in data in this context or considered item responses within persons, but here we model the bundles themselves as explanatory variables at the item level potentially explaining significant intra-class correlation due to gender differences in item difficulty, and thus explaining variation at the second item level. In this study, we analyse DBF using single- and two-level models (the latter modelling random item effects, which models responses at Level 1 and items at Level 2) in a high-stakes National Mathematics test. The models show comparable regression coefficients but the statistical significances of the two-level models are smaller due to the larger values of the estimated standard errors. We discuss the contrasting relevance of this effect for test developers and gender researchers.

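Beginning in September 2004, first-year senior high school students in four provinces and regions (Shandong, Guangdong, Hainan, and Ningxia) took the lead in trialling the new general senior high school curriculum, raising the curtain on China's new senior high school curriculum reform. Against this background, gaokao reform under the new curriculum was rolled out across the pilot provinces in succession. Gaokao item writing is an important part of the new curriculum reform, in which an ability orientation has replaced a knowledge orientation as a key guiding principle. Based on the requirements of the new curriculum reform, the 《考试大纲》 (Examination Syllabus), and measurement theory, and on the characteristics of physics experimental ability itself, this paper discusses strategies for, and challenges in, assessing physics experimental ability in the new-curriculum gaokao.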

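A Discussion of the Removal of the Listening Test from the Gaokao English Examination   Total citations: 1, self-citations: 0, citations by others: 1
Since 2005, Zhejiang, Shaanxi, and other provinces have successively removed the listening test from the gaokao English examination. This is highly detrimental to students' all-round mastery of language skills and to laying a sound foundation for lifelong English learning. By reflecting on this move, examining the rationale for retaining the gaokao English listening test, and offering suggestions, the paper aims to inform gaokao English reform.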

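Taking as an example a survey of students at 郴州职业技术学院 (Chenzhou Vocational Technical College) on passing the English Application Ability Test (英语应用能力考试; passing it is hereafter called "passing the level"), the article analyzes the current situation in terms of how much importance students attach to English learning, their learning methods, the time they spend, their attitudes toward the test, how well they understand it, the problems in their learning, and the obstacles to passing. It argues that the pass rate can only be raised if, under the guidance of the syllabus and centred on the textbook, students make substantial progress in listening, speaking, reading, writing, and translation. Only then will their English proficiency truly improve, which would also help eliminate the abnormal phenomenon of high scores but low ability.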

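Education in ethnic minority areas is an important part of China's education system and of its ethnic affairs work. However, because our college is located in a Yi (彝族) area, most students come from remote regions, many come from poor backgrounds, and their educational foundations are uneven, all of which greatly complicates college English teaching, reading instruction in particular. This paper begins by analyzing the main factors constraining college English reading instruction in Yi areas and proposes corresponding countermeasures, in the hope of helping to improve our students' reading ability.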
