Similar literature
1.
Performance assessments appear on a priori grounds to be likely to produce far more local item dependence (LID) than that produced in the use of traditional multiple-choice tests. This article (a) defines local item independence, (b) presents a compendium of causes of LID, (c) discusses some of LID's practical measurement implications, (d) details some empirical results for both performance assessments and multiple-choice tests, and (e) suggests some strategies for managing LID in order to avoid negative measurement consequences.

2.
This article evaluates a procedure-based scoring system for a performance assessment (an observed paper towels investigation) and a notebook surrogate completed by fifth-grade students varying in hands-on science experience. Results suggested interrater reliability of scores for observed performance and notebooks was adequate (>.80) with the reliability of the former higher. In contrast, interrater agreement on procedures was higher for observed hands-on performance (.92) than for notebooks (.66). Moreover, for the notebooks, the reliability of scores and agreement on procedures varied by student experience, but this was not so for observed performance. Both the observed-performance and notebook measures correlated less with traditional ability than did a multiple-choice science achievement test. The correlation between the two performance assessments and the multiple-choice test was only moderate (mean = .46), suggesting that different aspects of science achievement have been measured. Finally, the correlation between the observed-performance scores and the notebook scores was .83, suggesting that notebooks may provide a reasonable, albeit less reliable, surrogate for the observed hands-on performance of students.

3.
A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary, however, to realize the benefits of IRT in this setting. Although local item dependence has been investigated both for polytomous items in fixed-form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT, despite its central importance to these applications. The current research uses a simulation study to investigate the extension of two widely used pairwise statistics, Yen's Q3 statistic and Pearson's X² statistic, in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).

4.
教育实用测度 (Applied Measurement in Education), 2013, 26(2): 175-199
This study used three different differential item functioning (DIF) detection procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify factors (e.g., content, cognitive processes, differences in ability distributions, etc.) that may be related to DIF. The QUASAR (Quantitative Understanding: Amplifying Student Achievement and Reasoning) Cognitive Assessment Instrument (QCAI) is designed to measure students' mathematical thinking and reasoning skills and consists of open-ended items that require students to show their solution processes and provide explanations for their answers. In this study, 33 polytomously scored items, which were distributed within four test forms, were evaluated with respect to gender-related DIF. The data source was sixth- and seventh-grade student responses to each of the four test forms administered in the spring of 1992 at all six school sites participating in the QUASAR project. The sample consisted of 1,782 students with approximately equal numbers of female and male students. The results indicated that DIF may not be serious for 31 of the 33 items (94%) in the QCAI. For the two items that were detected as functioning differently for male and female students, several plausible factors for DIF were discussed. The results from the secondary analyses, which removed the mutual influence of the two items, indicated that DIF in one item, PPP1, which favored female students rather than their matched male students, was of particular concern. These secondary analyses suggest that the detection of DIF in the other item in the original analysis may have been due to the influence of Item PPP1 because they were both in the same test form.

5.
6.
Research in Science Education - Informal formative assessments (IFAs) are classroom interactions teachers use to gather information about their students’ learning, interpret it, and act on...

7.
How can we best extend DIF research to performance assessment? What are the issues and problems surrounding studies of DIF on complex tasks? What appear to be the best approaches at this time?

8.
Research in Science Education - Information on students’ development of science skills is essential for teachers to evaluate and improve their own education, as well as to provide adequate...

9.
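Ye Meng. 考试研究 (Examinations Research), 2010, (2): 96-107
This article reviews the main research findings on the local independence problem in item response theory (IRT) and, on that basis, explains the definition of the local independence assumption. It also discusses the relationship between local independence and test dimensionality; the detection and computation, causes, and control procedures of local dependence; and the impact of local dependence on measurement practice. Finally, it explores strategies for resolving local item dependence within testlets.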

10.
Given the relationships of item response theory (IRT) models to confirmatory factor analysis (CFA) models, IRT model misspecifications might be detectable through model fit indexes commonly used in categorical CFA. The purpose of this study is to investigate the sensitivity of weighted least squares with adjusted means and variance (WLSMV)-based root mean square error of approximation, comparative fit index, and Tucker–Lewis Index model fit indexes to IRT models that are misspecified due to local dependence (LD). It was found that WLSMV-based fit indexes have some functional relationships to parameter estimate bias in 2-parameter logistic models caused by LD. Continued exploration into these functional relationships and development of LD-detection methods based on such relationships could hold much promise for providing IRT practitioners with global information on violations of local independence.

11.
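Causal concepts and explanations are the most important content in disciplinary knowledge. Drawing on Halliday's systemic functional grammar, this study analyzes the features of the textbook discourse used in a university bilingual course and the lexicogrammatical structures the teacher used when explaining the causal relations in the textbook. The results show that, to help students better understand the meaning of the grammatical metaphors in the textbook's causal explanations, the teacher used a series of downward functional recasts, that is, conversions of advanced, complex grammatical structures into simple congruent ones. This finding has implications for bilingual teaching in China.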

12.
There is a tendency for lecture-based instruction in large introductory science courses to strongly focus on the delivery of discipline-specific technical terminology and fundamental concepts, sometimes to the detriment of opportunities for application of learned knowledge in evidence-based critical-thinking activities. We sought to improve student performance on evidence-based critical-thinking tasks through the implementation of peer learning and problem-based learning tutorial activities. Small-group discussions and associated learning activities were used to facilitate deeper learning through the application of new knowledge. Student performance was assessed using critical-thinking essay assignments and a final course exam, and student satisfaction with tutorial activities was monitored using online surveys. Overall, students expressed satisfaction with the small-group-discussion-based tutorial activities (mean score 7.5/10). Improved critical thinking was evidenced by improved student performance on essay assignments during the semester, as well as a 25% increase in mean student scores on the final course exam compared to previous years. These results demonstrate that repeated knowledge application practice can improve student learning in large introductory-level science courses.

13.
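This article reviews performance assessment in the IEA international science education studies from three perspectives (conceptual framework, task format, and scoring system) and, on that basis, discusses trends in the development of performance assessment in international science education evaluation.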

14.
One of the ways of controlling for the influence of social expectations on the answers given by survey respondents is to use a social desirability scale together with the main questions. The social desirability scale, which was included in the Teaching and Learning International Survey (TALIS) international comparative study for this purpose, was used on a Russian-language sample of teachers without cross-cultural adaptation. In addition, this tool was based on the Marlowe-Crowne Social Desirability Scale, whose psychometric characteristics have only been evaluated so far within the framework of classical test theory with mixed results. In order to fill the gap in our understanding of the validity of the TALIS social desirability scale within the framework of item response theory, we analyzed the data obtained from a representative sample of Russian teachers. The results showed that the scale had acceptable reliability, significant unidimensionality, and, at the same time, a number of serious problems with its functionality. We propose measures to improve the quality of the psychometric properties of the scale on the basis of the obtained results, including simulated data. We draw fundamental conclusions about the structure of the social desirability construct.

15.
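Contextual performance theory has become a research focus in recent years. Contextual performance covers all activities that employees voluntarily undertake outside their formal job duties and that benefit the organization and other people; it contributes independently to improving enterprise performance. Managers at all levels should actively apply contextual performance theory in performance management practice and make full use of the positive effects of employees' contextual performance behaviors.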

16.
Preservice teachers in a science methods course were provided instruction on performance assessment, then guided through a design and implementation process of performance assessment tasks. We assessed the effect of designing and implementing a performance assessment task on preservice teachers' understanding of standards-based assessment. The findings show that these preservice teachers improved in their understanding of assessment as a formative process as well as their science content understanding of the topic addressed in their designed task. We found that preservice teachers need to experiment with performance assessment tasks in an authentic context in order to understand the full potential and value of the task.

17.
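Through its profound grasp of the basic laws of contemporary China's socialist modernization, the theory of building a moderately prosperous society in all respects displays the scientific dimension of its social concern. By establishing the all-round development of the person as the value goal of building such a society, it displays the value dimension of its social concern. By treating the fulfillment of the requirements for developing advanced productive forces as the basic path to social progress and all-round human development, it displays the close unity of the scientific and value dimensions of its social concern.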

18.
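In recent years, in order to implement the Scientific Outlook on Development in organizational and personnel work, localities have actively explored performance evaluation of Party and government leadership teams and leading cadres. These efforts have achieved encouraging results, but many problems remain to be solved. The article recommends issuing unified regulations on local performance evaluation, improving the system of local performance evaluation indicators, establishing scientific local performance evaluation procedures, improving the bodies that conduct local performance evaluation, and fostering a performance culture.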

19.
With the concurrent emphasis on accountability, prevention, and early intervention, curriculum-based measurement of reading (R-CBM) is playing an increasingly important role in the educational process. This study investigated the differences in diagnostic accuracy and utility between commercial norms and local norms when making high-stakes, local decisions. Scores on Dynamic Indicators of Basic Early Literacy Skills Oral Reading Fluency for 1,374 students in Grades 2 to 5 were used to predict outcomes on the Georgia reading achievement test, the Criterion Referenced Competency Tests. Local norms were generated using logistic regression and receiver operating characteristic curve analysis. The generated cut scores were compared to the commercial norms for differences in diagnostic efficiency. The generated cut scores were lower than the commercial norms and had improved diagnostic efficiency. Implications related to educational policy and the use of R-CBM are discussed.

20.
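Zhang Jun. 考试研究 (Examinations Research), 2014, (1): 56-61
The monotone homogeneity model (MHM) is the most widely used model in nonparametric item response theory. It rests on three basic assumptions and is suitable for analyzing small-scale tests. This study used the MHM to analyze a test administered at the College of Advanced Chinese Training (汉语进修学院) of Beijing Language and Culture University. The results show that the test satisfies the weak unidimensionality and weak local independence assumptions. Of the 67 items, 9 have scalability coefficients below 0.3 and need to be revised or deleted; once they are deleted, the test forms a Mokken scale of medium strength. In addition, 2 items violate the monotonicity assumption and do not meet the requirements of a Mokken scale.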
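
Item 3 above names Yen's Q3 statistic as a pairwise check for local item dependence. For a pair of items, Q3 is the correlation, across examinees, of the residuals left after subtracting model-predicted scores from observed scores. The sketch below is only an illustration under simplifying assumptions: dichotomous items, a 2PL model, and abilities and item parameters that are already estimated. The function names (`p_2pl`, `pearson`, `q3`) and the toy data are hypothetical and are not taken from any of the listed studies, which concern polytomous items.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("residuals have zero variance")
    return sxy / math.sqrt(sxx * syy)


def q3(responses, thetas, items, j, k):
    """Q3 for items j and k: correlation of observed-minus-expected residuals."""
    res_j = [row[j] - p_2pl(t, *items[j]) for row, t in zip(responses, thetas)]
    res_k = [row[k] - p_2pl(t, *items[k]) for row, t in zip(responses, thetas)]
    return pearson(res_j, res_k)


# Toy data (hypothetical): six examinees, three items.
# Item 1 duplicates item 0, which is an extreme case of local dependence.
thetas = [-1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
items = [(1.0, 0.0), (1.0, 0.0), (0.8, 0.3)]  # (discrimination a, difficulty b)
responses = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]

print(q3(responses, thetas, items, 0, 1))  # duplicated items: Q3 close to 1
print(q3(responses, thetas, items, 0, 2))
```

In practice the abilities and item parameters would come from a fitted IRT model, and a large positive Q3 for a pair is usually read as a sign of possible local dependence; any flagging threshold is a judgment call and is not specified here.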
