Search returned 20 similar documents (search time: 15 ms).
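This paper describes the advantages of applying multivariate generalizability theory in educational measurement and evaluation, the procedural steps for using it, and the range of conclusions and information it can provide to researchers. It is intended to help beginners understand multivariate generalizability theory and gives practitioners a guide to applying it.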
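
Multivariate G theory treats the subtests of an instrument as a vector of correlated universe scores rather than as one pooled score. A minimal sketch of the idea, assuming each subtest has its own items (so relative errors are uncorrelated across subtests): a weighted composite's G coefficient is w'Σp w / (w'Σp w + w'Σδ w), where Σp is the universe-score covariance matrix and Σδ the relative-error covariance matrix. All variance components, item counts, weights, and function names below are hypothetical illustrations.

```python
import numpy as np


def composite_g(sigma_p, var_pi_e, n_items, weights):
    """Relative G coefficient of a weighted composite of subtests.

    Assumes each subtest has its own items, so the relative-error
    covariance matrix is diagonal with entries var_pi_e[v] / n_items[v].
    """
    sigma_p = np.asarray(sigma_p, dtype=float)
    w = np.asarray(weights, dtype=float)
    sigma_delta = np.diag(np.asarray(var_pi_e, dtype=float) / np.asarray(n_items, dtype=float))
    universe_var = w @ sigma_p @ w      # composite universe-score variance
    error_var = w @ sigma_delta @ w     # composite relative-error variance
    return universe_var / (universe_var + error_var)


def subtest_g(sigma_p, var_pi_e, n_items):
    """Relative G coefficient of each subtest taken on its own."""
    universe = np.diag(np.asarray(sigma_p, dtype=float))
    error = np.asarray(var_pi_e, dtype=float) / np.asarray(n_items, dtype=float)
    return universe / (universe + error)


# Hypothetical universe-score covariance matrix for three subtests.
sigma_p = [[0.30, 0.20, 0.15],
           [0.20, 0.25, 0.12],
           [0.15, 0.12, 0.20]]
var_pi_e = [0.90, 1.20, 1.50]   # hypothetical person-x-item (residual) variances
n_items = [20, 15, 10]          # hypothetical subtest lengths
weights = [0.4, 0.3, 0.3]       # hypothetical composite weights

print("subtests :", np.round(subtest_g(sigma_p, var_pi_e, n_items), 3))
print("composite:", round(float(composite_g(sigma_p, var_pi_e, n_items, weights)), 3))
```

Setting a weight vector such as [1, 0, 0] recovers the single-subtest coefficient, which is a quick consistency check on the matrix algebra.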

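After briefly introducing classical test theory and generalizability theory, this article compares the characteristics of the two. Using a simulated case and taking the problem of multiple error sources as its starting point, it focuses on the limitations of classical test theory, the corresponding advantages of generalizability theory, and the essential differences between the two theories.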
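
The multiple-error-source contrast can be made concrete with the simplest G study: a crossed persons × items (p × i) design with one score per cell. The sketch below, run on simulated data with hypothetical effect sizes, estimates the variance components from ANOVA mean squares and computes the relative coefficient Eρ² and the absolute coefficient Φ. With a single random facet, Eρ² coincides with Cronbach's alpha, which is where the two theories meet; G theory's advantage appears once further facets (raters, occasions, languages) are separated out of the error term.

```python
import numpy as np


def g_study_p_x_i(scores):
    """Variance components for a crossed p x i design (one score per cell)."""
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    item_means = x.mean(axis=0)

    ms_p = n_i * np.sum((person_means - grand) ** 2) / (n_p - 1)
    ms_i = n_p * np.sum((item_means - grand) ** 2) / (n_i - 1)
    resid = x - person_means[:, None] - item_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations; negative estimates are
    # conventionally truncated to zero.
    return {
        "p": max((ms_p - ms_res) / n_i, 0.0),
        "i": max((ms_i - ms_res) / n_p, 0.0),
        "pi,e": ms_res,
    }


def coefficients(vc, n_items):
    """Relative G coefficient (E rho^2) and dependability coefficient (Phi)."""
    rel_err = vc["pi,e"] / n_items
    abs_err = (vc["i"] + vc["pi,e"]) / n_items
    return vc["p"] / (vc["p"] + rel_err), vc["p"] / (vc["p"] + abs_err)


def cronbach_alpha(scores):
    """Classical coefficient alpha, for comparison."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


# Simulated (hypothetical) data: 200 persons x 10 items.
rng = np.random.default_rng(0)
persons = rng.normal(0, 1.0, size=(200, 1))
items = rng.normal(0, 0.5, size=(1, 10))
scores = persons + items + rng.normal(0, 1.0, size=(200, 10))

vc = g_study_p_x_i(scores)
e_rho2, phi = coefficients(vc, n_items=10)
print({k: round(v, 3) for k, v in vc.items()})
print(f"E rho^2 = {e_rho2:.3f}, Phi = {phi:.3f}, alpha = {cronbach_alpha(scores):.3f}")
```

Φ is never larger than Eρ² here because it also charges the item main effect to error, which matters when decisions are absolute (for example, a pass mark) rather than rank-based.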

We discuss generalizability (G) theory and the fair and valid assessment of linguistic minorities, especially emergent bilinguals. G theory allows examination of the relationship between score variation and language variation (e.g., variation of proficiency across languages, language modes, and social contexts). Studies examining score variation across items administered in emergent bilinguals' first and second languages show that the interaction of student with the facets item and language (facets being sources of measurement error) is an important source of score variation. Each item poses a unique set of linguistic challenges in each language, and each emergent bilingual individual has a unique set of strengths and weaknesses in each language. Based on these findings, G theory can inform the process of test construction in large-scale testing programmes and the development of testing models that ensure more valid and fair interpretations of test scores for linguistic minorities.

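Improving the Putonghua Proficiency Test: A Generalizability Theory Analysis   (total citations: 4; self-citations: 0; citations by others: 4)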
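Using the principles and methods of generalizability theory, this study examines the design of the Putonghua Proficiency Test. The results show that the Putonghua test administered by the State Language Commission has high reliability overall. Parts one and two of the test are more reliable, while part three is less reliable, so the first priority in improving the test design is to raise the reliability of subtest three. Specifically, two raters and 25 items is the minimum acceptable design. If higher reliability is required (e.g., Eρ² of 0.60 or above), the design should use two raters with close to 50 items, or three raters with more than 30 items.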
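
A D study of this kind reuses G-study variance components to project reliability under alternative designs. As a hedged sketch, assuming a fully crossed, all-random persons × items × raters (p × i × r) design, the relative error variance is σ²pi/n_i + σ²pr/n_r + σ²pir,e/(n_i·n_r) and Eρ² = σ²p/(σ²p + relative error). The variance components below are hypothetical and are not the Putonghua study's estimates; the code searches for the smallest item count that reaches a target Eρ² for each number of raters.

```python
def e_rho2_pir(vc, n_i, n_r):
    """Relative G coefficient for a random p x i x r design."""
    rel_err = vc["pi"] / n_i + vc["pr"] / n_r + vc["pir,e"] / (n_i * n_r)
    return vc["p"] / (vc["p"] + rel_err)


def min_items(vc, n_r, target, max_items=200):
    """Smallest item count reaching `target`, or None within `max_items`."""
    for n_i in range(1, max_items + 1):
        if e_rho2_pir(vc, n_i, n_r) >= target:
            return n_i
    return None


# Hypothetical G-study variance components (not from the Putonghua study).
vc = {"p": 0.10, "pi": 1.00, "pr": 0.02, "pir,e": 2.00}

for n_r in (1, 2, 3):
    n_i = min_items(vc, n_r, target=0.60)
    print(f"raters={n_r}: items needed={n_i}, E rho^2={e_rho2_pir(vc, n_i, n_r):.3f}")
```

Item and rater main effects do not enter the relative error; an absolute (Φ) analysis would also add σ²i/n_i, σ²r/n_r and σ²ir/(n_i·n_r).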

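Applying Generalizability Theory to Rating Error in Structured Interviews   (total citations: 1; self-citations: 0; citations by others: 1)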
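This study applies generalizability theory to the control of rating error in structured interviews. The results show that structured-interview ratings reflect candidates' true ability levels well and are highly reliable. While maintaining a high rating reliability (0.80), the number of examiners can be reduced to nine, which is recommended to make structured interviews more economical and efficient.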
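
With a single rater facet (candidates × examiners, a p × r design), the D-study projection reduces to the Spearman-Brown formula, so the minimum number of examiners for a target coefficient ρ* has a closed form: n_r ≥ [ρ*/(1 − ρ*)]·σ²pr,e/σ²p. A minimal sketch with hypothetical variance components (not those of the interview study), assuming one random facet, which a real structured interview with several question dimensions may not satisfy:

```python
import math


def g_coef_raters(var_p, var_pr_e, n_r):
    """E rho^2 for a p x r design with n_r raters per candidate."""
    return var_p / (var_p + var_pr_e / n_r)


def min_raters(var_p, var_pr_e, target):
    """Closed-form minimum number of raters reaching `target`."""
    n = target / (1.0 - target) * var_pr_e / var_p
    # Small tolerance guards against floating-point noise at exact integers.
    return math.ceil(n - 1e-9)


# Hypothetical variance components.
var_p, var_pr_e = 0.50, 0.70
n = min_raters(var_p, var_pr_e, target=0.80)
print(n, round(g_coef_raters(var_p, var_pr_e, n), 3))
```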

Course quality is multifaceted, being determined by instructor, students, and external conditions. Consequently, any attempt at measurement should reflect this diversity, so that stable evaluations can be made that reflect both personal (instructor) and situational (student and external conditions) variables. This study extends previous research by examining the stability of both dimensions across different courses, student populations, and universities. In addition, the sample (N = 692 courses) was drawn from 6 traditional and technical German universities that have a different ethos of student interaction with academic staff than those in many other Western countries. Using the Heidelberg Inventory, it was found that instructor variables were reliable across courses given by the same instructor, but student scales or background variables were less consistent across courses in which the content was identical. It was concluded that the instrument was both reliable and valid for student evaluations of both teaching performance and course quality within a European context.

We contend that generalizability (G) theory allows the design of psychometric approaches to testing English-language learners (ELLs) that are consistent with current thinking in linguistics. We used G theory to estimate the amount of measurement error due to code (language or dialect). Fourth- and fifth-grade ELLs, native speakers of Haitian-Creole from two speech communities, were given the same set of mathematics items in the standard English and standard Haitian-Creole dialects (Sample 1) or in the standard and local dialects of Haitian-Creole (Samples 2 and 3). The largest measurement error observed was produced by the interaction of student, item, and code. Our results indicate that the reliability and dependability of ELL achievement measures are affected by two facts that operate in combination: each test item poses a unique set of linguistic challenges, and each student has a unique set of linguistic strengths and weaknesses. This sensitivity to language appears to take place at the level of dialect. Also, students from different speech communities within the same broad linguistic group may differ considerably in the number of items needed to obtain dependable measures of their academic achievement. Whether students are tested in English or in their first language, dialect variation needs to be considered if language as a source of measurement error is to be effectively addressed.
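
The student × item × code interaction feeds directly into a D study on dependability. A sketch, assuming a fully crossed, all-random persons × items × codes (p × i × c) design and absolute decisions (Φ), with hypothetical variance components for two speech communities labelled A and B; the numbers are illustrative and not taken from the study. A larger σ²pic,e and a smaller σ²p (community B) increase the number of items needed for the same Φ.

```python
def phi_pic(vc, n_i, n_c):
    """Dependability (Phi) for a random p x i x c design."""
    abs_err = ((vc["i"] + vc["pi"]) / n_i
               + (vc["c"] + vc["pc"]) / n_c
               + (vc["ic"] + vc["pic,e"]) / (n_i * n_c))
    return vc["p"] / (vc["p"] + abs_err)


def items_needed(vc, target=0.80, n_c=1, max_items=500):
    """Smallest number of items reaching `target` Phi, or None."""
    for n_i in range(1, max_items + 1):
        if phi_pic(vc, n_i, n_c) >= target:
            return n_i
    return None


# Hypothetical variance components for two speech communities.
community_a = {"p": 0.04, "i": 0.02, "c": 0.0, "pi": 0.03,
               "pc": 0.002, "ic": 0.01, "pic,e": 0.11}
community_b = {"p": 0.03, "i": 0.02, "c": 0.0, "pi": 0.03,
               "pc": 0.004, "ic": 0.01, "pic,e": 0.20}

for name, vc in (("A", community_a), ("B", community_b)):
    print(f"community {name}: items needed for Phi >= 0.80: {items_needed(vc)}")
```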

Contemporary educational accountability systems, including state‐level systems prescribed under No Child Left Behind as well as those envisioned under the “Race to the Top” comprehensive assessment competition, rely on school‐level summaries of student test scores. The precision of these score summaries is almost always evaluated using models that ignore the classroom‐level clustering of students within schools. This paper reports balanced and unbalanced generalizability analyses investigating the consequences of ignoring variation at the level of classrooms within schools when analyzing the reliability of such school‐level accountability measures. Results show that the reliability of school means cannot be determined accurately when classroom‐level effects are ignored. Failure to take between‐classroom variance into account biases generalizability (G) coefficient estimates downward and standard errors (SEs) upward if classroom‐level effects are regarded as fixed, and biases G‐coefficient estimates upward and SEs downward if they are regarded as random. These biases become more severe as the difference between the school‐level intraclass correlation (ICC) and the class‐level ICC increases. School‐accountability systems should be designed so that classroom (or teacher) level variation can be taken into consideration when quantifying the precision of school rankings, and statistical models for school mean score reliability should incorporate this information.
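
The random-classroom case can be illustrated with a simplified calculation. A sketch, assuming a balanced design with pupils nested in classes nested in schools, known (hypothetical) variance components that sum to 1, and classes treated as random: the nested model charges classroom variance over n_c classes only, while a model that ignores classrooms folds it into pupil variance and divides by all n_c·n_p pupils. This only shows the direction of the bias in the random case; it is not the paper's balanced/unbalanced analysis, and the fixed-classroom case is not modelled.

```python
import math


def school_mean_reliability(var_s, var_cs, var_pcs, n_classes, n_pupils):
    """G coefficient and SE of a school mean, with and without classrooms."""
    # Nested model (classes random): classroom variance averages over classes only.
    err_nested = var_cs / n_classes + var_pcs / (n_classes * n_pupils)
    # Simplified model ignoring classrooms: all within-school variance is
    # treated as pupil variance and averaged over every pupil.
    err_naive = (var_cs + var_pcs) / (n_classes * n_pupils)
    return {
        "nested": (var_s / (var_s + err_nested), math.sqrt(err_nested)),
        "ignoring classes": (var_s / (var_s + err_naive), math.sqrt(err_naive)),
    }


# Hypothetical components (total variance = 1), 4 classes of 25 pupils.
var_s = 0.05
gaps = []
for var_cs in (0.00, 0.02, 0.05, 0.10):
    var_pcs = 1.0 - var_s - var_cs
    icc_school, icc_class = var_s, var_s + var_cs
    res = school_mean_reliability(var_s, var_cs, var_pcs, n_classes=4, n_pupils=25)
    (g_nested, se_nested) = res["nested"]
    (g_naive, se_naive) = res["ignoring classes"]
    gaps.append(g_naive - g_nested)
    print(f"ICC gap={icc_class - icc_school:.2f}  G nested={g_nested:.3f} "
          f"G naive={g_naive:.3f}  SE nested={se_nested:.4f} SE naive={se_naive:.4f}")
```

As the gap between the class-level and school-level ICC grows, the coefficient from the model that ignores classrooms overstates the nested one by more.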

PASS Theory: A New Cognitive-Process View of Intelligence   (total citations: 1; self-citations: 3; citations by others: 1)
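The theory and measurement of intelligence is an enduring topic in developmental psychology and education, and the rapid growth of cognitive psychology has opened new perspectives on it. The Planning-Attention-Simultaneous-Successive (PASS) model is grounded in information-processing psychology and in Luria's neuropsychological account of the functional organization of the brain; it holds that intelligence consists of several mutually independent cognitive processes rather than a single general factor. The practical tools built on PASS, the Cognitive Assessment System (CAS) and the PASS remedial program (PREP), overcome shortcomings of traditional intelligence tests and provide effective means for a comprehensive understanding of children's cognitive processes. This article introduces and discusses them.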

College and university administrators, as well as faculty members, are more likely to take responsibility for student learning if they believe that the assessment data represent their students and suggest specific actions for improvement. This study examined whether it is feasible to develop scalelets (i.e., focused measures, usually consisting of four or five items) that provide dependable metrics for assessing student learning at the college or department level. A generalizability analysis of 12 scalelets from the National Survey of Student Engagement (NSSE) indicated that the scalelets provided dependable measures of educational effectiveness with 25–50 respondents. *SAIR 2004 Best Paper. Presented at the annual meeting of the Association for Institutional Research, San Diego, CA, March 2005.
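
When the object of measurement is a college or department rather than a student, the G coefficient concerns group means. A sketch, assuming respondents nested in groups and crossed with a fixed-length scalelet of random items, an (r:g) × i design, with relative error σ²gi/n_i + σ²r:g/n_r + σ²ri:g,e/(n_r·n_i). The variance components and the five-item scalelet length are hypothetical, not NSSE estimates.

```python
def group_mean_g(vc, n_r, n_i):
    """Relative G coefficient of a group mean in an (r:g) x i design."""
    rel_err = vc["gi"] / n_i + vc["r:g"] / n_r + vc["ri:g,e"] / (n_r * n_i)
    return vc["g"] / (vc["g"] + rel_err)


def min_respondents(vc, target, n_i=5, max_r=1000):
    """Smallest number of respondents per group reaching `target`, or None."""
    for n_r in range(1, max_r + 1):
        if group_mean_g(vc, n_r, n_i) >= target:
            return n_r
    return None


# Hypothetical variance components for one scalelet.
vc = {"g": 0.02, "gi": 0.005, "r:g": 0.25, "ri:g,e": 0.40}

for n_r in (10, 25, 50, 100):
    print(f"{n_r:4d} respondents: E rho^2 = {group_mean_g(vc, n_r, 5):.3f}")
print("needed for 0.70:", min_respondents(vc, 0.70))
print("needed for 0.80:", min_respondents(vc, 0.80))
```

The σ²gi/n_i term does not shrink with more respondents, so it caps the attainable coefficient for a scalelet of fixed length.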

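Since metaphor was brought into the study of cognition in the 1980s, cognitive psychology has studied it extensively. This article analyzes the cognitive properties of metaphor from three angles: metaphor is essentially a form of thought and is therefore one of the objects of cognitive psychology; three theories of metaphor explain the cognitive modes by which metaphor reflects objective things; and metaphor acts on perception, thinking, and imagination, shaping the forms in which they reflect the objective world.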

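Cognitive theory holds that learning is a process of assimilation, accommodation, and equilibration achieved through interaction with the external environment. As cognitive science has developed, cognitive theory has been increasingly adopted and valued in education, and it is now used to guide language teaching. In English teaching, teachers should not only present linguistic knowledge and train the "five skills", but also set this learning and training within the broader context of cultural teaching. They should use appropriate methods and means to widen the channels of target-language cultural input, integrate language teaching with cultural teaching, add cultural content, and bring out the deeper cultural meaning of the language, so that students experience the atmosphere of a foreign culture in daily life and in class. Building English language and culture onto the Chinese culture students already possess in this way helps them develop bilingual and bicultural competence.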

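Drawing on a large body of literature as well as educational game software and websites, and grounded in cognitive psychology, this paper examines in depth the role of educational games in primary school students' cognitive learning. It argues that educational games support pupils' perceptual learning, cognitive structure, learning motivation, and information processing, and offers suggestions for applying educational games in curricular learning.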

A Review of Research on Cognitive Structure Theory   (total citations: 2; self-citations: 0; citations by others: 2)
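Cognitive structure theory takes cognitive structure as its central object of study. A historical review shows that psychologists have approached cognitive structure from many perspectives and at many levels, but all emphasize its constructive nature and its reciprocal relationship with learning, and all give prominence to a student-centered orientation.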

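Group-discussion oral tests improve testing efficiency and can also assess conversation-management abilities, such as summarizing the discussion, that interview-format tests cannot capture; they are therefore regarded as an effective oral-test format for ordinary teaching settings. This article uses generalizability theory to examine empirically the overall reliability of a group-discussion oral test, and the results indicate that the test can achieve acceptable reliability. In addition, to save as much testing time and personnel as possible, the study uses a G-theory D study to reduce the number of analytic rating criteria in a principled way while maintaining test reliability.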

The adaptation of experimental cognitive tasks into measures that can be used to quantify neurocognitive outcomes in translational studies and clinical trials has become a key component of the strategy to address psychiatric and neurological disorders. Unfortunately, while most experimental cognitive tests have strong theoretical bases, they can have poor psychometric properties, leaving them vulnerable to measurement challenges that undermine their use in applied settings. Item response theory–based computerized adaptive testing has been proposed as a solution but has seen limited use in experimental and translational research because of its large sample requirements. We present a generalized latent variable model that, when combined with strong parametric assumptions based on mathematical cognitive models, permits the use of adaptive testing without large samples or the need to precalibrate item parameters. The approach is demonstrated using data from a common measure of working memory, the N-back task, collected across a diverse sample of participants. After evaluating dimensionality and model fit, we conducted a simulation study to compare adaptive versus nonadaptive testing. Compared with nonadaptive testing, computerized adaptive testing made either the task 36% more efficient or the score estimates 23% more precise. This proof-of-concept study demonstrates that latent variable modeling and adaptive testing can be used in experimental cognitive testing even with relatively small samples. Adaptive testing has the potential to improve the impact and replicability of findings from translational studies and clinical trials that use experimental cognitive tasks as outcome measures.
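
The adaptive-testing loop itself (estimate ability, pick the most informative next item, update) can be sketched generically. This is not the paper's generalized latent variable model: it is a plain Rasch (1PL) CAT with maximum-Fisher-information item selection and EAP scoring under a standard normal prior, and the item bank (a hypothetical grid of difficulties that could, for instance, stand for task blocks of increasing load), stopping rule, and function names are all assumptions.

```python
import numpy as np


def p_correct(theta, b):
    """Rasch (1PL) probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses, difficulties, grid=np.linspace(-4, 4, 161)):
    """EAP ability estimate and posterior SD with a N(0, 1) prior."""
    log_post = -0.5 * grid ** 2
    for u, b in zip(responses, difficulties):
        p = p_correct(grid, b)
        log_post += np.where(u == 1, np.log(p), np.log(1.0 - p))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    mean = np.sum(grid * post)
    sd = np.sqrt(np.sum((grid - mean) ** 2 * post))
    return mean, sd


def run_cat(true_theta, bank, max_items=20, target_se=0.45, rng=None):
    """Simulate one adaptive administration for a respondent at true_theta."""
    rng = np.random.default_rng() if rng is None else rng
    available = list(range(len(bank)))
    used, responses = [], []
    theta_hat, se = 0.0, np.inf
    while available and len(used) < max_items and se > target_se:
        # Choose the unused item with maximum information at the current estimate.
        info = item_information(theta_hat, bank[available])
        pick = available.pop(int(np.argmax(info)))
        u = int(rng.random() < p_correct(true_theta, bank[pick]))
        used.append(pick)
        responses.append(u)
        theta_hat, se = eap_estimate(responses, bank[used])
    return theta_hat, se, used


bank = np.linspace(-3, 3, 61)  # hypothetical item difficulties
theta_hat, se, used = run_cat(1.0, bank, rng=np.random.default_rng(1))
print(f"theta_hat={theta_hat:.2f}, SE={se:.2f}, items used={len(used)}")
```

Stopping on a posterior-SD threshold is what lets an adaptive test trade length for precision, the two outcomes the abstract reports.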

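As a special type of expression, idioms carry social and cultural attributes and have cognitive significance. Drawing on modern cognitive context theory, this article focuses on the cultural images embedded in the form of idioms. Because culture is an important cognitive foundation for the emergence and development of idioms, it argues that cultural-cognitive content should be introduced into foreign-language teaching.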

Game-based assessment (GBA), a specific application of games for learning, has been recognized as an alternative form of assessment. While there is a substantive body of literature that supports the educational benefits of GBA, limited work investigates the validity and generalizability of such systems. In this paper, we describe applications of learning analytics methods to provide evidence for psychometric qualities of a digital GBA called Shadowspect, particularly to what extent Shadowspect is a robust assessment tool for middle school students' spatial reasoning skills. Our findings indicate that Shadowspect is a valid assessment for spatial reasoning skills, and it has comparable precision for both male and female students. In addition, students' enjoyment of the game is positively related to their overall competency as measured by the game regardless of the level of their existing spatial reasoning skills.

Practitioner notes

What is already known about this topic:
  • Digital games can be a powerful context to support and assess student learning.
  • Games as assessments need to meet certain psychometric qualities such as validity and generalizability.
  • Learning analytics provide useful ways to establish assessment models for educational games, as well as to investigate their psychometric qualities.
What this paper adds:
  • How a digital game can be coupled with learning analytics practices to assess spatial reasoning skills.
  • How to evaluate psychometric qualities of game-based assessment using learning analytics techniques.
  • Investigation of validity and generalizability of game-based assessment for spatial reasoning skills and the interplay of the game-based assessment with enjoyment.
Implications for practice and/or policy:
  • Game-based assessments that incorporate learning analytics can be used as an alternative to pencil-and-paper tests to measure cognitive skills such as spatial reasoning.
  • More training and assessment of spatial reasoning embedded in games can motivate students who might not be on the STEM tracks, thus broadening participation in STEM.
  • Game-based learning and assessment researchers should consider possible factors that affect how certain populations of students enjoy educational games, so that these games do not further marginalize specific student populations.

Given increasing interest in evidence-based policy, there is growing attention to how well the results from rigorous program evaluations may inform policy decisions. However, little attention has been paid to documenting the characteristics of schools or districts that participate in rigorous educational evaluations, and how they compare to potential target populations for the interventions that were evaluated. Utilizing a list of the actual districts that participated in 11 large-scale rigorous educational evaluations, we compare those districts to several different target populations of districts that could potentially be affected by policy decisions regarding the interventions under study. We find that school districts that participated in the 11 rigorous educational evaluations differ from the interventions' target populations in several ways, including size, student performance on state assessments, and location (urban/rural). These findings raise questions about whether, as currently implemented, the results from rigorous impact studies in education are likely to generalize to the larger set of school districts—and thus schools and students—of potential interest to policymakers, and how we can improve our study designs to retain strong internal validity while also enhancing external validity.
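
One common way to quantify how a study sample differs from a target population is a standardized mean difference per covariate. A sketch, assuming the population standard deviation as the scale and using small hypothetical district data (not the study's); the 0.25 flag is likewise an illustrative cut-off, not the authors' criterion.

```python
from statistics import mean, pstdev


def standardized_differences(sample, population):
    """(sample mean - population mean) / population SD for each covariate."""
    result = {}
    for name, pop_values in population.items():
        sd = pstdev(pop_values)
        diff = mean(sample[name]) - mean(pop_values)
        result[name] = diff / sd if sd > 0 else 0.0
    return result


# Hypothetical district-level data.
population = {
    "enrollment": [1000, 2000, 3000, 4000, 5000],
    "pct_proficient": [40, 50, 55, 60, 70],
    "urban": [0, 0, 1, 0, 1],
}
sample = {
    "enrollment": [4000, 5000],
    "pct_proficient": [40, 50],
    "urban": [1, 1],
}

smd = standardized_differences(sample, population)
for name, value in smd.items():
    flag = "  <- differs notably" if abs(value) > 0.25 else ""
    print(f"{name:15s} {value:+.2f}{flag}")
```

In this toy example the participating districts are larger, lower-performing, and more urban than the population, the kind of pattern that limits external validity.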


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417