Similar literature
1.
Multiple‐choice, short‐answer, and extended‐response item formats were used in the Third International Mathematics and Science Study to assess student achievement in mathematics and science at Grades 7 and 8 in more than 40 countries around the world. Data pertaining to science indicate that the standings of some countries relative to others change when performance is measured via the different item formats. The question addressed in the present article is the following: Can the instability of ranks in this case be attributed principally to item format, or are other important factors at work? It is argued that the findings provide further evidence that comparing student achievement across countries is a very complex undertaking indeed.

2.
Science education needs valid, authentic, and efficient assessments. Many typical science assessments primarily measure recall of isolated information. This paper reports on the validation of assessments that measure knowledge integration ability among middle school and high school students. The assessments were administered to 18,729 students in five states. Rasch analyses of the assessments demonstrated satisfactory item fit, item difficulty, test reliability, and person reliability. The study showed that, when appropriately designed, knowledge integration assessments can be balanced between validity and reliability, authenticity and generalizability, and instructional sensitivity and technical quality. Results also showed that, when paired with multiple‐choice items and scored with an effective scoring rubric, constructed‐response items can achieve high reliabilities. Analyses showed that English language learner status and computer use significantly impacted students' science knowledge integration abilities. Students who took the assessment online, which matched the format of content delivery, performed significantly better than students who took the paper‐and‐pencil version. Implications and future directions of research are noted, including refining curriculum materials to meet the needs of diverse students and expanding the range of topics measured by knowledge integration assessments. © 2011 Wiley Periodicals, Inc. J Res Sci Teach 48: 1079–1107, 2011

3.
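Assessment of academic achievement is one of the current hot topics in research on the new curriculum reform. How to design and develop test items scientifically is of great significance for deepening the new curriculum reform and for monitoring the quality of basic education. PISA is an authoritative international student assessment programme with high comparability, credibility, and validity. The PISA 2006 science assessment framework comprises four interrelated aspects: contexts, knowledge, attitudes, and competencies. Its item design and development techniques adopted a "double-digit coding" scoring design and added items assessing attitudes, ensuring alignment between the items and the standards.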

4.
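Mathematical Cognition: Research Findings from Brain and Cognitive Science and Their Implications for Education   Total citations: 6 (self-citations: 0, citations by others: 6)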
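Mathematical cognition is one of the most important higher cognitive functions in humans. Recent findings from brain and cognitive science indicate that mathematical cognition is a complex cognitive system with multiple components and multiple subsystems; it has a basis in phylogenetic evolution and is also related to individual development and learning. Mathematical cognition relies on a wide-ranging cortical support network that includes parts of the parietal, frontal, and temporal lobes. The parietal cortex in particular plays an important role both in mathematical cognition disorders and in the process of learning mathematics. This clearly indicates that language and visuospatial functions are of great significance for mathematical cognition. These findings offer important implications for a scientific understanding of students' mathematics learning, for the methods of basic mathematics education, and for the evaluation of mathematics education.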

5.
Recent test‐based accountability policy in the U.S. has involved annually assessing all students in core subjects and holding schools accountable for adequate progress of all students by implementing sanctions when adequate progress is not met. Despite its potential benefits, basing educational policy on assessments developed for a student population of White, middle‐ and upper‐class, and native speakers of English opens the door for numerous pitfalls when the assessments are applied to minority populations including students of color, low SES, and learning English as a new language. There exists a paradox; while minority students are a primary intended beneficiary of the test‐based accountability policy, the assessments used in the policy have been shown to have many shortcomings when applied to these students. This article weighs the benefits and pitfalls that test‐based accountability brings for minority students. Resolutions to the pitfalls are discussed, and areas for future research are recommended. © 2009 Wiley Periodicals, Inc. J Res Sci Teach 47: 6–24, 2010

6.
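Students' mathematical literacy has a multidimensional structure, so a literacy-oriented assessment of mathematics achievement needs to provide information on examinees' performance on each dimension rather than only a single total score. Taking the PISA structure of mathematical literacy as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, this study used the MIRT package for the R language to process and analyze item data from a Grade 8 mathematical literacy assessment in one region, in order to investigate multidimensional measurement of mathematical literacy. The results show that MIRT combines the advantages of unidimensional item response theory and factor analysis: it can be used to analyze the structural validity of a test and the quality of its items, and to carry out multidimensional cognitive diagnosis of examinees' abilities.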

7.
This study examined whether practice testing with short-answer (SA) items benefits learning over time compared to practice testing with multiple-choice (MC) items, and rereading the material. More specifically, the aim was to test the hypotheses of retrieval effort and transfer appropriate processing by comparing retention tests with respect to practice testing format. To adequately compare SA and MC items, the MC items were corrected for random guessing. With a within-group design, 54 students (mean age = 16 years) first read a short text, and took four practice tests containing all three formats (SA, MC and statements to read) with feedback provided after each part. The results showed that both MC and SA formats improved short- and long-term memory compared to rereading. More importantly, practice testing with SA items was more beneficial for learning and long-term retention, providing support for the retrieval effort hypothesis. The use of corrections for guessing and educational implications are discussed.
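
The abstract above says the MC items were corrected for random guessing but does not say which correction was used. As an illustration only, the sketch below assumes the classical formula score, R - W / (k - 1), where R is the number of right answers, W the number of wrong answers (omitted items excluded), and k the number of options per item. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def corrected_mc_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Classical correction for guessing (formula scoring): R - W / (k - 1).

    Assumption: this is the textbook correction, not necessarily the one
    used in the study summarized above.
    """
    if num_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    # Each wrong answer is taken as evidence of a guess; a blind guesser is
    # expected to get 1 right for every (k - 1) wrong, so that share is removed.
    return num_right - num_wrong / (num_options - 1)


# Hypothetical example: ten four-option items, 7 right, 3 wrong, none omitted.
score = corrected_mc_score(num_right=7, num_wrong=3, num_options=4)
print(score)  # 6.0
```

Under this assumption, an examinee who guesses blindly on every item has an expected corrected score of zero, which puts MC scores on a footing closer to SA scores, where guessing rarely succeeds.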

8.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.

9.
This paper illustrates that the psychometric properties of scores and scales that are used with mixed‐format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is on mixed‐format tests in situations for which raw scores are integer‐weighted sums of item scores. Four associated real‐data examples include (a) effects of weights associated with each item type on reliability, (b) comparison of psychometric properties of different scale scores, (c) evaluation of the equity property of equating, and (d) comparison of the use of unidimensional and multidimensional procedures for evaluating psychometric properties. Throughout the paper, and especially in the conclusion section, the examples are related to issues associated with test interpretation and test use.

10.
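The Chinese government has made expanding domestic demand a long-term strategic policy for economic development and actively advocates giving play to the fundamental role of consumption. Food is a necessity, and it is consumed mainly through various retail formats. With global economic integration and continuing advances in science and technology, food retail formats are diversifying. This paper summarizes the state of food retail formats worldwide and further analyzes the current situation and development trends of diversified food retail formats in the major countries of Europe, Asia, the Americas, and Africa, providing a reference for the development of China's food retail market.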

11.
Large‐scale assessments such as the Programme for International Student Assessment (PISA) have field trials where new survey features are tested for utility in the main survey. Because of resource constraints, there is a trade‐off between how much of the sample can be used to test new survey features and how much can be used for the initial item response theory (IRT) scaling. Utilizing real assessment data of the PISA 2015 Science assessment, this article demonstrates that using fixed item parameter calibration (FIPC) in the field trial yields stable item parameter estimates in the initial IRT scaling for samples as small as n = 250 per country. Moreover, the results indicate that for the recovery of the country‐specific latent trait distributions, the estimates of the trend items (i.e., the information introduced into the calibration) are crucial. Thus, concerning the country‐level sample size of n = 1,950 currently used in the PISA field trial, FIPC is useful for increasing the number of survey features that can be examined during the field trial without the need to increase the total sample size. This enables international large‐scale assessments such as PISA to keep up with state‐of‐the‐art developments regarding assessment frameworks, psychometric models, and delivery platform capabilities.

12.
This research examines the effect of two testing strategies on academic achievement and summative evaluations in an introductory statistics course. In 2001, 63 students underwent an hourly midterm format; and in 2002, 68 students underwent a bi-weekly exam format. Other than the exam format, the class lectures and labs were identical in terms of content, structure, pace, and the cumulative final exam. Findings from the regression analyses show that students in the bi-weekly format performed better than the students in the hourly midterm format. On average, students who took the bi-weekly exams performed about 10 percentage points higher (one letter grade) on the exams during the semester and about 15 percentage points higher on the cumulative final exam compared to their peers who took hourly midterms. The benefits of the bi-weekly format were significantly greater among female students than male students. Finally, students in the bi-weekly format were less likely to drop the class and evaluated the class far more favorably. Carrie B. Myers is an Assistant Professor of Adult and Higher Education at Montana State University. She received her Ph.D. in Higher Education Administration from Washington State University. Her research focuses on student and faculty development and assessment and evaluation. Scott M. Myers is an Associate Professor of Sociology at Montana State University. His areas of research are family demography and education. He received a Ph.D. in Sociology and a Ph.D. in Demography from the Pennsylvania State University.

13.
14.
15.
Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional behavior disorders) and students without disabilities. Multinomial logistic regression was employed to compare response characteristic curves (RCCs) of individual test items. Although no evidence for serious test bias was found for the state assessment examined in this study, the results indicated that students in different disability categories showed different patterns of DIF, DDF, and DOF, and that the use of RCCs helps clarify the implications of DIF and DDF.

16.
The purpose of this paper is to define and evaluate the categories of cognitive models underlying at least three types of educational tests. We argue that while all educational tests may be based, explicitly or implicitly, on a cognitive model, the categories of cognitive models underlying tests often range in their development and in the psychological evidence gathered to support their value. For researchers and practitioners, awareness of different cognitive models may facilitate the evaluation of educational measures for the purpose of generating diagnostic inferences, especially about examinees' thinking processes, including misconceptions, strengths, and/or abilities. We think a discussion of the types of cognitive models underlying educational measures is useful not only for taxonomic ends, but also for becoming increasingly aware of evidentiary claims in educational assessment and for promoting the explicit identification of cognitive models in test development. We begin our discussion by defining the term cognitive model in educational measurement. Next, we review and evaluate three categories of cognitive models that have been identified for educational testing purposes using examples from the literature. Finally, we highlight the practical implications of "blending" models for the purpose of improving educational measures.

17.
Performance on figure copying tasks is empirically linked to school readiness, learning, cognition, and neuropsychological functioning. These nonverbal tasks are frequently used to evaluate children from diverse backgrounds to minimize bias due to factors such as language, ethnicity, culture, or socioeconomic status on test performance. The current study examined possible differential item functioning across African American and Caucasian groups, ages 4 to 7 years, in Bender Motor Gestalt Test, Second Edition (BG‐II) visual‐motor scores. Results indicated that in general the BG‐II can be considered invariant across these ethnic groups in this age range.

18.
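As China's new curriculum reform continues to deepen, the practical application of situated cognition theory has become a topic that teachers continually explore in their teaching. In line with the substance of situated cognition theory, writing teachers can follow the principles of guidance, interactivity, effectiveness, and diversity when creating teaching situations. In keeping with the requirements of the new curriculum reform, the specific circumstances of students in different grades, and their teaching objectives, they can flexibly design teaching situations and choose suitable teaching models.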

19.
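To compare the roles of structural equation modeling (SEM) and the IRT graded response model in selecting items for a personality scale, this study used 7,229 actual responses to the Chinese College Student Personality Scale. For Factor 2, "Frankness" (爽直), parameters were estimated and model fit was assessed with SEM in LISREL 8.70 and with the graded response model in MULTILOG 7.03, and the item selection results of the two methods were compared. Both sets of results indicated that items 5, 6, 7, and 8 fit poorly. Under SEM this showed up as low factor loadings and unsatisfactory overall fit indices; under the graded response model it showed up as unsatisfactory discrimination and location parameters and as poorly shaped item characteristic curves and information curves for those items. However, SEM rated items 6 and 8 as worse, whereas the graded response model rated items 5 and 6 as worse. Overall, the statistical inferences the two models yield about personality scale items are consistent, with slight differences on individual items. Each has its own strengths, and the two can be used together.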

20.
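Petition work (xinfang, the handling of letters and visits) is the channel through which government administrative departments handle the public's legitimate demands and resolve the conflicts and problems people encounter. A clear understanding and accurate grasp of how petition and stability work is developing and changing under the new circumstances is the prerequisite and foundation for doing this work well. To do petition work well, one must fully grasp where petition and stability work currently stands in both theory and practice, and continually strengthen the sense of responsibility for doing it well under the new circumstances.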
