Similar literature
 Found 20 similar articles; search took 15 ms
1.
In this paper, the author discusses reading testing and its validity, the demands that reading places on test takers, and the principles that determine the validity of reading tests. By analyzing the reading sections of two GET-4 model tests, he shows how to construct a highly valid test so as to ensure its accuracy and objectivity.

2.
The attractiveness of computer-based tests (CBTs) is due largely to their capability to expand the ways we conduct testing. A relatively unexplored application, however, is actively using the computer to reduce construct-irrelevant variance while a test is being administered. This investigation introduces the effort-monitoring CBT, in which the computer monitors examinee effort (based on item response time) in a low-stakes test and displays warning messages to those exhibiting rapid-guessing behavior. The results of an experimental study are presented, which showed that an effort-monitoring CBT increased examinee effort and yielded more valid test scores than a conventional CBT. Thus, unlike previous research that has focused on identifying rapid-guessing behavior after it has occurred, the effort-monitoring CBT proactively attempts to suppress rapid-guessing behavior. This innovative testing procedure extends the capabilities of measurement practitioners to manage the psychometric challenges posed by unmotivated examinees.

3.
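Listening tests are an important component of language testing. In designing a listening test, test designers should take both reliability and validity into account. This paper examines these two key factors of testing: it explains the concepts of reliability and validity, the factors that affect them, the problems that frequently arise in listening test design, and how to solve those problems.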

4.
5.
In 2018, 26 states administered a college admissions test to all public school juniors. Nearly half of those states proposed to use those scores as their academic achievement indicators for federal accountability under the Every Student Succeeds Act (ESSA); many others are planning to use those scores for other accountability purposes. Accountability encompasses a number of different uses and subsumes a variety of claims. For states proposing to use summative tests for accountability, a validity argument needs to be developed, which entails delineating each specific use of test scores associated with accountability, identifying appropriate evidence, and offering a rebuttal to counterclaims. The aim of this article is to support states in developing a validity argument for use of college admission test scores for accountability by identifying claims that are applicable across states, along with summarizing existing evidence as it relates to each of these claims. As outlined by The Standards for Educational and Psychological Testing, multiple sources of evidence are used to address each claim. A series of threats to the validity argument, including weaker alignment with content standards and potential influences in narrowing teaching, are reviewed. Finally, the article contrasts validity evidence, primarily from research on the ACT, with regulatory requirements from ESSA. The Standards and guidance addressing the use of a “nationally recognized high school academic assessment” (Elementary and Secondary Education Act (ESEA), Negotiated Rulemaking Committee; Department of Education) are the primary sources for the organization of validity evidence.

6.
This study examined a 13-item instrument measuring approaches to learning (AtL) as a component of school readiness in the context of early childhood socio-emotional development. Few instruments, limited to preschool teacher ratings, measure AtL among kindergarteners with short easy-to-use questionnaires. We investigated psychometric properties of the instrument designed to provide practical measures of AtL behaviours identified in the Arizona Early Learning Standards with teacher (n = 205) and guardian (n = 1025) samples. We found a one-factor structure via exploratory factor analysis and confirmatory factor analysis (CFA). The multi-group CFA for combined teacher and guardian models indicated a good fit, which demonstrated the structure validity of the AtL instrument. This finding, combined with evidence of reliability of the instrument, supported the educational utility of the AtL as a new tool for measuring school readiness among kindergarteners in Arizona.

7.
Research related to microcomputer use by preschool-age children is a timely topic given the rapid increase in interest in, and acquisition of, microcomputers by preschool personnel. This review summarizes and critiques the research in this area. Also synthesized is the relatively large body of literature that consists of unsupported claims and speculations about microcomputer effects. Conclusions drawn from this review suggest that (a) there is a general lack of congruence between the actual research findings and the numerous “pronouncements and speculations” about microcomputer effects; (b) many methodological weaknesses limit the utility of research findings; and (c) little is known, empirically, about important process and outcome variables associated with preschoolers' use of microcomputers. Some recommendations for research are presented, based on the findings from the research and other literature reviewed.

8.
Students with disabilities often take tests under different conditions than their peers do. Testing accommodations, which involve changes to test administration that maintain test content, include extending time limits, presenting written text through auditory means, and taking a test in a private room with fewer distractions. For some students with disabilities, accommodations such as these are necessary for fair assessment; without accommodations, invalid interpretations would be made on the basis of these students’ scores. However, when misapplied, accommodations can also diminish fairness, introduce new sources of construct-irrelevant variance, and also lead to invalid interpretation of test scores. This module provides a psychometric framework for thinking about accommodations, and then explicates an accommodations decision-making framework that includes a variety of considerations. Problems with current accommodations practices are discussed, along with potential solutions and future directions. The module is accompanied by exercises allowing participants to apply their understanding.

9.
Based on the validity theory of foreign language testing, this article analyzes a test paper of College English Test-Band Three. It focuses on listening comprehension, vocabulary and structure testing, reading comprehension, and translation testing. Finally, the author offers some personal opinions and suggestions on College English Test-Band Three.

10.
A misconception exists that validity may refer only to the interpretation of test scores and not to the uses of those scores. The development and evolution of validity theory illustrate that test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test scores remains essential today. However, test scores are not interpreted and then ignored; rather, their interpretations lead to actions. Thus, a modern definition of validity needs to describe the validation of test score interpretations as a necessary, but insufficient, step en route to validating the uses of test scores for their intended purposes. To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests. The current definition of validity stipulated in the 2014 version of the Standards for Educational and Psychological Testing properly describes validity in terms of both interpretations and uses, and provides a sufficient starting point for validation.

11.
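The task-based view of language testing provides a new theoretical foundation for the College English Test (CET). Building on Bachman and Palmer's five characteristics of test tasks, a basic model for designing new item types was created, and new item types designed according to this model were used in a preliminary exploration of CET4 testing. Empirical checks of reliability and validity show that, although the mean score of the new-item-type group was slightly lower than that of the old-item-type group because of the larger share of subjective items, the difference between the two was not significant. The new item types also tended to show higher concurrent validity than the old ones, and their reliability coefficient was relatively high and statistically significant.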

12.
Although teacher collaboration is a school improvement imperative, it persists as an under-empiricized construct that has proven difficult to establish and assess with certainty. In this article, the authors present a validation study of the Teacher Collaboration Assessment Survey (TCAS). The TCAS operationalizes and measures 4 key domains of teacher collaboration: dialogue, decision making, action, and evaluation, and has been used to examine the quality of teacher teaming in district-wide comprehensive school reform efforts in the Northeastern and Mid-Atlantic regions of the United States. Five sources of validity evidence recommended by Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) are explicated, which establish a strong argument in support of the instrument's validity. The authors discuss how educational leaders and researchers can use the TCAS for leveraging teacher collaboration for instructional innovation and student achievement, and to systematically examine teacher teaming and its relationship to other educational outcomes.

13.
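A study of college students' evaluations of occupational prestige   Total citations: 3, self-citations: 0, citations by others: 3
Using a self-developed occupational prestige questionnaire, the study surveyed nearly a thousand college students in Henan Province. The results show that the ten occupations college students most admire are, in order: white-collar workers, enterprise managers, military personnel, company managers, enterprise planners, factory directors, information analysts, university teachers, engineers, and lawyers. Differences in occupational prestige ratings across majors, genders, students' places of origin, and year levels were small, although differences did appear for specific occupations. The ratings thus reflect a unity of commonality and individuality in occupational prestige evaluation.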

14.
The psychometric literature is replete with comprehensive discussions of test validity, test validation, and the characteristics of quality assessment programs. The most authoritative source for guidance regarding sound test development and evaluation practices is the Standards for Educational and Psychological Testing. However, the Standards are not legally binding. In this article, we review the way in which validity is conceptualized in the Standards and compare this conceptualization with validity evidence presented in specific court cases involving legal challenges to tests. Our review indicates that, in general, there is strong congruence between the Standards and how validity is viewed in the courts, and that testing agencies that conform to these guidelines are likely to withstand legal scrutiny. However, the courts have taken a more practical, less theoretical view on validity and tend to emphasize evidence based on test content and testing consequences.

15.
This article presents findings from two projects designed to improve evaluations of technical quality of alternate assessments for students with the most significant cognitive disabilities. We argue that assessment technical documents should allow for the evaluation of the construct validity of the alternate assessments following the traditions of Cronbach (1971), Messick (1989, 1995), Linn, Baker, and Dunbar (1991), and Shepard (1993). The projects used the work of Knowing What Students Know (Pellegrino, Chudowsky, & Glaser, 2001) to structure and focus the collection and evaluation of assessment information. The heuristic of the assessment triangle (Pellegrino et al., 2001) was particularly useful in emphasizing that the validity evaluation needs to consider the logical connections among the characteristics of the students tested and how they develop domain proficiency (the cognition vertex), the nature of the assessment (the observation vertex), and the ways in which the assessment results are interpreted (the interpretation vertex). This project has shown that in addition to designing more valid assessments, the growing body of knowledge about the psychology of achievement testing can be useful for structuring evaluations of technical quality.

16.
A framework for evaluation and use of automated scoring of constructed‐response tasks is provided that entails both evaluation of automated scoring and guidelines for implementation and maintenance in the context of constantly evolving technologies. Validity issues and challenges associated with automated scoring are discussed within the framework. The fit between the scoring capability and the assessment purpose, the agreement between human and automated scores, the consideration of associations with independent measures, the generalizability of automated scores as implemented in operational practice across different tasks and test forms, and the impact and consequences for the population and subgroups are proffered as integral evidence supporting use of automated scoring. Specific evaluation guidelines are provided for using automated scoring to complement human scoring for tests used for high‐stakes purposes. These guidelines are intended to be generalizable to new automated scoring systems and as existing systems change over time.

17.
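Language testing is an important part of language teaching and an important means of measuring what students have acquired. The key to judging a language test lies in its reliability and validity; a good test is the result of a reasonable balance between the two. This paper offers the author's views on the shortcomings in reliability and validity of in-house college English tests and proposes corresponding improvements.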

18.
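Because cloze items are objective and convenient in item writing, administration, scoring, and result analysis, they are widely used in foreign language teaching and testing. However, the vast majority of cloze items currently flooding the market have low validity, mainly because the test points they target are at a low level. Following the theory of cloze test-point levels proposed by Li Xiaoju, a cloze item was designed and trialed with students at a university, with the analysis focusing on correct-response rates and the reasons for lost points. From an empirical perspective, the study arrives at a method for improving the test-point validity of cloze items by raising the level of their test points. Emphasis should be placed on developing students' abilities on higher-level test points, thereby raising English learners' overall proficiency.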

19.
The Trends in International Mathematics and Science Study (TIMSS) is a comparative assessment of the achievement of students in many countries. In the present study, a rigorous independent evaluation was conducted of a representative sample of TIMSS science test items because item quality influences the validity of the scores used to inform educational policy in those countries. The items had been administered internationally to 16,009 students in their eighth year of formal schooling. The evaluation had three components. First, the Rasch model, which emphasizes high quality items, was used to evaluate the items psychometrically. Second, readability and vocabulary analyses were used to evaluate the wording of the items to ensure they were comprehensible to the students. And third, item development guidelines were used by a focus group of science teachers to evaluate the items in light of the TIMSS assessment framework, which specified the format, content, and cognitive domains of the items. The evaluation components indicated that the majority of the items were of high quality, thereby contributing to the validity of TIMSS scores. These items had good psychometric characteristics, readability, vocabulary, and compliance with the assessment framework. Overall, the items tended to be difficult: constructed response items assessing reasoning or application were the most difficult, and multiple choice items assessing knowledge or application were less difficult. The teachers revised some of the sampled items to improve their clarity of content, conciseness of wording, and fit with format specifications. For TIMSS, the findings imply that some of the non‐sampled items may need revision, too. For researchers and teachers, the findings imply that the TIMSS science items and the Rasch model are valuable resources for assessing the achievement of students. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 1321–1344, 2012

20.
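Language tests can not only assess students' language ability but also reveal the strengths and weaknesses of teaching. On this basis, the paper analyzes the College English Test Band 4 scores of science and engineering students at a private institution and describes the less than satisfactory state of these students' foreign language learning. It concludes that foreign language teaching for students at private institutions, especially those in science and engineering, should be adjusted and changed.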
