A pool of items from operational tests of mathematical reasoning was constructed to investigate the feasibility of using automated test assembly (ATA) methods to simultaneously moderate possibly irrelevant differences between the performance of women and men, and African American and White test takers. None of the artificial tests exhibited substantial impact moderation, although the estimated mean scaled score differences for the relevant population indicated a modest move in the intended direction: the difference between scaled score means was reduced by about 20% for women and men and about 9% for African American and White test takers. Although many issues in the implementation of this methodology remain to be solved, the consideration of impact in ATA, along with the maintenance of the detailed test plan, appears to be a potential method of moderating possibly irrelevant mean test score differences.  相似文献   

In May 1990 new groups of examinees participated in the Swedish Scholastic Aptitude Test (SweSAT). Generally these new groups were younger and had more education than the examinees at earlier test administrations. The purpose of the study reported was to examine whether the gender differences in test results had changed with the changed composition of examinees. The groups of men and women were successively matched according to age and education, and gender differences in test results were compared across age and education groups. The results showed that although both age and education influenced the test results, no real difference was found between younger and older examinees regarding gender differences in the test results.

Test scores matter these days. Test‐takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles by which most people get the bulk of this information. Historically, score reports have not always met the examinees’ information or usability needs, but this is clearly changing for the better due to recent, much‐needed additions to the psychometric literature as well as improved efforts in reporting practices. This paper provides an overview of score reports from a development perspective, focusing on current practices and emerging efforts in content of reports as well as the process by which reports are designed, evaluated, and ultimately used to communicate with the public.  相似文献   

The Slosson Intelligence Test (SIT) for Children and Adults protocols for 683 gifted students ranging in age from 6 to 12 years were scored using both 1961 and 1981 norms. The average 1981 norm score was 5.17 points lower than the 1961 norm score. The differences increase with the age of the child. Implications for using the SIT for selecting gifted children are discussed.  相似文献   

Views on testing (its purpose and uses and how its data are analyzed) are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as contestants in some detail. Test takers who are contestants in high‐stakes settings want reliable outcomes obtained via acceptable scoring of tests administered under clear rules. In addition, it is essential to empirically verify interpretations attached to scores. At the very least, item and test scores should exhibit certain invariance properties. I note that the “do no harm” dictum borrowed from the field of medicine is particularly relevant to the perspective of test takers as contestants.

Five methods for equating in a random groups design were investigated in a series of resampling studies with samples of 400, 200, 100, and 50 test takers. Six operational test forms, each taken by 9,000 or more test takers, were used as item pools to construct pairs of forms to be equated. The criterion equating was the direct equipercentile equating in the group of all test takers. Equating accuracy was indicated by the root-mean-squared deviation, over 1,000 replications, of the sample equatings from the criterion equating. The methods investigated were equipercentile equating of smoothed distributions, linear equating, mean equating, symmetric circle-arc equating, and simplified circle-arc equating. The circle-arc methods produced the most accurate results for all sample sizes investigated, particularly in the upper half of the score distribution. The difference in equating accuracy between the two circle-arc methods was negligible.  相似文献   
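
The abstract above names mean equating, linear equating, and an RMSD accuracy criterion without giving formulas. The following is a minimal sketch of the textbook forms of those three quantities, not the study's own procedure. The function names and the toy score lists are hypothetical, and the circle-arc and smoothed equipercentile methods are not shown.

```python
import statistics

def mean_equate(x, scores_x, scores_y):
    """Map a form-X score to the form-Y scale by matching the two means."""
    return x - statistics.mean(scores_x) + statistics.mean(scores_y)

def linear_equate(x, scores_x, scores_y):
    """Map a form-X score to the form-Y scale by matching means and SDs."""
    mean_x, mean_y = statistics.mean(scores_x), statistics.mean(scores_y)
    sd_x, sd_y = statistics.pstdev(scores_x), statistics.pstdev(scores_y)
    return mean_y + (sd_y / sd_x) * (x - mean_x)

def rmsd(sample_equatings, criterion):
    """Root-mean-squared deviation of replicated equated scores
    from a single criterion equated score."""
    squared = [(e - criterion) ** 2 for e in sample_equatings]
    return (sum(squared) / len(squared)) ** 0.5

# Hypothetical toy data: raw scores from two random groups.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]

equated_mean = mean_equate(16, form_x, form_y)
equated_linear = linear_equate(16, form_x, form_y)
```

In a resampling study of the kind described, `rmsd` would be evaluated at each raw score over the replicated sample equatings, with the criterion taken from the full-group equating.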

In the present study, the similarity of the factor structure of the Test Anxiety Scale for Elementary Students (TAS-E) and cultural and gender differences in test anxiety were examined in a sample of 1322 US and Singapore elementary students. The similarity of the factor structure of the TAS-E, a measure of test anxiety, was examined to determine whether the same test score interpretation could be made across culture and gender. Coefficient of congruence and salient variable similarity index values indicated that the pairs of matched factors (Physiological Hyperarousal, Social Concerns, Task Irrelevant Behaviour, Worry and Total Test Anxiety factors) of the TAS-E were similar across culture and gender. Results of a 2 × 2 ANOVA and 2 × 2 MANOVA with follow-up ANOVAs revealed that Singapore males scored higher than US males and US females scored higher than Singapore females on the TAS-E Total Test Anxiety scale and the Physiological Hyperarousal subscale. Singapore males also scored higher than US males on the TAS-E Worry subscale. Implications of the findings are discussed.
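
The abstract above reports coefficients of congruence but does not say how they were computed. As a sketch only, assuming the usual Tucker form (the sum of cross-products of two loading vectors divided by the square root of the product of their sums of squares), the calculation could look like this. The function name and the loading values are hypothetical.

```python
import math

def congruence(loadings_a, loadings_b):
    """Tucker-style congruence coefficient between two factor-loading vectors
    (assumed formula; the abstract does not specify one)."""
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return cross / norm

# Hypothetical loadings for one matched factor in two groups.
group_1 = [0.70, 0.65, 0.60, 0.55]
group_2 = [0.68, 0.66, 0.58, 0.50]
phi = congruence(group_1, group_2)
```

Values near 1 indicate that the two groups' loading patterns are proportional; unlike a correlation, the coefficient is not mean-centered.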

A College Board-sponsored survey of a nationally representative sample of 1995–96 SAT takers yielded a database for more than 4,000 examinees, about 500 of whom had attended formal coaching programs outside their schools. Several alternative analytical methods were used to estimate the effects of coaching on SAT I: Reasoning Test scores. The various analyses produced slightly different estimates. All of the estimates, however, suggested that the effects of coaching are far smaller than is claimed by major commercial test preparation companies. The revised SAT does not appear to be any more coachable than its predecessor.

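The geography paper of the 2004 National College Entrance Examination (Shanghai paper) consists of two parts: multiple-choice questions and comprehensive analysis questions. The multiple-choice part has 20 questions worth 2 points each, for 40 points in total. The comprehensive analysis part has eight major questions comprising 34 sub-questions and 110 scoring points. The geography examination is evaluated mainly from several angles: classical item analysis, the reliability of the examination results, content-related and structure-related evidence of validity, and the examination's impact on education and teaching. The following conclusions are drawn: the ability objectives of the geography examination were set according to the curriculum standards, and the items were written on the basis of those standards; the difficulty is slightly on the easy side, with a certain degree of discrimination; the number of items is appropriate; the ratio of multiple-choice to non-multiple-choice items is appropriate; and the paper provides good guidance for school education and teaching. However, the comprehensive analysis questions require reading a large amount of graphic and textual information while calling for relatively little written response, which makes it difficult to assess candidates' independent geographical thinking in a systematic way. This is unfavorable as guidance for teaching.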

The Slosson Intelligence Test, unlike most current measures of intelligence, uses a ratio method of mental age divided by chronological age to obtain an IQ score. Due to this, standard deviations are not stable across age levels and present a problem in diagnosing mental retardation. The Slosson Test Manual provides information whereby an overall test standard deviation of approximately 25 points is obtained. This is reviewed in respect to current criteria for the classification of mental retardation. It is concluded that the Slosson is inappropriate for use in the diagnosis of mental retardation.  相似文献   

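A mathematics academic achievement test was administered to 774 valid participants from different types of schools, and the results were analyzed using IRT parametric models. Four findings emerged: (1) test scores and optimal scores show negatively skewed distributions; (2) the test information function is shifted in the negative direction and is roughly bimodal; (3) subjective items fit the logistic model poorly; (4) there are significant differences in mathematics achievement among students from different types of schools.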

The present study examined the experiences, preparation, and perceptions of 63 educational interpreters employed in two rural states, using surveys and subsequent in-depth interviews with selected subjects. Only 10 of the 63 interpreters had completed interpreter preparation programs, with 5 of these having no course work related to education. None of the interpreters working in elementary or secondary schools held certification from the Registry of Interpreters for the Deaf or any other certifying body. Of the 63 interpreters, 43 were assessed using the Educational Interpreter Performance Assessment (EIPA), which uses a scale of 0-5. Test takers who score 3.5 or better are considered "coherent." The mean score on the EIPA for the 43 educational interpreters was 2.6. Respondents reported concerns about their limited understanding of American Sign Language (ASL), their ability to interpret from ASL to English, and their salaries, training, and professional status.  相似文献   

Probability selection models for Scholastic Achievement Test (SAT) test takers using truncated normal distributions have been independently discussed by Taube and Linden (1989) and Edwards and Beckworth (1989). Holland and Wainer (1990) provided one formalization of the truncated normal selection model, and rejected it based on patterns in state test-taker variances and test score histograms. Their model, however, is fundamentally different from that considered by Edwards and Beckworth; the former hypothesizes that the probability of taking the test is a function of test score, the latter that it is a function of true score. Here, an evolved form of the Edwards and Beckworth model is outlined, and its relationship with earlier models is discussed. It is shown that the arguments of Holland and Wainer are not sufficient to reject this model.  相似文献   

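To provide better service to PETS candidates, researchers designed a written-test score report for PETS candidates based on their actual needs. This article describes the basic principles followed in designing the report and the main content it presents.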


High school students completed both multiple-choice and constructed response exams over an 845-word narrative passage on which they either took notes or underlined critical information. A control group merely read the text. In addition, half of the learners in each condition were told to expect either a multiple-choice or constructed response test following reading. Overall, note takers showed superior posttest recall, and notetaking without test instructions yielded the best group performance. Notetaking also required significantly more time than the other conditions. Underlining for a multiple-choice test led to better recall than underlining for a constructed response test. Although more multiple-choice than constructed response items were remembered, Test Mode failed to interact with the other factors.

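Difficulty is not an inherent property of a test item but the result of interaction between test-taker factors and item characteristics. Many item analysts tend to attribute high item difficulty solely to students' failure to master the relevant knowledge or skills, overlooking the characteristics of the item itself. This study analyzes 60 National College Entrance Examination English items with difficulty values below 0.6 to explore the sources of their difficulty. The results show that, apart from test-taker factors, the difficulty of hard or moderately hard items is also related to item-writing technique, including problems with the uniqueness and acceptability of the answer, content that exceeds the syllabus, and inappropriate choice of testing points or scoring criteria. The study therefore proposes that testing agencies improve item writing and strengthen item quality monitoring so that large-scale examinations select talent on a sound scientific basis.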

Some applicants for admission to graduate programs present Graduate Record Examinations (GRE) General Test scores that are several years old. Due to different experiences over time, older GRE verbal, quantitative, and analytical scores may no longer accurately reflect the current capabilities of the applicants. To provide evidence regarding the long-term stability of GRE scores, test-retest correlations and average change (net gain) in test performance were analyzed for GRE General Test repeaters classified by time between test administrations in intervals ranging from less than 6 months to 10 years or more. Findings regarding average changes in verbal and quantitative test performance for long-term repeaters (with 5 years or more between tests), generally, and by graduate major area, sex, and ethnicity, appeared to be consistent with a differential growth hypothesis: Long-term repeaters generally, and in all of the subgroups, registered greater average (net) score gain on verbal tests than on quantitative tests and, for subgroups, the amount of gain tended to vary directly with initial means. A rationale is presented for a growth interpretation of the observed average gains in test performance. Implications for graduate school and GRE Program policies regarding the treatment of older test scores are considered.  相似文献   

This study analyzed the factor structure of the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Peabody Picture Vocabulary Test (PPVT), and the Peabody Individual Achievement Test (PIAT) in a psychiatric sample that ranged in age from 6 to 16 years (mean age = 11.1 years; SD = 3.0). The resultant factor structure of this sample was compared with patterns reported on normal and learning-disabled children. The subjects were 329 children under inpatient and outpatient care who had been referred for emotional disturbances. The results were similar to previous factor analytic studies of the WISC-R and PIAT, showing four factors: Verbal Comprehension, Verbal Achievement, Perceptual Organization, and Number Facility. The implications for the interpretation of these tests in a psychiatric sample and the appropriateness of a maximum likelihood technique in analysis of psychometric data are discussed.  相似文献   

A prolonged working life is crucial for sustaining social welfare and fiscal stability for countries facing ageing populations. The group of older adults is not homogeneous, however, and differences within the group may affect the propensity to continue working and to participate in continuing education. The aim of this paper is to explore how participation in work and education varies with gender, age, and education level in a sample of older adults. The study was performed in Sweden, a context characterized by high female labour-market-participation rates and a high average retirement age. The participants were 232 members of four of the major senior citizens’ organizations. We found no differences in participation in work and education based on gender. People older than 75 years were found to be as active as people 65–75 years old in education, but the older group worked less. There were positive associations between education level and participation in both work and education. Hence, this study implies that socio-economic inequalities along these dimensions are widened later in life. This highlights the importance of engaging workers with lower education levels in educational efforts throughout life. It also emphasizes the need for true lifelong learning in society.

Performances of fourth and sixth grade children who had been in a program based on Science—A Process Approach were compared with performances of control groups on two conservation-of-volume tasks. The fourth grade children who had had Science—A Process Approach performed at a higher level than the control group on one of the tasks. There were no other significant differences between groups. The volume tasks were analyzed and learning hierarchies devised. A test based on the hierarchies was constructed and administered to all (189) children. An instructional program based on the hierarchies was carried out with approximately half of the children in each school at each grade level. All children were then post-tested on the volume tasks and the tasks of the learning hierarchies. All groups who had received instruction had higher mean scores on the Learning Hierarchies Test, but no group made a significant improvement on the volume tasks. Performance on the volume tasks was found to be related to age and score on the Learning Hierarchies Test.
