The authors analyzed self-reported SAT scores and actual SAT scores for five different samples of college students (N = 650). Students overestimated their actual SAT scores by an average of 25 points (SD = 81, d = 0.31), with 10% under-reporting, 51% reporting accurately, and 39% over-reporting, indicating a systematic bias towards over-reporting. The amount of over-reporting was greater for lower-scoring than higher-scoring students, was greater for upper division than lower division students, and was equivalent for men and women. There was a strong correlation between self-reported and actual SAT scores (r = 0.82), indicating high validity of students’ memories of their scores. Results replicate previous findings (Kuncel, Credé, & Thomas, 2005) and are consistent with a motivated distortion hypothesis. Caution is suggested in using self-reported SAT scores in psychological research.
Richard E. Mayer
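The effect size reported in the abstract above can be checked with a short calculation. This is a minimal sketch, assuming that d was computed as the mean over-report divided by the standard deviation of the differences; the abstract does not state the formula, and the function name is hypothetical.

```python
def standardized_mean_difference(mean_diff, sd_diff):
    """Mean difference divided by the SD of the differences (assumed formula)."""
    return mean_diff / sd_diff

# Values taken from the abstract: mean over-report of 25 points, SD = 81.
d = standardized_mean_difference(25, 81)
print(round(d, 2))  # 0.31

# The three reporting categories in the abstract should account for everyone.
shares = {"under": 10, "accurate": 51, "over": 39}
print(sum(shares.values()))  # 100
```

Under that assumption the result matches the reported d = 0.31.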

Despite the similarities that researchers note between the cognitive processes and knowledge involved in reading and writing, there are students who are much stronger readers than writers and those who are much stronger writers than readers. The addition of the writing section to the SAT provides an opportunity to examine whether certain groups of students are more likely to exhibit stronger performance in reading versus writing and the academic consequences of this discrepant performance. Results of this study, based on hierarchical linear models of student performance, showed that even after controlling for relevant student characteristics and prior academic performance, an SAT critical reading–writing discrepancy had a small effect on 1st-year grade point average as well as English course grades in college. Specifically, students who had relatively higher writing scores as compared to their critical reading scores earned higher grades in their 1st year of college as well as in their 1st-year English course(s).

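As important references for applications to American universities, both the ACT and the SAT include English-language test sections. This article compares the language-testing portions of the ACT and the SAT, with the aim of offering lessons and reference points for developing Chinese-language proficiency testing in China.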

A College Board-sponsored survey of a nationally representative sample of 1995–96 SAT takers yielded a database for more than 4,000 examinees, about 500 of whom had attended formal coaching programs outside their schools. Several alternative analytical methods were used to estimate the effects of coaching on SAT I: Reasoning Test scores. The various analyses produced slightly different estimates. All of the estimates, however, suggested that the effects of coaching are far less than is claimed by major commercial test preparation companies. The revised SAT does not appear to be any more coachable than its predecessor.

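To meet modern society's demand for talent, the Educational Testing Service reformed the SAT, the U.S. college entrance examination. The reformed SAT mathematics test is tied more closely to students' classroom learning, places more emphasis on understanding mathematical concepts, and further strengthens the assessment of logical reasoning and computational ability. It sets higher demands on both the accuracy and the fluency of mathematical operations, and it examines applications of mathematics in careers, science, and social studies within rich applied contexts. The test is also longer and more integrated. All of these changes offer useful insights and lessons for the reform of China's college entrance examination (gaokao).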

This article presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a 4-parameter beta model. The conditional distribution of scores on an alternate form, given the true score, is estimated from a binomial distribution based on the estimated effective test length. Agreement between classifications on alternate forms is estimated by assuming conditional independence, given the true score. Evaluation of the method showed estimates to be within 1 percentage point of the actual values in most cases. Estimates of decision accuracy and decision consistency statistics were only slightly affected by changes in specified minimum and maximum possible scores.
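Two of the steps described in the abstract above can be sketched in a few lines: converting reliability into an effective test length, and using a binomial distribution to estimate classification agreement between two alternate forms at a fixed true score. This is a simplified sketch, not the authors' implementation. The abstract does not print the effective-test-length formula, so the commonly cited form is assumed here and should be checked against the article. The function names and the rounding of the cut score are hypothetical, and the step of fitting and integrating over the 4-parameter beta true-score distribution is omitted.

```python
from math import ceil, comb

def effective_test_length(mean, var, reliability, min_score, max_score):
    """Effective number of discrete items implied by the score reliability.

    Assumed formula: ((mean - min)(max - mean) - r * var) / (var * (1 - r)).
    """
    numerator = (mean - min_score) * (max_score - mean) - reliability * var
    denominator = var * (1.0 - reliability)
    return max(1, round(numerator / denominator))

def prob_at_or_above(n, p, cut_items):
    """P(X >= cut_items) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(cut_items, n + 1))

def conditional_consistency(n, p, cut_proportion):
    """Chance that two alternate forms classify the same way, given true score p.

    Relies on conditional independence of the two forms given the true score:
    both forms pass, or both forms fail.
    """
    cut_items = ceil(cut_proportion * n)  # hypothetical rounding choice
    pass_prob = prob_at_or_above(n, p, cut_items)
    return pass_prob**2 + (1 - pass_prob) ** 2

# Illustrative (made-up) values on a 200-800 scale.
n_items = effective_test_length(mean=500, var=100**2, reliability=0.9,
                                min_score=200, max_score=800)
print(n_items)
print(conditional_consistency(n_items, p=0.6, cut_proportion=0.5))
```

In the full method, this conditional agreement would be averaged over the estimated true-score distribution to give the overall decision-consistency estimate.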

Comparing SAT scores among states using regression analysis leads to biased results because states differ in the proportion of students taking the exam. When the proportion of students taking the exam is included in the regression equation, the results can be biased because of misspecification bias. A method intended to correct for selection bias is presented, and empirical results suggest that sample selection bias is present in SAT score regressions. Regression equations and state rankings are compared between the selection-corrected equation and equations for which the selection problem is not addressed. The proposed method is one of many available as possible solutions to the selection problem. Alternative methods may produce different results.

Grades and Test Scores: Accounting for Observed Differences
Why do grades and test scores often differ? A framework of possible differences is proposed in this article. An approximation of the framework was tested with data on 8,454 high school seniors from the National Education Longitudinal Study. Individual and group differences in grade versus test performance were substantially reduced by focusing the two measures on similar academic subjects, correcting for grading variations and unreliability, and adding teacher ratings and other information about students. Concurrent prediction of high school average was thus increased from 0.62 to 0.90; differential prediction in eight subgroups was reduced to 0.02 letter‐grades. Grading variation was a major source of discrepancy between grades and test scores. Other major sources were teacher ratings and Scholastic Engagement, a promising organizing principle for understanding student achievement. Engagement was defined by three types of observable behavior: employing school skills, demonstrating initiative, and avoiding competing activities. While groups varied in average achievement, group performance was generally similar on grades and tests. Major factors in achievement were similarly constituted and similarly related from group to group. Differences between grades and tests give these measures complementary strengths in high‐stakes assessment. If artifactual differences between the two measures are not corrected, common statistical estimates of validity and fairness are unduly conservative.

The impact of allowing more time for each question on the SAT I: Reasoning Test scores was estimated by embedding sections with a reduced number of questions into the standard 30-minute equating section of two national test administrations. Thus, for example, questions were deleted from a verbal section that contained 35 questions to produce forms that contained 27 or 23 questions. Scores on the 23-question section could then be compared to scores on the same 23 questions when they were embedded in a section that contained 27 or 35 questions. Similarly, questions were deleted from a 25-question math section to form sections of 20 and 17 questions. Allowing more time per question had a minimal impact on verbal scores, producing gains of less than 10 points on the 200–800 SAT scale. Gains for the math score were less than 30 points. High-scoring students tended to benefit more than lower-scoring students, with extra time creating no increase in scores for students with SAT scores of 400 or lower. Ethnic/racial and gender differences were neither increased nor reduced with extra time.
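To make the size of the timing manipulation in the abstract above concrete, the sketch below computes the average time available per question for each section length. It assumes the full 30 minutes was allotted to every form regardless of how many questions it contained, which the abstract implies but does not state outright; the function name is hypothetical.

```python
SECTION_MINUTES = 30  # length of the equating section, from the abstract

def seconds_per_question(num_questions, minutes=SECTION_MINUTES):
    """Average seconds available per question in a timed section."""
    return minutes * 60 / num_questions

# Section lengths reported in the abstract.
forms = {"verbal": (35, 27, 23), "math": (25, 20, 17)}

for subject, lengths in forms.items():
    for n in lengths:
        print(f"{subject}: {n} questions -> {seconds_per_question(n):.1f} s each")
```

Under that assumption, the shortest verbal form gives roughly half again as much time per question as the standard 35-question form.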

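Although the SAT (Scholastic Assessment Test) is now widely accepted as a college entrance examination in the United States, its fairness has long been questioned, especially in sensitive areas such as gender and race. Using SAT data from students at one U.S. high school, the study applies ordinary least squares estimation to build a single-equation linear regression model of SAT scores. The regression results show that, holding the other factors in the model constant, the SAT does exhibit gender and racial discrimination, and that the effect of gender on scores is larger than the effect of race. Finally, in light of the 2016 SAT fairness reforms, the article explores the SAT's future direction and the lessons it offers for China's new gaokao reform.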

Randomly selected fifth, seventh, ninth, and eleventh graders (sixty from each grade) were given an ability test. The score and the time taken were used to test the hypotheses of no negative linear relationship and no curvilinear relationship between test score and test time. Although no significant linear relationships were found, significant curvilinear regressions of time on score were found in grades seven and nine. The strength of these significant relationships was relatively low in both grades.

The SAT: A Bronze Mirror for Reforming the College Entrance Examination System
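Since 1999, the focus of reform of China's college entrance examination system has shifted toward the setting of examination subjects and toward the form and content of the examination. The four provinces of Jiangsu, Zhejiang, Jilin, and Shanxi each introduced the new "3+comprehensive" examination model, and Guangdong Province has actively explored the new "3+X" examination model, which will gradually be extended to other provinces, municipalities, and autonomous regions nationwide. This new round of reform of the admissions examination system for regular higher education institutions has generally abandoned the traditional model in which a pure test of knowledge was the sole basis for admitting students, and has instead emphasized the assessment of students' overall quality. This reflects the demands of the trend toward "mass" higher education and shows strong public endorsement of the idea of quality-oriented education. I. The implementation of quality-oriented education requires that we change the traditional education…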

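Although the gaokao is a selection examination, it needs to apply the theory of criterion-referenced testing and analyze examination data and examinees' responses in depth and in detail. Doing so reveals not only an examinee's standing within the group but also the meaning of the test scores, the examinee's level of ability development, and the degree of knowledge mastery, which allows a scientific and reasonable evaluation of each examinee. Admitting institutions can then understand examinees' academic level and subject strengths more concretely and more deeply, and can select examinees who meet their admission requirements and suit their programs. This benefits both the selection and the cultivation of talent.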

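Drawing on research into function S-rough sets and dynamic programming algorithms, the article proposes the concepts of similarity and credibility and gives a scheme and steps for scoring non-standardized test items. The key steps are transfer processing and computing the length of the longest common subsequence. The article mainly describes transfer processing based on function S-rough sets, analyzes the structure of the solution and the method for computing the longest common subsequence length, and finally gives the source code for the transfer function and for the function that computes the longest common subsequence length.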
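The longest-common-subsequence step named in the abstract above is a standard dynamic programming problem. The sketch below shows the textbook row-by-row algorithm for illustration only; it is not the source code from the article, which the abstract does not reproduce, and the function name is hypothetical. How the resulting length feeds into the article's similarity and credibility measures is not stated in the abstract, so it is left out.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Classic dynamic programming over prefixes, keeping only the previous
    row of the table, so memory use is proportional to len(b).
    """
    prev = [0] * (len(b) + 1)  # row for the empty prefix of a
    for x in a:
        curr = [0]  # the empty prefix of b matches nothing
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last element of either a's or b's prefix.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

# Compare a hypothetical examinee answer with a hypothetical reference answer.
print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

The function accepts any pair of sequences, so an answer could be compared token by token (for example, as lists of words) rather than character by character.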

Prior research has shown that there is substantial variability in the degree to which the SAT and high school grade point average (HSGPA) predict 1st-year college performance at different institutions. This article demonstrates the usefulness of multilevel modeling as a tool to uncover institutional characteristics that are associated with this variability. The results revealed that the predictive validity of HSGPA decreased as mean total SAT (i.e., sum of the three SAT sections) score at an institution increased and as the proportion of White freshmen increased. The predictive validity of the three SAT sections (critical reading, mathematics, and writing) varied differently as a function of different institution-level variables. These results suggest that the estimates of validity obtained and aggregated from multiple institutions may not accurately reflect the unique contextual factors that influence the predictive validity of HSGPA and SAT scores at a particular institution.

Are variations in test-preparation practices from school to school undermining the meaningfulness of achievement test results? Is there pressure to raise achievement test scores by the use of educationally unsound practices? What uses of achievement test scores are most common? Do teachers and administrators have reasonably accurate views of test score uses?

Current thinking on validity suggests that educational institutions and individuals should evaluate their uses of test scores in the context of their fundamental goals. Regression coefficients and other traditional criterion-related validity statistics provide relevant information, but often do not, by themselves, address the fundamental reasons for using test scores. Formal decision theory models provide a logically rigorous way to do this, but they are difficult to implement in practice. This article considers a simplification of formal decision theory models, in which one estimates the proportion of examinees for whom positive outcomes result from a use of test scores. For uses involving selection, the proportion of examinees with positive outcomes can be calculated by applying traditional regression coefficients to the marginal distribution of scores in the unselected population. The incremental usefulness of using a particular variable can be judged by comparing its proportion to that associated with no selection and to that associated with using another variable, either alone or jointly. Examples, related to college admission and retention, are given to illustrate these ideas.

