20 similar documents found (search time: 15 ms)
1.
2.
Richard E. Mayer, Andrew T. Stull, Julie Campbell, Kevin Almeroth, Bruce Bimber, Dorothy Chun, Allan Knight 《Educational Psychology Review》2007,19(4):443-454
The authors analyzed self-reported SAT scores and actual SAT scores for five different samples of college students (N = 650). Students overestimated their actual SAT scores by an average of 25 points (SD = 81, d = 0.31), with 10% under-reporting, 51% reporting accurately, and 39% over-reporting, indicating a systematic bias towards over-reporting. The amount of over-reporting was greater for lower-scoring than higher-scoring students, was greater for upper division than lower division students, and was equivalent for men and women. There was a strong correlation between self-reported and actual SAT scores (r = 0.82), indicating high validity of students’ memories of their scores. Results replicate previous findings (Kuncel, Credé, & Thomas, 2005) and are consistent with a motivated distortion hypothesis. Caution is suggested in using self-reported SAT scores in psychological research.
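
The reported effect size can be checked with a short calculation. The sketch below assumes that d was obtained by dividing the mean over-report by the standard deviation of the differences; the abstract does not say how d was computed, so this is only one plausible reading. The function name is hypothetical.

```python
def standardized_mean_difference(mean_diff: float, sd_diff: float) -> float:
    """Mean difference divided by the SD of the differences (assumed definition of d)."""
    if sd_diff <= 0:
        raise ValueError("sd_diff must be positive")
    return mean_diff / sd_diff


# Values taken from the abstract: mean over-report of 25 points, SD = 81.
d = standardized_mean_difference(25, 81)
print(f"d = {d:.2f}")

# The three reporting categories in the abstract should cover all students.
shares = {"under": 10, "accurate": 51, "over": 39}
print(sum(shares.values()))
```

Under that assumption the ratio rounds to the d = 0.31 given in the abstract, and the three percentages sum to 100.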
3.
Despite the similarities that researchers note between the cognitive processes and knowledge involved in reading and writing, there are students who are much stronger readers than writers and those who are much stronger writers than readers. The addition of the writing section to the SAT provides an opportunity to examine whether certain groups of students are more likely to exhibit stronger performance in reading versus writing and the academic consequences of this discrepant performance. Results of this study, based on hierarchical linear models of student performance, showed that even after controlling for relevant student characteristics and prior academic performance, an SAT critical reading–writing discrepancy had a small effect on 1st-year grade point average as well as English course grades in college. Specifically, students who had relatively higher writing scores as compared to their critical reading scores earned higher grades in their 1st year of college as well as in their 1st-year English course(s).
4.
5.
A College Board-sponsored survey of a nationally representative sample of 1995–96 SAT takers yielded a database for more than 4,000 examinees, about 500 of whom had attended formal coaching programs outside their schools. Several alternative analytical methods were used to estimate the effects of coaching on SAT I: Reasoning Test scores. The various analyses produced slightly different estimates. All of the estimates, however, suggested that the effects of coaching are far less than is claimed by major commercial test preparation companies. The revised SAT does not appear to be any more coachable than its predecessor.
6.
7.
This article presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a 4-parameter beta model. The conditional distribution of scores on an alternate form, given the true score, is estimated from a binomial distribution based on the estimated effective test length. Agreement between classifications on alternate forms is estimated by assuming conditional independence, given the true score. Evaluation of the method showed estimates to be within 1 percentage point of the actual values in most cases. Estimates of decision accuracy and decision consistency statistics were only slightly affected by changes in specified minimum and maximum possible scores.
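
Two of the steps described in this abstract can be sketched in a few lines. The effective-test-length formula used here is an assumption: the abstract gives no formula, and this expression was chosen because it returns the actual item count when the reliability is KR-21 and scores run from 0 to n. The 4-parameter beta fit is omitted, so consistency is shown only for a single hypothetical true score, and all function names and numbers are illustrative.

```python
from math import comb


def effective_test_length(mean, variance, reliability, min_score, max_score):
    """Effective number of discrete items implied by a score's reliability.

    Assumed formula (not stated in the abstract).
    """
    if not 0.0 < reliability < 1.0 or variance <= 0.0:
        raise ValueError("need 0 < reliability < 1 and variance > 0")
    numerator = (mean - min_score) * (max_score - mean) - reliability * variance
    return numerator / (variance * (1.0 - reliability))


def pass_probability(true_proportion, n_items, cut_items):
    """P(score >= cut) on one form: binomial given the true proportion score."""
    return sum(
        comb(n_items, k)
        * true_proportion**k
        * (1.0 - true_proportion) ** (n_items - k)
        for k in range(cut_items, n_items + 1)
    )


def conditional_consistency(true_proportion, n_items, cut_items):
    """P(two alternate forms classify alike), assuming conditional independence."""
    p = pass_probability(true_proportion, n_items, cut_items)
    return p * p + (1.0 - p) * (1.0 - p)


# Hypothetical summary statistics for a score scale running from 0 to 40.
n_eff = round(
    effective_test_length(
        mean=28.0, variance=36.0, reliability=0.786325,
        min_score=0.0, max_score=40.0,
    )
)
print(n_eff)
print(conditional_consistency(true_proportion=0.7, n_items=n_eff, cut_items=24))
```

A full estimate would average `conditional_consistency` over the fitted true-score distribution rather than evaluate it at one value.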
8.
Stephan F. Gohmann 《Journal of Educational Measurement》1988,25(2):137-148
Comparing SAT scores among states using regression analysis leads to biased results because states differ in the proportion of students taking the exam. When the proportion of students taking the exam is included in the regression equation, the results can be biased because of misspecification bias. A method intended to correct for selection bias is presented, and empirical results suggest that sample selection bias is present in SAT score regressions. Regression equations and state rankings are compared between the selection-corrected equation and equations for which the selection problem is not addressed. The proposed method is one of many available as possible solutions to the selection problem. Alternative methods may produce different results.
9.
10.
Grades and Test Scores: Accounting for Observed Differences (total citations: 1; self-citations: 0; citations by others: 1)
Warren W. Willingham, Judith M. Pollack, Charles Lewis 《Journal of Educational Measurement》2002,39(1):1-37
Why do grades and test scores often differ? A framework of possible differences is proposed in this article. An approximation of the framework was tested with data on 8,454 high school seniors from the National Education Longitudinal Study. Individual and group differences in grade versus test performance were substantially reduced by focusing the two measures on similar academic subjects, correcting for grading variations and unreliability, and adding teacher ratings and other information about students. Concurrent prediction of high school average was thus increased from 0.62 to 0.90; differential prediction in eight subgroups was reduced to 0.02 letter‐grades. Grading variation was a major source of discrepancy between grades and test scores. Other major sources were teacher ratings and Scholastic Engagement, a promising organizing principle for understanding student achievement. Engagement was defined by three types of observable behavior: employing school skills, demonstrating initiative, and avoiding competing activities. While groups varied in average achievement, group performance was generally similar on grades and tests. Major factors in achievement were similarly constituted and similarly related from group to group. Differences between grades and tests give these measures complementary strengths in high‐stakes assessment. If artifactual differences between the two measures are not corrected, common statistical estimates of validity and fairness are unduly conservative.
11.
Brent Bridgeman, Catherine Trapani, Edward Curley 《Journal of Educational Measurement》2004,41(4):291-310
The impact of allowing more time for each question on the SAT I: Reasoning Test scores was estimated by embedding sections with a reduced number of questions into the standard 30-minute equating section of two national test administrations. Thus, for example, questions were deleted from a verbal section that contained 35 questions to produce forms that contained 27 or 23 questions. Scores on the 23-question section could then be compared to scores on the same 23 questions when they were embedded in a section that contained 27 or 35 questions. Similarly, questions were deleted from a 25-question math section to form sections of 20 and 17 questions. Allowing more time per question had a minimal impact on verbal scores, producing gains of less than 10 points on the 200–800 SAT scale. Gains for the math score were less than 30 points. High-scoring students tended to benefit more than lower-scoring students, with extra time creating no increase in scores for students with SAT scores of 400 or lower. Ethnic/racial and gender differences were neither increased nor reduced with extra time.
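
To make the size of the timing manipulation concrete, the following sketch converts the section lengths in the abstract into time available per question. It assumes each reduced form kept the full 30 minutes of the equating section, which the abstract implies but does not state outright. The helper name is hypothetical.

```python
def seconds_per_question(section_minutes: float, n_questions: int) -> float:
    """Average time available per question, in seconds."""
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    return section_minutes * 60.0 / n_questions


SECTION_MINUTES = 30  # standard equating section length given in the abstract

# Question counts reported in the abstract for each section type.
forms = {"verbal": (35, 27, 23), "math": (25, 20, 17)}

for label, counts in forms.items():
    for n in counts:
        secs = seconds_per_question(SECTION_MINUTES, n)
        print(f"{label:6s} {n:2d} questions: {secs:6.1f} s per question")
```

Under this assumption, cutting the math section from 25 to 20 questions raises the allowance from 72 to 90 seconds per question.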
12.
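The SAT (Scholastic Assessment Test) is currently a widely accepted college entrance examination in the United States, yet its fairness has long been questioned, particularly in sensitive areas such as gender and race. Using SAT data for students at a U.S. high school, the study builds a single-equation linear regression model of SAT scores estimated by ordinary least squares. The regression results show that, holding the other factors in the model constant, the SAT does exhibit gender and racial discrimination, and that the effect of gender on scores is larger than the effect of race. Finally, in light of the 2016 fairness reforms to the SAT, the article explores the future direction of the SAT and the lessons it offers for China's new college entrance examination (gaokao) reform.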
13.
Carmelo Terranova 《Journal of Experimental Education》2013,81(3):81-83
Randomly selected fifth, seventh, ninth, and eleventh graders (sixty from each grade) were given an ability test. The score and the time taken were used to test the hypotheses of no negative linear relationship and no curvilinear relationship between test score and test time. Although no significant linear relationships were found, significant curvilinear regressions of time on score were found in grades seven and nine. The strength of these significant relationships was relatively low in both grades.
14.
The SAT: A Mirror for Reforming the College Entrance Examination System (total citations: 3; self-citations: 2; citations by others: 3)
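Since 1999, the focus of China's college entrance examination reform has shifted to the choice of examination subjects and to the form and content of the examination. Jiangsu, Zhejiang, Jilin, and Shanxi provinces each introduced a new "3+comprehensive" examination model, and Guangdong Province actively explored a "3+X" model, which is to be extended gradually to the country's other provinces, municipalities, and autonomous regions. This new round of reform of the admissions examination system for regular higher education institutions has generally abandoned the traditional model in which a pure test of knowledge was the sole basis for admitting students, and instead emphasizes the assessment of students' overall qualities. This reflects the demands of the trend toward "mass" higher education and shows broad endorsement of the idea of quality-oriented education. I. Implementing quality-oriented education requires that we change traditional education…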
15.
16.
17.
18.
Prior research has shown that there is substantial variability in the degree to which the SAT and high school grade point average (HSGPA) predict 1st-year college performance at different institutions. This article demonstrates the usefulness of multilevel modeling as a tool to uncover institutional characteristics that are associated with this variability. The results revealed that the predictive validity of HSGPA decreased as mean total SAT (i.e., sum of the three SAT sections) score at an institution increased and as the proportion of White freshmen increased. The predictive validity of the three SAT sections (critical reading, mathematics, and writing) varied differently as a function of different institution-level variables. These results suggest that the estimates of validity obtained and aggregated from multiple institutions may not accurately reflect the unique contextual factors that influence the predictive validity of HSGPA and SAT scores at a particular institution.
19.
Are variations in test-preparation practices from school to school undermining the meaningfulness of achievement test results? Is there pressure to raise achievement test scores by the use of educationally unsound practices? What uses of achievement test scores are most common? Do teachers and administrators have reasonably accurate views of test score uses?
20.
Richard Sawyer 《Applied Measurement in Education》2013,26(3):255-271
Current thinking on validity suggests that educational institutions and individuals should evaluate their uses of test scores in the context of their fundamental goals. Regression coefficients and other traditional criterion-related validity statistics provide relevant information, but often do not, by themselves, address the fundamental reasons for using test scores. Formal decision theory models provide a logically rigorous way to do this, but they are difficult to implement in practice. This article considers a simplification of formal decision theory models, in which one estimates the proportion of examinees for whom positive outcomes result from a use of test scores. For uses involving selection, the proportion of examinees with positive outcomes can be calculated by applying traditional regression coefficients to the marginal distribution of scores in the unselected population. The incremental usefulness of using a particular variable can be judged by comparing its proportion to that associated with no selection and to that associated with using another variable, either alone or jointly. Examples, related to college admission and retention, are given to illustrate these ideas.
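
The selection calculation described in this abstract might look roughly like the sketch below. It rests on several assumptions that the abstract does not state: a linear regression of the outcome on the test score with normally distributed residuals, "positive outcome" defined as the outcome reaching a fixed threshold, and selection by a simple cut score. All names, coefficients, and score counts are hypothetical and are not taken from the article.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def success_probability(score, intercept, slope, residual_sd, threshold):
    """P(outcome >= threshold | score) under a normal-residual linear regression."""
    predicted = intercept + slope * score
    return normal_cdf((predicted - threshold) / residual_sd)


def selected_success_rate(score_counts, cut_score, intercept, slope,
                          residual_sd, threshold):
    """Expected share of selected examinees (score >= cut) with a positive outcome."""
    selected = {s: n for s, n in score_counts.items() if s >= cut_score}
    total = sum(selected.values())
    if total == 0:
        return 0.0
    expected_successes = sum(
        n * success_probability(s, intercept, slope, residual_sd, threshold)
        for s, n in selected.items()
    )
    return expected_successes / total


# Hypothetical marginal score distribution in the unselected population.
scores = {16: 100, 20: 300, 24: 400, 28: 200}
model = dict(intercept=0.5, slope=0.08, residual_sd=0.6, threshold=2.0)

no_selection = selected_success_rate(scores, cut_score=min(scores), **model)
with_selection = selected_success_rate(scores, cut_score=24, **model)
print(f"no selection: {no_selection:.3f}, cut at 24: {with_selection:.3f}")
```

Comparing the rate with and without the cut score is one way to express the incremental usefulness of the selection variable that the abstract mentions.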