1.

Effects of repeated testing on short- and long-term memory performance across different test formats

Tova Stenlund Anna Sundström Bert Jonsson 《Educational Psychology》2016,36(10):1710-1727

This study examined whether practice testing with short-answer (SA) items benefits learning over time compared to practice testing with multiple-choice (MC) items, and rereading the material. More specifically, the aim was to test the hypotheses of retrieval effort and transfer appropriate processing by comparing retention tests with respect to practice testing format. To adequately compare SA and MC items, the MC items were corrected for random guessing. With a within-group design, 54 students (mean age = 16 years) first read a short text, and took four practice tests containing all three formats (SA, MC and statements to read) with feedback provided after each part. The results showed that both MC and SA formats improved short- and long-term memory compared to rereading. More importantly, practice testing with SA items is more beneficial for learning and long-term retention, providing support for the retrieval effort hypothesis. The use of corrections for guessing and educational implications are discussed.

2.

Qualitative and quantitative differences in learning associated with multiple-choice testing

K. Fisher S. Williams J. Roth 《Journal of Research in Science Teaching》1981,18(5):449-464

This study assesses some effects of the Computer-Assisted Self-Evaluation (CASE) system of frequent multiple-choice testing with immediate computer feedback; it is part of a larger project aiming to combine the principal strengths of individualized instruction with lecture teaching (Fisher, 1979). Learning and retention are examined in two equivalent groups of undergraduates enrolled in an upper division science course. One group (N = 34) received 24 CASE quizzes with immediate feedback and the other (N = 30) received two CASE-generated midterms with delayed feedback. Quiz students significantly outperformed Midterm students on the posttest; the Quiz section scored nine percentage points higher on rote items and fourteen points higher on meaningful items. Quiz students also had more positive attitudes toward and were more involved in the course. On a retention test given two years later, the Quiz Group scored eight percentage points higher than the Midterm Group on meaningful items. This study suggests that, contrary to popular opinion, multiple-choice questions promote meaningful learning at least as well as, and possibly better than, rote learning. The CASE system appears to be about as effective as other forms of frequent testing and immediate feedback in enhancing learning, and it provides a simple, cost-effective means of individualized testing in large lecture classes.

3.

Estimation of the All Tests Pass Rate When No Examinee Took All of the Tests

《Applied Measurement in Education》2013,26(4):393-406

Two models are presented in this article for estimating the proportion of students who would pass all of three or more content area tests given that none have actually been tested in more than two of the content areas. The first model allows one to estimate the proportion of students who would pass all of three or more content area tests from the test results of a study in which no student took more than two of the tests; the second model (which requires an outside estimate of the correlations between the different content area tests) allows one to estimate the proportion of students who would pass all of three or more content area tests from the test results of a study (or field test results) in which students took only one content area test. The models were tested on the Texas End-of-Course test battery (which consists of four content area tests) results of students who took all four content area tests prior to or in the spring of 2001, with at least one of the end-of-course content area tests taken in the spring of 2001. The model test results may have particular application to state assessment programs that must perform standard setting on high-stakes exams before the first live administration of the exams.

4.

Students reflecting on test performance and feedback: an on-line approach

Georgina Fyfe Sue Fyfe Jan Meyer Mel Ziman Kathy Sanders Julie Hill 《Assessment & Evaluation in Higher Education》2014,39(2):179-194

Undergraduate students accessing on-line tests in Human Biology in three Western Australian universities were asked to complete an on-line post-test reflective survey about their perceptions of their test performance in light of automated feedback. The survey allowed pre-determined choices and comment text boxes relating to students’ perceptions of their performance, self-identified areas of difficulty and suggested strategies for improving test performance. One-third of students undertaking on-line tests responded to the optional survey, and 60% of respondents thought reflecting on feedback was useful. Students reflecting on formative rather than summative assessment reported a more strategic approach to testing, often using it to assess their knowledge and prepare for future assessment. Their reflections were more internally focused on motivation and preparation compared with those assessed summatively. Respondents were more likely to be female, older, more experienced learners who had scored well in the test. Younger respondents expected higher scores than they achieved and were less likely to reflect, but, when they did, were more likely to select pre-determined reasons for their performance and less likely to suggest strategies for improvement. These results support formal training and scaffolded integration of reflection into on-line assessment feedback, especially for less experienced learners.

5.

Exploring the effects of group testing on graduate students’ motivation and achievement

Dawson R. Hancock 《Assessment & Evaluation in Higher Education》2007,32(2):215-227

This study explored the impact of individual versus two‐person group testing on graduate students’ achievement and motivation to learn while enrolled in a 16‐lesson educational research methods course. Students in two sections of the course were exposed to the same content and instructional methods, with one exception: students in one section took three professor‐created, criterion‐referenced examinations (during lessons 6, 11 and 16) individually, whereas students in the other section took the same examinations with a partner with whom they could examine and discuss the test items in order to derive a common answer. At the end of the course, the motivation to learn of all students was assessed. Results revealed that students tested with a partner achieved significantly higher scores on two of the examinations and demonstrated significantly higher levels of motivation to learn than did students taking the examinations alone. Qualitative analysis of the students’ written expressions concerning the course and of their comments from group and individual interviews revealed possible explanations for these outcomes.

6.

Predicting student achievement for low stakes tests with effort and task value (total citations: 1; self-citations: 0; citations by others: 1)

James S. Cole David A. Bergin Tiffany A. Whittaker 《Contemporary educational psychology》2008,33(4):609-624

We investigated motivation for taking low stakes tests. Based on expectancy-value theory, we expected that the effect of student perceptions of three task values (interest, usefulness, and importance) on low stakes test performance would be mediated by the student’s reported effort. We hypothesized that all three task value components would play a significant role in predicting test-taking effort, and that effort would significantly predict test performance. Participants were 1005 undergraduate students enrolled at four midsize public universities. After students took all four subtests of CBASE, a standardized general education exam, they immediately filled out a motivation survey. Path analyses showed that the task value variables usefulness and importance significantly predicted test-taking effort and performance for all four tests. These results provide evidence that students who report trying hard on low stakes tests score higher than those who do not. The results indicate that if students do not perceive importance or usefulness of an exam, their effort suffers and so does their test score. While the data are correlational, they suggest that it might be useful for test administrators and school staff to communicate to students the importance and usefulness of the test that they are being asked to complete.

7.

Classroom Testing Procedures, Test Anxiety, and Achievement

Ronald N. Marso 《Journal of Experimental Education》2013,81(3):54-58

Four groups (N = 116) were maintained in a 4-factor analysis of covariance design to determine if more frequent, graded unit examinations followed by test feedback facilitate achievement and allow students with high-measured test anxiety to perform better on final course examinations. The testing procedures studied consisted of the administration of 168 examination items as either three or six unit exams, grading or not grading the unit exams, and providing or not providing class feedback and discussion following the examinations. Analysis of performance on two posttest measures indicated that the subjects achieved more from frequent, graded unit tests followed by feedback; however, variations of these conditions did not appear to influence the performance of the students with high-measured test anxiety.

8.

A Comparison of the Effects of Practice Tests and Traditional Review on Performance and Calibration

Linda Bol Douglas J. Hacker 《Journal of Experimental Education》2013,81(2):133-151

The impact of practice tests on students' calibration and exam performance for multiple-choice and essay items was investigated. The participants were 59 graduate students enrolled in 1 of 2 sections (practice tests and no practice tests) of an introductory research methods in education course. Practice tests were associated with significantly lower scores on the midterm multiple-choice items and less accurate predictions and postdictions on those items. High-achieving students were more accurate in their calibrations than low-achieving students. Among low-achieving students, prediction and postdiction accuracy was significantly higher for essay than for multiple-choice items. In open-ended responses, a large percentage of students who took the practice tests indicated that they were a beneficial review strategy.

9.

Assessing intellectual potential in Tanzanian children in poor areas of Dar es Salaam

Steve Humble Ian Schagen 《Assessment in Education: Principles, Policy & Practice》2018,25(4):399-414

The research set out in this paper attempts to identify whether one of three conventional IQ tests is more capable of identifying intellectual potential amongst poor children in Dar es Salaam. To this end 1857 children from 17 government schools in poor districts of Dar es Salaam were asked to complete a questionnaire and undertake a range of tests. The study included teacher, peer and self-nomination. It has been noted that static testing may not fully elicit the abilities of African children. It has been suggested that dynamic testing might provide a more fair and equitable means of assessment. Therefore 101 students took part in a control and intervention group in order to investigate. The findings show a significant correlation between IQ test scores and other test outcomes. Those with larger families and older children perform less well on IQ tests. Peer ability and self-confidence positively influence test scores.

10.

Learning to label: socialisation, gender, and the hidden curriculum of high‐stakes testing

Jennifer Booher‐Jennings 《British Journal of Sociology of Education》2008,29(2):149-160

Although high‐stakes tests play an increasing role in students’ schooling experiences, scholars have not examined these tests as sites for socialisation. Drawing on qualitative data collected at an American urban primary school, this study explores what educators teach students about motivation and effort through high‐stakes testing, how students interpret and internalise these messages, and how student hierarchies develop as a result. I found that teachers located boys’ failure in their poor behavior and attitudes, while arguing that girls simply needed more self‐esteem to pass the test. Most boys accepted their teachers’ diagnosis of the problem. However, the boys who felt that they were already ‘doing their best’ and ‘working hard’ began to doubt that educational success is a function of merit and effort. I conclude that students learn about much more than the three Rs through their experiences with high‐stakes testing, and argue that future research should attend to the social dimensions of these experiences.

11.

Tests of Equivalence for One-Way Independent Groups Designs

Robert A. Cribbie Chantal A. Arpin-Cribbie Jamie A. Gruman 《Journal of Experimental Education》2013,81(1):1-13

Researchers in education are often interested in determining whether independent groups are equivalent on a specific outcome. Equivalence tests for 2 independent populations have been widely discussed, whereas testing for equivalence with more than 2 independent groups has received little attention. The authors discuss alternatives for testing the equivalence of more than 2 independent populations, and they use a Monte Carlo study to demonstrate and compare the performance of these alternatives under several conditions. The results indicate that a 1-way test (e.g., Wellek's F test) is recommended for assessing the equivalence of more than 2 independent groups because approaches based on conducting pairwise tests of equivalence are overly conservative.

12.

The Impact of Item Dependency on the Efficiency of Testing and Reliability of Student Scores From a Computer Adaptive Assessment of Reading Comprehension

Yaacov Petscher Barbara R. Foorman Adrea J. Truckenmiller 《Journal of research on educational effectiveness》2017,10(2):408-423

The objective of the present study was to evaluate the extent to which students who took a computer adaptive test of reading comprehension accounting for testlet effects were administered fewer passages and had a more precise estimate of their reading comprehension ability compared to students in the control condition. A randomized controlled trial was used whereby 529 students in Grades 4–8 and 10 were randomly assigned to one of two conditions, both of whom took a computerized adaptive assessment of reading comprehension. Participants in the experimental condition had ability scores estimated as a function of an item response model, which accounted for item-dependence effects in the reading assessment, whereas control students took a version where item-dependence effects were not controlled. Results indicated that examinees in the experimental condition took fewer passages (average Hedges' g = 0.97) and had more reliable estimates of their reading comprehension ability (average Hedges' g = 0.60). Findings are discussed in the context of potential time savings in assessment practices without sacrificing reliability.

13.

Silent versus oral reading comprehension and efficiency

R. Steve McCallum Shannon Sharp Sherry Mee Bell Thomas George 《Psychology in the schools》2004,41(2):241-246

Seventy‐four students read passages from an individually administered test of reading comprehension (a subtest from the Test of Dyslexia, a test of reading and related abilities currently in development; McCallum & Bell, 2001), and then answered literal and inferential questions. Students were randomly assigned to one of two conditions; 39 students read the passages silently and 35 read orally, with time recorded for each passage read. Comprehension and time were dependent measures for a Multivariate Analysis of Covariance (MANCOVA) and two follow‐up Analyses of Covariance (ANCOVA). After controlling for reading ability, results from the MANCOVA showed a significant combined effect (p < .05); however, a comparison of mean reading comprehension scores showed no significant difference between silent readers and oral readers (p > .05). On the other hand, with reading ability controlled, silent readers took significantly less time to complete passages compared to those who read orally (p < .02). In fact, students took 30% longer to read orally than silently, on average. When test directions do not specify either oral or silent reading and error analysis is not a goal, testing will be more efficient via silent responding with no loss of comprehension. © 2004 Wiley Periodicals, Inc. Psychol Schs 41: 241–246, 2004.

14.

Information Feedback, Need Achievement and Retention

《The Journal of educational research》2012,105(7):256-261

A test of achievement motivation was administered to 260 sixth graders. One month later, students participated in a science reading lesson followed by a multiple-choice test based upon that lesson. Feedback regarding performance was provided according to a standardized procedure either immediately after the test, or with one, two or three day delays. Then a retention test was administered to each group three days after feedback. Results demonstrated that students who received feedback with a delay of one day manifested greater retention than students who received immediate feedback. There were no significant differences among groups who were exposed to delays of one, two or three days. A positive relation between achievement motivation and retention was demonstrated. There was no interaction between achievement motivation and feedback schedules.

15.

Collaborative Testing Improves Performance but Not Content Retention in a Large-Enrollment Introductory Biology Class (total citations: 1; self-citations: 0; citations by others: 1)

Hayley Leight Cheston Saunders Robin Calkins Michelle Withers 《CBE life sciences education》2012,11(4):392-401

Collaborative testing has been shown to improve performance but not always content retention. In this study, we investigated whether collaborative testing could improve both performance and content retention in a large, introductory biology course. Students were semirandomly divided into two groups based on their performances on exam 1. Each group contained equal numbers of students scoring in each grade category (“A”–“F”) on exam 1. All students completed each of the four exams of the semester as individuals. For exam 2, one group took the exam a second time in small groups immediately following the individually administered test. The other group followed this same format for exam 3. Individual and group exam scores were compared to determine differences in performance. All but exam 1 contained a subset of cumulative questions from the previous exam. Performances on the cumulative questions for exams 3 and 4 were compared for the two groups to determine whether there were significant differences in content retention. Even though group test scores were significantly higher than individual test scores, students who participated in collaborative testing performed no differently on cumulative questions than students who took the previous exam as individuals.

16.

A COMPARISON OF THE RELIABILITY AND VALIDITY OF TWO METHODS FOR ASSESSING PARTIAL KNOWLEDGE ON A MULTIPLE-CHOICE TEST

RONALD K. HAMBLETON DENNIS M. ROBERTS ROSS E. TRAUB 《Journal of Educational Measurement》1970,7(2):75-82

Differential weighting of response alternatives and confidence testing have been proposed as ways to assess partial knowledge on multiple-choice tests. 211 students in an educational measurement course took their midterm examination under one of three procedures. Results from those students administered the test under conventional directions provided a baseline for comparing, in terms of reliability and validity, the results from students who took the test under the differential weighting of response alternatives or the confidence testing instructions. Reliability was estimated by the split-half technique. Validity was estimated by correlating midterm test scores with scores on a final examination. This investigation provides some support for the contention that validity can be improved using more sophisticated testing techniques. Suggestions for the conduct of more definitive studies were offered.

17.

Fostering Creativity in Engineering Undergraduates

David H. Cropley Arthur J. Cropley 《High Ability Studies》2000,11(2):207-219

In the present study, an attempt was made to help engineering undergraduates come up with innovative ideas by teaching creativity, not simply in a paper and pencil test situation but also in a practical exercise. A total of 64 male engineering undergraduates received three lectures on creativity at the beginning of a course on engineering innovation. Some of them (N = 37) also completed a "creativity" test, the TCT-DP (Urban & Jellen, 1996), and were individually counselled on the basis of test scores. A separate control group (N = 21) took the test together with these students, but otherwise did not participate in any way in the study. Upon retesting 6 weeks later the counselled students were more innovative, whereas the control group were simply less inhibited. In addition, machines constructed by the counselled students were more elegant and creative than those of the 27 students who merely attended the lectures. Thus, the training was associated with changes in behaviour not only on the test, but in a practical activity also.

18.

Psychometric Characteristics of Computer-Adaptive and Self-Adaptive Vocabulary Tests: The Role of Answer Feedback and Test Anxiety

Walter P. Vispoel 《Journal of Educational Measurement》1998,35(2):155-167

This study focused on the effects of administration mode (computer-adaptive test [CAT] versus self-adaptive test [SAT]), item-by-item answer feedback (present versus absent), and test anxiety on results obtained from computerized vocabulary tests. Examinees were assigned at random to four testing conditions (CAT with feedback, CAT without feedback, SAT with feedback, SAT without feedback). Examinees completed the Test Anxiety Inventory (Spielberger, 1980) before taking their assigned computerized tests. Results showed that the CATs were more reliable and took less time to complete than the SATs. Administration time for both the CATs and SATs was shorter when feedback was provided than when it was not, and this difference was most pronounced for examinees at medium to high levels of test anxiety. These results replicate prior findings regarding the precision and administrative efficiency of CATs and SATs but point to new possible benefits of including answer feedback on such tests.

19.

Effects of different types of true–false questions on memory awareness and long-term retention

Lydia Schaap Peter Verkoeijen Henk Schmidt 《Assessment & Evaluation in Higher Education》2014,39(5):625-640

This study investigated the effects of two different true–false questions on memory awareness and long-term retention of knowledge. Participants took four subsequent knowledge tests on curriculum learning material that they studied at different retention intervals prior to the start of this study (i.e. prior to the first test). At the first and fourth (pre- and post-) tests, participants indicated which form of memory awareness (i.e. remember, know, familiar and/or guess) accompanied their answer. On the two intermediate tests, testing format was manipulated: true/false or true/false justification, that is a true/false statement with the additional instruction to explain why the statement is true or false. The results resembled earlier findings in that different forms of memory awareness could be distinguished. The study did not indicate (additional) knowledge schematisation as a result of testing or testing format. However, independent of test format, the proportion of correct answers on the post-test was higher than on the pre-test. This could indicate that the beneficial effects of testing can occur even when the learning episode was at a long retention interval prior to the first test.

20.

Retention of prose following testing with different types of tests

Philippe C. Duchastel 《Contemporary educational psychology》1981,6(3):217-226

Taking a test on a passage one has just studied is known to enhance later retention of the passage contents. This study examined the effects of three types of initial test on later retention: a short-answer test, a multiple-choice test, and a full free-recall test. Questions on the first two of these tests covered only half of the passage contents. Later retention was compared for both initially tested content and un-tested content with that of a control group not initially tested on the passage at all. The subjects were 57 secondary school students who studied a brief history text before taking one of the initial tests. All were given retention tests 2 weeks later. The classical testing effect (enhanced retention due to initial testing) was shown to be influenced by the type of initial test used. Thus, a testing effect was evident in the case of the initial short-answer test, but not in the case of either of the other two tests. A depth-of-processing view is advanced in interpreting this finding. The testing effect was found not to generalize to untested content and in one condition (the initial multiple-choice test), retention of untested content was depressed.