1.

Team-Based Testing Improves Individual Learning

Jane S. Vogler Daniel H. Robinson 《Journal of Experimental Education》2016,84(4):787-803

In two experiments, 90 undergraduates took six tests as part of an educational psychology course. Using a crossover design, students took three tests individually without feedback and then took the same test again, following the process of team-based testing (TBT), in teams in which the members reached consensus for each question and answered until they were correct. Students took the other three tests individually with feedback. All students were individually tested over a portion of this content two weeks later and again after two months. Independent samples t tests revealed that TBT students scored higher when retested two months later than those who took the test individually. Finally, three-fourths of the students reported that they enjoyed TBT more than individual testing. Although TBT requires more class time to administer, it appears to be beneficial for long-term student learning.

2.

Improving Multiple-Choice Test Performance for Examinees with Different Levels of Test Anxiety

Linda Crocker Alicia Schmitt 《Journal of Experimental Education》2013,81(4):201-205

The effectiveness of a strategy for improving performance on multiple-choice items for examinees was assessed. An aptitude-treatment interaction model was used to test the possibility of different treatment effects for examinees with different levels of test anxiety. Undergraduate measurement students responded to the Mandler-Sarason Test Anxiety Scale and to an objective test covering course content. For low-anxious examinees, generation of an answer before selecting a multiple-choice response led to higher test performance; for highly test anxious examinees, there was a slightly negative effect on performance.

3.

The role of feedback and differences between good and poor decoders in a repeated word reading paradigm in first grade

Karly van Gorp Eliane Segers Ludo Verhoeven 《Annals of dyslexia》2017,67(1):1-25

The direct, retention, and transfer effects of repeated word and pseudoword reading were studied in a pretest, training, posttest, retention design. First graders (48 good readers, 47 poor readers) read 25 CVC words and 25 CVC pseudowords in ten repeated word reading sessions, preceded and followed by a transfer task with a different set of items. Two weeks after training, trained items were assessed again in a retention test. Participants either received phonics feedback, in which each word was spelled out and repeated; word feedback, in which each word was repeated; or no feedback. During the training, both good and poor readers improved in accuracy and speed. The increase in speed was stronger for poor readers than for good readers. The good readers demonstrated a stronger increase for pseudowords than for words. This increase in speed was most prominent in the first four sessions. Two weeks after training, the levels of accuracy and speed were retained. Furthermore, transfer effects on speed were found for pseudowords in both groups of readers. Good readers performed most accurately during the training when they received no feedback while poor readers performed most accurately during the training with the help of phonics feedback. However, feedback did not differentiate for reading speed or for effects after the training. The effects of repeated word reading were found to be stronger for poor readers than for good readers. Moreover, these effects were found to be stronger for pseudowords than for words. This indicates that repeated word reading can be seen as an important trigger for the improvement of decoding skills.

4.

SCRAMBLING CONTENT IN ACHIEVEMENT TESTING: AN APPLICATION OF MULTIPLE MATRIX SAMPLING IN EXPERIMENTAL DESIGN

KEN SIROTNIK ROGER WELLINGTON 《Journal of Educational Measurement》1974,11(3):179-188

This study was designed to research the question of scrambling item content in the construction of achievement tests, so that very general implications could be drawn for both examinee and item populations. To achieve this generality, the methodology of multiple matrix sampling was combined with a simple two group experimental design: a random group of 8th graders responded to mathematics, science, social studies, reading, and language arts achievement items organized in a scrambled (random) test format, while another random group responded to the same items organized in a fixed (segregated by subject matter) test format. The results indicated that scrambling cognitive test items has minimal or no effect on mean examinee test performance or on any of the other parameters included in the analysis.

5.

Multiple-Choice Models: The Distractors Are Also Part of the Item

David Thissen Lynne Steinberg Anne R. Fitzpatrick 《Journal of Educational Measurement》1989,26(2):161-176

This paper describes an item response model for multiple-choice items and illustrates its application in item analysis. The model provides parametric and graphical summaries of the performance of each alternative associated with a multiple-choice item; the summaries describe each alternative's relationship to the proficiency being measured. The interpretation of the parameters of the multiple-choice model and the use of the model in item analysis are illustrated using data obtained from a pilot test of mathematics achievement items. The use of such item analysis for the detection of flawed items, for item design and development, and for test construction is discussed.

6.

Use of Restricted Item Response Models for Examining Item Difficulty Ordering and Slope Uniformity

Suzanne Lane 《Journal of Educational Measurement》1991,28(4):295-309

This article demonstrates the utility of restricted item response models for examining item difficulty ordering and slope uniformity for an item set that reflects varying cognitive processes. Twelve sets of paired algebra word problems were developed to systematically reflect various types of cognitive processes required for successful performance. This resulted in a total of 24 items. They reflected distance-rate-time (DRT), interest, and area problems. Hypotheses concerning difficulty ordering and slope uniformity for the items were tested by constraining item difficulty and discrimination parameters in hierarchical item response models. The first set of model comparisons tested the equality of the discrimination and difficulty parameters for each set of paired items. The second set of model comparisons examined slope uniformity within the complex DRT problems. The third set of model comparisons examined whether the familiarity of the story context affected item difficulty for two types of complex DRT problems. The last set of model comparisons tested the hypothesized difficulty ordering of the items.

7.

An Empirical Examination of the IRT Information of Polytomously Scored Reading Items Under the Generalized Partial Credit Model

John R. Donoghue 《Journal of Educational Measurement》1994,31(4):295-311

Using Muraki's (1992) generalized partial credit IRT model, polytomous items (responses to which can be scored as ordered categories) from the 1991 field test of the NAEP Reading Assessment were calibrated simultaneously with multiple-choice and short open-ended items. Expected information of each type of item was computed. On average, four-category polytomous items yielded 2.1 to 3.1 times as much IRT information as dichotomous items. These results provide limited support for the ad hoc rule of weighting k-category polytomous items the same as k - 1 dichotomous items for computing total scores. Polytomous items provided the most information about examinees of moderately high proficiency; the information function peaked at 1.0 to 1.5, and the population distribution mean was 0. When scored dichotomously, information in polytomous items sharply decreased, but they still provided more expected information than did the other response formats. For reference, a derivation of the information function for the generalized partial credit model is included.
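As a rough illustration of the quantity this abstract refers to, the sketch below computes category probabilities and item information for a single generalized partial credit item. It assumes the commonly cited form in which information equals the squared slope times the variance of the category score, and it omits the scaling constant D that some presentations include. The function names and all parameter values are hypothetical and are not taken from the NAEP data.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities for one generalized partial credit item.

    steps holds hypothetical step parameters b_1..b_m; categories are 0..m.
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum of 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)  # subtract the maximum for numerical stability
    expz = [math.exp(v - top) for v in z]
    total = sum(expz)
    return [e / total for e in expz]

def gpcm_information(theta, a, steps):
    """Item information: a^2 times the variance of the category score."""
    p = gpcm_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    second = sum(k * k * pk for k, pk in enumerate(p))
    return a * a * (second - mean * mean)

# Hypothetical items: one dichotomous item and one four-category item.
dichotomous = gpcm_information(0.0, 1.0, [0.0])
four_category = gpcm_information(0.0, 1.0, [-1.0, 0.0, 1.0])
print(dichotomous, four_category)
```

With a single step parameter the model reduces to a two-parameter logistic item, which gives a quick sanity check on the implementation.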

8.

Reading retardation or linguistic deficit? II: test-answering strategies in hearing and hearing-impaired school children

D. J. Wood A. J. Griffiths A. Webster 《Journal of Research in Reading》1981,4(2):148-156

The reading test performances of 60 hearing and 60 hearing-impaired children of similar measured reading ages on the Southgate reading test were analysed. As in an earlier study using the Brimer Wide-span test it was shown that the performances of the two groups were quite different. Deaf children tackled significantly more test items than the hearing and made significantly more errors in achieving similar reading scores. A detailed examination of both correct and incorrect answers showed that the deaf children were not simply providing answers to questions at random. Even where they produced incorrect responses they tended, as a group, to select the same answer. Unlike the hearing group, who did not converge on the same incorrect solution to difficult test items, the deaf were systematic in their choices, indicating that they were using a consistent strategy. A post hoc examination of individual test items indicated that the deaf children were selecting answers on the basis of word associations in each test item. On some items these produced a correct response, on others the same (incorrect) response. The implications of these findings are discussed to argue that reading tests based on hearing norms are of little value in the assessment of reading abilities and reading problems in hearing-impaired children.

9.

Comparing the Fit of Item Response Theory and Factor Analysis Models

Alberto Maydeu-Olivares Li Cai Adolfo Hernández 《Structural equation modeling》2013,20(3):333-356

Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.
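For readers unfamiliar with the RMSEA mentioned in this abstract, here is a minimal sketch of its usual point estimate. It assumes the common form based on a chi-square statistic, its degrees of freedom, and the sample size; some software divides by N rather than N - 1. The function name and the input values are hypothetical, and the sketch is not the residual-covariance statistic the article itself proposes.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a chi-square fit statistic.

    Uses sqrt(max(chi2 - df, 0) / (df * (n - 1))); some software
    divides by n instead of n - 1.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Hypothetical fit results for one model on 501 respondents.
example = rmsea(150.0, 100, 501)
print(round(example, 4))
```

Because the excess misfit is divided by the degrees of freedom, the index expresses misfit per degree of freedom rather than total misfit.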

10.

Representations of single and compound stimuli in negative and positive patterning

Justin A. Harris Saba Gharaei Clinton A. Moore 《Learning & behavior》2009,37(3):230-245

In four experiments, rats were trained on different patterning discriminations before being tested with compounds composed of novel combinations of the trained stimuli. In Experiment 1, rats were trained on a negative-patterning schedule (A+ B+ AB-) intermixed with reinforced presentations of a second compound (CD+). On a subsequent test, the rats responded more to two novel compounds, AC and BD, than to A and B, but less than to CD. In Experiment 2, rats were trained on two concurrent negative-patterning discriminations (A+ B+ AB-, C+ D+ CD-). On test, they responded more to AC and BD than to AB and CD, but less than to the single stimuli. In Experiment 3, rats were trained on two concurrent positive-patterning discriminations (A- B- AB+, C- D- CD+). On test, their response rates to AC and BD were not different from the response rates to the trained compounds (AB and CD). Finally, in Experiment 4, rats were trained on a positive- and negative-patterning discrimination concurrently. Once again, on test, response rates to AC and BD were not different from responding on reinforced trials of the trained discriminations (A+, B+, and CD+). We discuss the implications of these findings for elemental and configural models of stimulus representation.

11.

Using item response theory to describe the Nonverbal Literacy Assessment (NVLA)

Danielle Fleming Mark Wilson Lynn Ahlgrim‐Delzell 《Psychology in the schools》2018,55(4):341-349

The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218‐item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to investigate the NVLA using Rasch models. First, we reduced the number of items using a unidimensional model, which resulted in high levels of test reliability despite decreasing the number of questions, providing the same information about student abilities in less time. Second, the multidimensional analysis indicated that it is possible to view the NVLA as a test with four dimensions, resulting in more detailed information about student abilities. Finally, we combined these approaches to obtain both specificity and brevity, with a four‐dimensional model using 133 items from the original NVLA.

12.

A Comparison of Quantitative Questions in Open-Ended and Multiple-Choice Formats

Brent Bridgeman 《Journal of Educational Measurement》1992,29(3):253-271

Open-ended counterparts to a set of items from the quantitative section of the Graduate Record Examination (GRE-Q) were developed. Examinees responded to these items by gridding a numerical answer on a machine-readable answer sheet or by typing on a computer. The test section with the special answer sheets was administered at the end of a regular GRE administration. Test forms were spiraled so that random groups received either the grid-in questions or the same questions in a multiple-choice format. In a separate data collection effort, 364 paid volunteers who had recently taken the GRE used a computer keyboard to enter answers to the same set of questions. Despite substantial format differences noted for individual items, total scores for the multiple-choice and open-ended tests demonstrated remarkably similar correlational patterns. There were no significant interactions of test format with either gender or ethnicity.

13.

An Experiment with the Controlled Reader

《The Journal of educational research》2012,105(7):265-269

This study investigated items on the Peabody Picture Vocabulary Test (PPVT) to ascertain whether verbal responses to missed items indicated that the concept was familiar at the same level of abstraction as the word in the PPVT. One hundred 8-year-old children (25 black boys, 25 black girls, 25 white boys, and 25 white girls) were administered Form A of the PPVT. Eighty-eight children responded verbally to the pictures of the stimulus words missed. Data were analyzed by means of a two-way analysis of variance. A chi-square test was used to determine the significance of differences between items for each group. Judges analyzed verbal responses to determine whether the responses elicited (1) were at the same level of abstraction as the stimulus word, (2) were considered synonymous with the stimulus word, and (3) indicated the student's understanding of the concept signified by the word. A total of 23 words were identified as being missed disproportionately by one group more than the other. Verbal responses indicated that the concept was familiar for 16 items and unfamiliar for three items. Of the remaining four items, there was indication of differences among the groups.

14.

COLLEGE STUDENTS’ REACTIONS TOWARDS KEY FACETS OF CLASSROOM TESTING

Moshe Zeidner 《Assessment & Evaluation in Higher Education》1990,15(2):151-169

The major aim of the present study is to assess college students’ attitudes, perceptions, emotional reactions and affective dispositions with respect to various critical dimensions of course achievement testing and assessment, including: “papers” vs. “exams”, “essay” vs. “multiple choice” type formats, “open book” vs. “closed book” exams, “free choice” among items vs. “no free choice” among items, and “oral” vs. “written” modes of test administration. A further aim is to delineate the construction, properties, and potential classroom uses and applications of a selected sample of examinee feedback inventories designed to gauge students’ test attitudes and dispositions. The use of each examinee feedback inventory is demonstrated and exemplified in the context of an empirical study. This paper discusses the assumptions underlying the use of feedback systems in college achievement evaluation; their importance for assessing the face validity of classroom tests; some possible future applications of feedback inventories for research and applied purposes in college; and some guidelines for future research. A mapping sentence specifying the universe of content of test attitude and examinee feedback research is suggested as a heuristic device for guiding future research.

15.

A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

Rui Guo Yi Zheng Hua‐Hua Chang 《Journal of Educational Measurement》2015,52(3):280-300

An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the existing methods were designed to detect drifts in individual items, which may not be adequate for test characteristic curve–based linking or equating. One example is the item response theory–based true score equating, whose goal is to generate a conversion table to relate number‐correct scores on two forms based on their test characteristic curves. This article introduces a stepwise test characteristic curve method to detect item parameter drift iteratively based on test characteristic curves without needing to set any predetermined critical values. Comparisons are made between the proposed method and two existing methods under the three‐parameter logistic item response model through simulation and real data analysis. Results show that the proposed method produces a small difference in test characteristic curves between administrations, an accurate conversion table, and a good classification of drifted and nondrifted items and at the same time keeps a large amount of linking items.
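The test characteristic curve that this abstract builds on can be sketched as follows. The example assumes the standard three-parameter logistic form without a scaling constant and simply compares two curves over a grid of proficiency values. It is not the article's stepwise detection procedure, and the function names and item parameters are hypothetical.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

def max_tcc_gap(items_old, items_new, grid):
    """Largest absolute difference between two TCCs over a theta grid."""
    return max(abs(tcc(t, items_old) - tcc(t, items_new)) for t in grid)

# Hypothetical linking items as (a, b, c); the last item drifts in difficulty.
old = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.2)]
new = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 1.2, 0.2)]
grid = [x / 10.0 for x in range(-40, 41)]
gap = max_tcc_gap(old, new, grid)
print(gap)
```

A gap of zero means the two sets of item parameters imply the same expected number-correct score at every grid point, which is the property that curve-based linking relies on.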

16.

Testing for factorial invariance of the Modified Leadership Scale for Sports: using a Japanese version

Hyungil Harry Kwon Siwan Han Etsuko Ogasawara 《Asia Pacific Journal of Education》2011,31(1):65-76

The objective of this study was to provide empirical evidence to support psychometric properties of a modified four-dimensional model of the Leadership Scale for Sports (LSS). The study tested invariance of all parameters (i.e., factor loadings, error variances, and factor variances–covariances) in the four-dimensional measurement model between two groups of student-athletes. For testing multi-group invariance of the proposed scale, 335 middle school and 320 high school student-athletes in Japan participated in this study. The modified version of the LSS consists of 35 items representing training instruction, democratic behaviour, positive feedback, and social support. A chi-square difference test was employed for model comparisons. The results supported configural, metric, scalar and factor variance–covariance invariance in the modified LSS across the two student-athlete groups.

17.

‘Consider the Opposite’ – Effects of elaborative feedback and correct answer feedback on reducing confirmation bias – A pre-registered study

《Contemporary educational psychology》2020

Unbiased reasoning is considered an essential critical thinking skill that students need to possess to face the future challenges in their work and life. Confirmation bias, which is the tendency to selectively attend to information that is consistent with held beliefs, presents a significant threat to unbiased reasoning. An effective strategy to reduce confirmation bias is the ‘consider-the-opposite’-strategy (COS). The central question of this pre-registered study was whether providing elaborative, worked example feedback after COS practice would lead to a better performance on previously practised and transfer tasks than correct-answer feedback. Participants were 132 university students who took a confirmation bias pre-test, watched an instructional video on COS afterwards and next received either worked example feedback or correct answer feedback on practice tasks, practised only, watched the instruction only or received no treatment. Finally, all participants took a learning test assessing their skill to avoid confirmation bias, and a transfer test assessing whether they could apply this acquired skill to problems containing other biases. Results revealed no differences on the learning test between both feedback conditions, but students who received feedback scored significantly higher on the confirmation bias problems than students who did not receive feedback. We carried out our pre-registered analysis plan, but due to the low reliability of particularly the pre-test, we carried out an additional exploratory analysis on subsets of post-test items and a subset of transfer test items. Results on learning revealed the same pattern as the planned analyses. However, we found no differences between any of the conditions on transfer.

18.

An Exploration of Methods for Predicting the Difficulty of Physics Items on the Academic Proficiency Examination

郭长江 牟亚萍 《考试研究》2013,(6):44-53

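At present, the Shanghai general senior high school academic proficiency examination has no pre-examination item tryout system, so item difficulty is predicted mainly from the experience of the item writers, and no quantitative method is available. Drawing on research experience in China and abroad, this study starts from three aspects of an item (physics concepts, item design, and mathematical operations) and, together with an analysis of measured item difficulty data from the 2011 Shanghai general senior high school physics academic proficiency examination, constructs a quantitative method for predicting item difficulty. The method's accuracy is then checked against measured item difficulty data from the 2012 examination, in the hope of providing a research basis for future prediction of physics item difficulty.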

19.

Responding to the Message: providing a social context for children learning to write

Helen Jerram Ted Glynn Bryan Tuck 《Educational Psychology》1988,8(1-2):31-40

In this classroom‐based research study, written expression was viewed as an interactive social process involving written communication between the teacher and the children. Children received increased opportunities to write on topics they chose themselves, and their teacher responded in writing to the content of their writing. The teacher did not provide corrective feedback for accuracy of spelling or grammar throughout the study. Written content feedback from the teacher was provided to each child according to an intra‐subject ABAB research design. Analysis of the teacher's written feedback identified her use of six specific categories of positive response to the themes, ideas and characters of each child's writing. Significant increases in both quantity and quality of writing occurred during the written content feedback phases. Spelling accuracy remained high throughout the study.

20.

Linking Response-Time Parameters onto a Common Scale

Wim J. van der Linden 《Journal of Educational Measurement》2010,47(1):92-114

Although response times on test items are recorded on a natural scale, the scale for some of the parameters in the lognormal response-time model (van der Linden, 2006) is not fixed. As a result, when the model is used to periodically calibrate new items in a testing program, the parameters are not automatically mapped onto a common scale. Several combinations of linking designs and procedures for the lognormal model are examined that do map parameter estimates onto a common scale. For each of the designs, the standard error of linking is derived. The results are illustrated using examples with simulated data.
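To make the scale problem in this abstract concrete, the sketch below assumes the lognormal model as it is commonly written, with the expected log response time equal to an item time-intensity parameter minus a person speed parameter. Adding the same constant to both leaves every expected log time unchanged, which is why separate calibrations need linking. The mean-shift step shown is a simple common-item illustration, not necessarily one of the procedures the article examines, and all names and values are hypothetical.

```python
from statistics import mean

def expected_log_time(beta, tau):
    """Expected log response time: item time intensity minus person speed."""
    return beta - tau

def mean_shift_link(beta_common_ref, beta_common_new):
    """Constant that maps a new calibration onto the reference scale,
    estimated from items that appear in both calibrations."""
    return mean(beta_common_ref) - mean(beta_common_new)

def to_reference_scale(values, shift):
    """Apply the linking constant to time-intensity or speed estimates."""
    return [v + shift for v in values]

# Hypothetical common items: the new calibration is offset by 0.5.
beta_ref = [3.8, 4.1, 4.5]
beta_new = [3.3, 3.6, 4.0]
shift = mean_shift_link(beta_ref, beta_new)
linked = to_reference_scale(beta_new, shift)
print(shift, linked)
```

The same shift has to be applied to the speed estimates from the new calibration so that the differences between time intensity and speed are preserved.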