Similar articles
 Found 20 similar articles; search took 31 ms
1.
In two experiments, 90 undergraduates took six tests as part of an educational psychology course. Using a crossover design, students took three tests individually without feedback and then took the same test again, following the process of team-based testing (TBT), in teams in which the members reached consensus for each question and answered until they were correct. Students took the other three tests individually with feedback. All students were individually tested over a portion of this content two weeks later and again after two months. Independent samples t tests revealed that TBT students scored higher when retested two months later than those who took the test individually. Finally, three-fourths of the students reported that they enjoyed TBT more than individual testing. Although TBT requires more class time to administer, it appears to be beneficial for long-term student learning.

2.
The effectiveness of a strategy for improving performance on multiple-choice items for examinees was assessed. An aptitude-treatment interaction model was used to test the possibility of different treatment effects for examinees with different levels of test anxiety. Undergraduate measurement students responded to the Mandler-Sarason Test Anxiety Scale and to an objective test covering course content. For low-anxious examinees, generation of an answer before selecting a multiple-choice response led to higher test performance; for highly test anxious examinees, there was a slightly negative effect on performance.

3.
The direct, retention, and transfer effects of repeated word and pseudoword reading were studied in a pretest, training, posttest, retention design. First graders (48 good readers, 47 poor readers) read 25 CVC words and 25 CVC pseudowords in ten repeated word reading sessions, preceded and followed by a transfer task with a different set of items. Two weeks after training, trained items were assessed again in a retention test. Participants either received phonics feedback, in which each word was spelled out and repeated; word feedback, in which each word was repeated; or no feedback. During the training, both good and poor readers improved in accuracy and speed. The increase in speed was stronger for poor readers than for good readers. The good readers demonstrated a stronger increase for pseudowords than for words. This increase in speed was most prominent in the first four sessions. Two weeks after training, the levels of accuracy and speed were retained. Furthermore, transfer effects on speed were found for pseudowords in both groups of readers. Good readers performed most accurately during the training when they received no feedback while poor readers performed most accurately during the training with the help of phonics feedback. However, feedback did not differentiate for reading speed or for effects after the training. The effects of repeated word reading were found to be stronger for poor readers than for good readers. Moreover, these effects were found to be stronger for pseudowords than for words. This indicates that repeated word reading can be seen as an important trigger for the improvement of decoding skills.

4.
This study was designed to research the question of scrambling item content in the construction of achievement tests, so that very general implications could be drawn for both examinee and item populations. To achieve this generality, the methodology of multiple matrix sampling was combined with a simple two group experimental design: a random group of 8th graders responded to mathematics, science, social studies, reading, and language arts achievement items organized in a scrambled (random) test format, while another random group responded to the same items organized in a fixed (segregated by subject matter) test format. The results indicated that scrambling cognitive test items has minimal or no effect on mean examinee test performance or on any of the other parameters included in the analysis.

5.
This paper describes an item response model for multiple-choice items and illustrates its application in item analysis. The model provides parametric and graphical summaries of the performance of each alternative associated with a multiple-choice item; the summaries describe each alternative's relationship to the proficiency being measured. The interpretation of the parameters of the multiple-choice model and the use of the model in item analysis are illustrated using data obtained from a pilot test of mathematics achievement items. The use of such item analysis for the detection of flawed items, for item design and development, and for test construction is discussed.

6.
This article demonstrates the utility of restricted item response models for examining item difficulty ordering and slope uniformity for an item set that reflects varying cognitive processes. Twelve sets of paired algebra word problems were developed to systematically reflect various types of cognitive processes required for successful performance. This resulted in a total of 24 items. They reflected distance-rate-time (DRT), interest, and area problems. Hypotheses concerning difficulty ordering and slope uniformity for the items were tested by constraining item difficulty and discrimination parameters in hierarchical item response models. The first set of model comparisons tested the equality of the discrimination and difficulty parameters for each set of paired items. The second set of model comparisons examined slope uniformity within the complex DRT problems. The third set of model comparisons examined whether the familiarity of the story context affected item difficulty for two types of complex DRT problems. The last set of model comparisons tested the hypothesized difficulty ordering of the items.

7.
Using Muraki's (1992) generalized partial credit IRT model, polytomous items (responses to which can be scored as ordered categories) from the 1991 field test of the NAEP Reading Assessment were calibrated simultaneously with multiple-choice and short open-ended items. Expected information of each type of item was computed. On average, four-category polytomous items yielded 2.1 to 3.1 times as much IRT information as dichotomous items. These results provide limited support for the ad hoc rule of weighting k-category polytomous items the same as k - 1 dichotomous items for computing total scores. Polytomous items provided the most information about examinees of moderately high proficiency; the information function peaked at 1.0 to 1.5, and the population distribution mean was 0. When scored dichotomously, information in polytomous items sharply decreased, but they still provided more expected information than did the other response formats. For reference, a derivation of the information function for the generalized partial credit model is included.
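
As a rough illustration of the quantities this abstract compares, the sketch below computes category probabilities and item information under the generalized partial credit model. It is only a minimal sketch, not the article's derivation: it assumes integer category scores 0..m, omits any scaling constant (D = 1), and all function names and parameter values are hypothetical.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities for one item under the generalized partial credit model.

    steps holds the step parameters b_1..b_m; categories are scored 0..m.
    Assumption: no scaling constant (D = 1).
    """
    # Cumulative sums z_k = sum over v <= k of a * (theta - b_v), with z_0 = 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)  # subtract the maximum before exponentiating, for stability
    expz = [math.exp(v - top) for v in z]
    total = sum(expz)
    return [e / total for e in expz]

def gpcm_information(theta, a, steps):
    """Item information: a^2 times the variance of the category score at theta."""
    p = gpcm_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    second = sum(k * k * pk for k, pk in enumerate(p))
    return a * a * (second - mean * mean)

# Hypothetical items: one four-category item and one dichotomous item.
four_category = gpcm_information(0.0, 1.0, [-1.0, 0.0, 1.0])
dichotomous = gpcm_information(0.0, 1.0, [0.0])
ratio = four_category / dichotomous
```

With a single step parameter the model reduces to a two-parameter logistic item, so the same two functions can be used to compare polytomous and dichotomous items on one scale.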

8.
The reading test performances of 60 hearing and 60 hearing-impaired children of similar measured reading ages on the Southgate reading test were analysed. As in an earlier study using the Brimer Wide-span test it was shown that the performances of the two groups were quite different. Deaf children tackled significantly more test items than the hearing and made significantly more errors in achieving similar reading scores. A detailed examination of both correct and incorrect answers showed that the deaf children were not simply providing answers to questions at random. Even where they produced incorrect responses they tended, as a group, to select the same answer. Unlike the hearing group, who did not converge on the same incorrect solution to difficult test items, the deaf were systematic in their choices, indicating that they were using a consistent strategy. A post hoc examination of individual test items indicated that the deaf children were selecting answers on the basis of word associations in each test item. On some items these produced a correct response, on others the same (incorrect) response. The implications of these findings are discussed to argue that reading tests based on hearing norms are of little value in the assessment of reading abilities and reading problems in hearing-impaired children.

9.
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.
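
The RMSEA mentioned above scales misfit by the model's degrees of freedom, which is one reason it can rank models differently from a raw chi-square comparison. The sketch below uses the conventional point estimate; it is an assumption that this variant matches the one used in the article, and some software divides by n rather than n - 1. The function name is hypothetical.

```python
import math

def rmsea(chi_square, df, n):
    """Conventional RMSEA point estimate from a chi-square statistic.

    chi_square: model test statistic, df: its degrees of freedom,
    n: sample size. Uses the n - 1 convention (an assumption).
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Misfit beyond what is expected by chance, per degree of freedom and per case.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))
```

A statistic no larger than its degrees of freedom gives an RMSEA of zero, so a model with more parameters (fewer degrees of freedom) is not automatically favoured.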

10.
In four experiments, rats were trained on different patterning discriminations before being tested with compounds composed of novel combinations of the trained stimuli. In Experiment 1, rats were trained on a negative-patterning schedule (A+ B+ AB-) intermixed with reinforced presentations of a second compound (CD+). On a subsequent test, the rats responded more to two novel compounds, AC and BD, than to A and B, but less than to CD. In Experiment 2, rats were trained on two concurrent negative-patterning discriminations (A+ B+ AB-, C+ D+ CD-). On test, they responded more to AC and BD than to AB and CD, but less than to the single stimuli. In Experiment 3, rats were trained on two concurrent positive-patterning discriminations (A- B- AB+, C- D- CD+). On test, their response rates to AC and BD were not different from the response rates to the trained compounds (AB and CD). Finally, in Experiment 4, rats were trained on a positive- and negative-patterning discrimination concurrently. Once again, on test, response rates to AC and BD were not different from responding on reinforced trials of the trained discriminations (A+, B+, and CD+). We discuss the implications of these findings for elemental and configural models of stimulus representation.

11.
The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218‐item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to investigate the NVLA using Rasch models. First, we reduced the number of items using a unidimensional model, which resulted in high levels of test reliability despite decreasing the number of questions, providing the same information about student abilities in less time. Second, the multidimensional analysis indicated that it is possible to view the NVLA as a test with four dimensions, resulting in more detailed information about student abilities. Finally, we combined these approaches to obtain both specificity and brevity, with a four‐dimensional model using 133 items from the original NVLA.

12.
Open-ended counterparts to a set of items from the quantitative section of the Graduate Record Examination (GRE–Q) were developed. Examinees responded to these items by gridding a numerical answer on a machine-readable answer sheet or by typing on a computer. The test section with the special answer sheets was administered at the end of a regular GRE administration. Test forms were spiraled so that random groups received either the grid-in questions or the same questions in a multiple-choice format. In a separate data collection effort, 364 paid volunteers who had recently taken the GRE used a computer keyboard to enter answers to the same set of questions. Despite substantial format differences noted for individual items, total scores for the multiple-choice and open-ended tests demonstrated remarkably similar correlational patterns. There were no significant interactions of test format with either gender or ethnicity.

13.
This study was an investigation of items on the Peabody Picture Vocabulary Test (PPVT) to ascertain if verbal responses to items missed indicated that the concept was familiar at the same level of abstraction as the word in the PPVT. One hundred 8-year-old children (25 black boys, 25 black girls, 25 white boys, and 25 white girls) were administered Form A of the PPVT. Eighty-eight children responded verbally to the pictures of the stimulus words missed. Data were analyzed by means of a two-way analysis of variance. A chi-square test was used to determine the significance of differences between items for each group. Judges analyzed verbal responses to determine if the responses elicited were 1) at the same level of abstraction as the stimulus word, 2) synonymous with the stimulus word, and 3) indicative of the student's understanding of the concept signified by the word. A total of 23 words were identified as being missed disproportionately by one group more than the other. Verbal responses indicated that the concept was familiar for 16 items and unfamiliar for three items. Of the remaining four items, there was indication of differences among the groups.

14.
The major aim of the present study is to assess college students’ attitudes, perceptions, emotional reactions and affective dispositions with respect to various critical dimensions of course achievement testing and assessment, including: “papers” vs. “exams”, “essay” vs. “multiple choice” type formats, “open book” vs. “closed book” exams, “free choice” among items vs. “no free choice” among items, and “oral” vs. “written” modes of test administration. A further aim is to delineate the construction, properties, and potential classroom uses and applications of a selected sample of examinee feedback inventories designed to gauge students’ test attitudes and dispositions. The use of each examinee feedback inventory is demonstrated and exemplified in the context of an empirical study. This paper discusses the assumptions underlying the use of feedback systems in college achievement evaluation; their importance for assessing the face validity of classroom tests; some possible future applications of feedback inventories for research and applied purposes in college; and some guidelines for future research. A mapping sentence specifying the universe of content of test attitude and examinee feedback research is suggested as a heuristic device for guiding future research.

15.
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the existing methods were designed to detect drifts in individual items, which may not be adequate for test characteristic curve–based linking or equating. One example is the item response theory–based true score equating, whose goal is to generate a conversion table to relate number‐correct scores on two forms based on their test characteristic curves. This article introduces a stepwise test characteristic curve method to detect item parameter drift iteratively based on test characteristic curves without needing to set any predetermined critical values. Comparisons are made between the proposed method and two existing methods under the three‐parameter logistic item response model through simulation and real data analysis. Results show that the proposed method produces a small difference in test characteristic curves between administrations, an accurate conversion table, and a good classification of drifted and nondrifted items, while retaining a large number of linking items.
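
To make the central quantity concrete, the sketch below computes a test characteristic curve under the three-parameter logistic model and the largest gap between two administrations' curves over a grid of proficiency values. It does not implement the article's stepwise detection procedure; the function names, the omission of a scaling constant (D = 1), and the item parameters are all assumptions for illustration.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (assuming D = 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta.

    items is a list of (a, b, c) tuples, one per item.
    """
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

def max_tcc_gap(items_old, items_new, grid):
    """Largest absolute difference between two curves over a grid of theta values."""
    return max(abs(tcc(t, items_old) - tcc(t, items_new)) for t in grid)

# Hypothetical linking items; the first item's difficulty drifts by 0.5.
first_admin = [(1.0, -0.5, 0.20), (1.3, 0.4, 0.25)]
second_admin = [(1.0, 0.0, 0.20), (1.3, 0.4, 0.25)]
theta_grid = [x / 10 for x in range(-40, 41)]
gap = max_tcc_gap(first_admin, second_admin, theta_grid)
```

Because true score equating works from these curves, a drift that barely changes any single item can still matter if the summed curve shifts, which is the motivation the abstract gives for a curve-based check.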

16.
The objective of this study was to provide empirical evidence to support psychometric properties of a modified four-dimensional model of the Leadership Scale for Sports (LSS). The study tested invariance of all parameters (i.e., factor loadings, error variances, and factor variances–covariances) in the four-dimensional measurement model between two groups of student-athletes. For testing multi-group invariance of the proposed scale, 335 middle school and 320 high school student-athletes in Japan participated in this study. The modified version of the LSS consists of 35 items representing training instruction, democratic behaviour, positive feedback, and social support. A chi-square difference test was employed for model comparisons. The results supported configural, metric, scalar and factor variance–covariance invariance in the modified LSS across the two student-athlete groups.

17.
Unbiased reasoning is considered an essential critical thinking skill that students need to possess to face the future challenges in their work and life. Confirmation bias, which is the tendency to selectively attend to information that is consistent with held beliefs, presents a significant threat to unbiased reasoning. An effective strategy to reduce confirmation bias is the ‘consider-the-opposite’-strategy (COS). The central question of this pre-registered study was whether providing elaborative, worked example feedback after COS practice would lead to a better performance on previously practised and transfer tasks than correct-answer feedback. Participants were 132 university students who took a confirmation bias pre-test, watched an instructional video on COS afterwards and next received either worked example feedback or correct answer feedback on practice tasks, practised only, watched the instruction only or received no treatment. Finally, all participants took a learning test assessing their skill to avoid confirmation bias, and a transfer test assessing whether they could apply this acquired skill to problems containing other biases. Results revealed no differences on the learning test between the two feedback conditions, but students who received feedback scored significantly higher on the confirmation bias problems than students who did not receive feedback. We carried out our pre-registered analysis plan, but due to the low reliability of particularly the pre-test, we carried out an additional exploratory analysis on subsets of post-test items and a subset of transfer test items. Results on learning revealed the same pattern as the planned analyses. However, we found no differences between any of the conditions on transfer.

18.
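Exploring a Method for Estimating the Difficulty of Physics Items on the Academic Proficiency Test   Total citations: 1, self-citations: 1, citations by others: 0
At present, Shanghai's general senior high school Academic Proficiency Test does not pretest items before administration, so item difficulty is estimated mainly from the experience of the item writers, and no quantitative method is available. Drawing on research in China and abroad, this study considers three aspects of an item (physics concepts, item design, and mathematical operations) and, together with an analysis of measured item difficulty data from the 2011 Shanghai general senior high school physics Academic Proficiency Test, constructs a quantitative method for estimating item difficulty. The accuracy of the method is then checked against measured item difficulty data from the 2012 Shanghai general senior high school physics Academic Proficiency Test, in the hope of providing a basis for future research on estimating the difficulty of physics items.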

19.
In this classroom‐based research study, written expression was viewed as an interactive social process involving written communication between the teacher and the children. Children received increased opportunities to write on topics they chose themselves, and their teacher responded in writing to the content of their writing. The teacher did not provide corrective feedback for accuracy of spelling or grammar throughout the study. Written content feedback from the teacher was provided to each child according to an intra‐subject ABAB research design. Analysis of the teacher's written feedback identified her use of six specific categories of positive response to the themes, ideas and characters of each child's writing. Significant increases in both quantity and quality of writing occurred during the written content feedback phases. Spelling accuracy remained high throughout the study.

20.
Although response times on test items are recorded on a natural scale, the scale for some of the parameters in the lognormal response-time model (van der Linden, 2006) is not fixed. As a result, when the model is used to periodically calibrate new items in a testing program, the parameters are not automatically mapped onto a common scale. Several combinations of linking designs and procedures for the lognormal model are examined that do map parameter estimates onto a common scale. For each of the designs, the standard error of linking is derived. The results are illustrated using examples with simulated data.
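
The scale indeterminacy described above can be seen in a small sketch of the lognormal response-time density. The parameterization here is an assumption based on the usual statement of the model: log response time is normal with mean beta - tau (item time intensity minus person speed) and standard deviation 1/alpha. The function name and all values are hypothetical, and the sketch does not reproduce the article's linking procedures.

```python
import math

def lognormal_rt_logpdf(t, alpha, beta, tau):
    """Log-density of a response time t > 0, assuming ln t ~ Normal(beta - tau, 1 / alpha**2).

    alpha: item discrimination (assumed name), beta: item time intensity,
    tau: person speed.
    """
    if t <= 0 or alpha <= 0:
        raise ValueError("t and alpha must be positive")
    z = alpha * (math.log(t) - (beta - tau))
    return math.log(alpha) - math.log(t) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

# Adding the same constant to beta and tau leaves the density unchanged,
# so separate calibrations can differ by such a shift.
original = lognormal_rt_logpdf(30.0, 1.5, 3.8, 0.2)
shifted = lognormal_rt_logpdf(30.0, 1.5, 3.8 + 0.7, 0.2 + 0.7)
```

Since only the difference beta - tau enters the density, the data alone cannot pin down the origin of either parameter, which is why estimates from separate calibrations need a linking step before they can be compared.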
