Similar articles
1.
The effects of training tests on subsequent achievement were studied using two test-item characteristics: item difficulty and item complexity. Ninety Ss were randomly assigned to treatment conditions having easy or difficult items and calling for rote or complex skills. Each S was administered two training tests during the quarter containing only items defined by his treatment condition. The dependent measure was a sixty-item final examination with fifteen items reflecting each of the four treatment-condition item types. The results showed greater achievement for those trained with difficult items and with rote items. In addition, two interactions of treatment conditions with type of test item were found. The results are discussed as supporting a hierarchical model rather than a “similarity” transfer model of learning.

2.
Applied Measurement in Education, 2013, 26(4): 289-296
The effect of three item arrangements on state test anxiety was studied using an actual classroom examination administered under power conditions. Examinations were distributed randomly to 128 graduate students in two courses. Separate one-way analyses of variance performed for each course revealed significant effects for item arrangement on anxiety. In one course, anxiety was higher for the hard-to-easy arrangement; in the other course, anxiety was higher for the random arrangement. The finding that the highest anxiety levels were associated with different arrangements in the two courses was explained in terms of homogeneity of content and perceived item difficulty. Results suggest that different item arrangements may elicit different levels of anxiety and that item arrangement may introduce a source of variance unrelated to content, thereby reducing the validity of achievement tests.

3.
Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in seven themes. The answer format alternated per theme and was either a labeled image or an answer list, resulting in two versions containing both images and answer lists. Subjects were randomly assigned to one version. Answer formats were compared through item scores. Both examinations had similar overall difficulty and reliability. Two cross‐sectional images resulted in greater item difficulty and item discrimination, compared to an answer list. A schematic image of fetal circulation led to decreased item difficulty and item discrimination. Three images showed variable effects. These results show that effects on assessment scores are dependent on the type of image used. Results from the two cross‐sectional images suggest an extra ability is being tested. Data from a scheme of fetal circulation suggest a cueing effect. Variable effects from other images indicate that a context‐dependent interaction takes place with the content of questions. The conclusion is that item difficulty and item discrimination can be affected when images are used instead of answer lists; thus, the use of images as a response format has potential implications for the validity of test items. Anat Sci Educ © 2012 American Association of Anatomists.

4.
To combat problems of cheating arising from testing under crowded classroom conditions, instructors frequently use multiple arrangements of a set of test items. These different arrangements or forms should be nearly equivalent relative to mean total scores. This study reports data from comparisons involving eleven pairs of equivalent tests. There were no significant linear relationships between equivalent test forms on the ordering of item difficulties. Reliabilities differed little within pairs of equivalent tests. Nine of eleven t-tests comparing mean total test scores were nonsignificant. The bulk of these data supported the assumption that one may construct equivalent power tests by rearranging items, when the ordering of item difficulty is non-systematic on both arrangements.

5.
Effects of Item Wording on Sex Bias (cited by: 1; self-citations: 0; citations by others: 1)
This study examined the effects of gender-related item-wording changes on the performance of male and female examinees. Mathematics word problems and English language items were created in neuter, male, and female versions. Items were administered to randomly equivalent samples of about 300 high school juniors and seniors. Loglinear analysis was used to assess the impact of item gender and its interaction with examinee sex on the difficulty and discrimination of each item in each context. No items were found to have sex bias in either context. Mathematics items did not have different difficulty or discrimination in the three gender versions. Neither mathematics nor English items had different discrimination levels in the three gender-related versions. Some English items, however, were found to have different difficulty levels in the three gender-related versions. These difficulty differences were not systematic: none of the three gender versions appeared consistently more or less difficult than the others.

6.
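Difficulty is not an inherent property of a test item but the result of the interaction between examinee factors and item characteristics. Many item analysts tend to attribute high item difficulty solely to students' failure to master the relevant knowledge or skills, while overlooking the characteristics of the items themselves. This study analyzes 60 English items from the college entrance examination (Gaokao) with difficulty values below 0.6 to explore the sources of their difficulty. The results show that, besides examinee factors, the difficulty of hard or fairly hard items is also related to item-writing technique, including problems with the uniqueness and acceptability of answers, tested content that goes beyond the syllabus, and inappropriate test points and scoring criteria. The study therefore proposes that testing agencies improve their item writing and strengthen item quality monitoring, so that large-scale examinations select candidates on a sound basis.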

7.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) of Black versus matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 GRE-disclosed forms (988 verbal items) and 11 SAT-disclosed forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both GRE and SAT. The most important finding indicates that for hard items, Black examinees perform differentially better than matched-ability White examinees for each of the four item types and for both the GRE and SAT tests. The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

8.
Undergraduates (N = 94) enrolled in an educational psychology course read an assigned article of about 3,700 words. A 30-item multiple-choice test was then administered and followed by one of four treatments: (1) no feedback, (2) immediate feedback, (3) feedback delayed by one day, or (4) feedback delayed by seven days. A retention test, consisting of the original items and distractors randomly reordered, was administered seven days after the feedback. No overall differences in performance were observed. Likewise, there were no significant differences for the test items analyzed according to initial performance or according to item difficulty. Questionnaire data indicated that immediate feedback stimulated the most rereading. These results bring into question the importance of controlling feedback intervals carefully in applied instructional settings.

9.
In actual test development practice, the number of test items that must be developed and pretested is typically greater, and sometimes much greater, than the number that is eventually judged suitable for use in operational test forms. This has proven to be especially true for one item type, analytical reasoning, that currently forms the bulk of the analytical ability measure of the GRE General Test. This study involved coding the content characteristics of some 1,400 GRE analytical reasoning items. These characteristics were correlated with indices of item difficulty and discrimination. Several item characteristics were predictive of the difficulty of analytical reasoning items. Generally, these same variables also predicted item discrimination, but to a lesser degree. The results suggest several content characteristics that could be considered in extending the current specifications for analytical reasoning items. The use of these item features may also contribute to greater efficiency in developing such items. Finally, the influence of these various characteristics also provides a better understanding of the construct validity of the analytical reasoning item type.

10.
11.
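To ensure the quality of language test items and to strengthen item bank construction, this paper uses Gitest Ⅲ, on the basis of classical test theory, to conduct an item analysis of the reading section of a college entrance examination paper. The results show that the difficulty and discrimination of the reading items are fairly satisfactory, but the distribution of difficulty is not. The paper recommends pilot testing papers assembled from the item bank before they are used, in order to improve the difficulty distribution and the quality of some item options and thereby raise the reliability and validity of the test.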
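The two statistics named in the abstract above, item difficulty and item discrimination, can be illustrated with a minimal sketch under classical test theory. The abstract does not say how Gitest Ⅲ computes them, so the formulas here are assumptions: difficulty as the proportion correct, and discrimination as the upper-lower group index with 27% groups. The function names and the `scores` data are hypothetical.

```python
def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly (CTT p-value)."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower index: p(upper group) minus p(lower group).

    Groups are the top and bottom `fraction` of examinees by total score.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


# Hypothetical 0/1 scored responses: rows are examinees, columns are items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

for i in range(3):
    print(i, item_difficulty(scores, i), item_discrimination(scores, i))
```

Note that a higher proportion correct means an easier item, so the 0.6 threshold used in some of the abstracts in this list marks relatively hard items under this convention.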

12.
Assessment items are commonly field tested prior to operational use to observe statistical item properties such as difficulty. Item parameter estimates from field testing may be used to assign scores via pre-equating or computer adaptive designs. This study examined differences between item difficulty estimates based on field test and operational data and the relationship of such differences to item position changes and student proficiency estimates. Item position effects were observed for 20 assessments, with items in later positions tending to be more difficult. Moreover, field test estimates of item difficulty were biased slightly upward, which may indicate examinee knowledge of which items were being field tested. Nevertheless, errors in field test item difficulty estimates had negligible impacts on student proficiency estimates for most assessments. Caution is still warranted when using field test statistics for scoring, and testing programs should conduct investigations to determine whether the effects on scoring are inconsequential.

13.
Central to the standards-based assessment validation process is an examination of the alignment between state standards and test items. Several alignment analysis systems have emerged recently, but most rely on either traditional rating or matching techniques. Few, if any, analyses have been reported on the degree of consistency between the two methods and on the item and objective characteristics that influence judges' decisions. We randomly assigned judges to either rate item-objective links or match items to objectives while reviewing the 2004 Arizona high school mathematics standards and assessment. Across items we found moderate convergence between methods, and we detected apparent reasons for divergently scored items. We also found that judges relied on item and objective content and intellectual skill features to render decisions. Based on our evidence, we contend that a thorough alignment analysis would involve judges using both rating and matching, while focusing on both content and intellectual skill. The findings have important implications for states when examining the alignment between their standards and assessments.

14.
Changes to the design and development of our educational assessments are resulting in an unprecedented demand for a large and continuous supply of content‐specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. The purpose of this module is to describe and illustrate a template‐based method for generating test items. We outline a three‐step approach where test development specialists first create an item model. An item model is like a mould or rendering that highlights the features in an assessment task that must be manipulated to produce new items. Next, the content used for item generation is identified and structured. Finally, features in the item model are systematically manipulated with computer‐based algorithms to generate new items. Using this template‐based approach, hundreds or even thousands of new items can be generated with a single item model.
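The three-step, template-based approach described in the abstract above can be sketched roughly as follows. This is an illustration only, not the authors' system: the stem, the element names (`distance`, `hours`), the value lists, and the helper `generate_items` are all hypothetical, and exhaustively combining element values is just one simple way to manipulate the features of an item model systematically.

```python
from itertools import product

# Step 1 (assumed form of an item model): a stem with named placeholders.
STEM = ("A train travels {distance} km in {hours} hours. "
        "What is its average speed in km/h?")

# Step 2: the content used for generation, structured per placeholder.
ELEMENTS = {
    "distance": [120, 180, 240],
    "hours": [2, 3],
}


def generate_items(stem, elements, key_fn):
    """Step 3: yield one item per combination of element values.

    key_fn computes the answer key from the chosen values.
    """
    names = list(elements)
    for values in product(*(elements[name] for name in names)):
        filled = dict(zip(names, values))
        yield {"stem": stem.format(**filled), "key": key_fn(filled)}


items = list(generate_items(STEM, ELEMENTS, lambda v: v["distance"] / v["hours"]))

for item in items:
    print(item["stem"], "->", item["key"])
```

The number of generated items is the product of the list lengths, which is how a single item model with a few more elements can yield hundreds or thousands of items.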

15.
Biased test items were intentionally embedded within a set of test items, and the resulting instrument was administered to large samples of blacks and whites. Three popular item bias detection procedures were then applied to the data: (1) the three-parameter item characteristic curve procedure, (2) the chi-square method, and (3) the transformed item difficulty approach. The three-parameter item characteristic curve procedure proved most effective at detecting the intentionally biased test items, and the chi-square method was viewed as the best alternative. The transformed item difficulty approach has certain limitations yet represents a practical alternative if sample size, lack of computer facilities, or the like preclude the use of the other two procedures.

16.
This study reports the results of a componential analysis of items comprising Sections A and C of Form Z of the reading comprehension portions of the California Achievement Tests (CAT) (Tiegs & Clark, 1963). A set of problem components or attributes characterizing the test items in terms of manifest content, psychologically salient features, and processing demands was developed, including methods for their quantification. The contributions of these components to task difficulty were then evaluated using linear regression methodology. Item difficulty indices were transformations of the familiar proportion-correct item score, obtained from data gathered during the spring of 1989 from 158 deaf examinees. Variation in the item difficulty values was substantially accounted for in terms of a small number of predictor variables (R² ≥ .90). Implications of the results for construct validity and interpretation of test scores are discussed.

17.
This study supported two hypotheses. First, adjunct questions interacted with a science chart so powerfully that content established as difficult to learn in the pilot and in this study's control groups became easier to learn when charted. Second, students familiar with the chart test before instruction (test exposure) were better prepared to take this test after instruction. This adjunct-question study examined the generalizability of selective-attention and academic-studying hypotheses to a modified science chart medium. About 300 high school students were randomly assigned to four conditions, each including a vitamin chart (chart only; test exposure; importance of questions emphasized to students by teachers; and a combined condition with both test exposure and question importance), across 16 biology classrooms. These same students were then randomly assigned within each classroom to a control and to four question treatments: no questions, questions focusing on easy-to-learn charted content, questions focusing on difficult-to-learn charted content, and a combined treatment.

18.
Educators disagree on the relative merits of stating classroom objectives behaviorally or nonbehaviorally and have done little to add data to their argument. An experiment was conducted in the field of social science in which one of three lists of objectives (one nonbehavioral, the other two behavioral) was randomly assigned to participating high school social studies teachers, who were instructed to teach the objectives in their classes. Unit sampling was used and eighteen classrooms were involved. Students were measured, using a form of item sampling, on the acquisition of the five skills stated in the behavioral objectives as well as on eighteen transfer skills. Teachers’ faulty understanding of objectives, indicated by their inability to provide relevant classroom practice and to identify, when asked, test items measuring given objectives, may have accounted for the lack of differences.

19.
Incremental rehearsal (IR) is a highly effective intervention that uses high repetition and a high ratio of known to unknown items with linearly spaced known items between the new items. It has been hypothesized that narrowly spaced practice would result in quick learning, whereas items that are widely spaced would result in longer‐term retention. The current study examined the effect of spacing by teaching vocabulary words to 36 fourth‐grade students. Each student was randomly assigned to a widely spaced IR condition (i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, three known items, and an increase in the number of known items presented each time by one) or an IR condition in which spacing increased exponentially (IR‐Exp; i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, four known items, and one unknown item, eight known items). The results indicated that the students in the study retained twice as much information with the widely spaced IR than with the IR‐Exp condition, but the latter required half as much time. IR and IR‐Exp were equally efficient, but IR continues to be superior to all other flashcard approaches in improving retention.
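The two presentation orders described in the abstract above can be written out as a short sketch. Several details are assumptions, since the abstract does not give them: the same unknown item is re-presented each time, each run of known items is filled with the first n known items in a fixed order, and the sequence stops once a run would need more known items than are available. The function names are hypothetical.

```python
def linear_spacing(i):
    """Known items after the i-th presentation (i from 0): 1, 2, 3, ... (IR)."""
    return i + 1


def exponential_spacing(i):
    """Known items after the i-th presentation (i from 0): 1, 2, 4, 8, ... (IR-Exp)."""
    return 2 ** i


def rehearsal_sequence(unknown, knowns, spacing):
    """Interleave one unknown item with growing runs of known items."""
    sequence = []
    i = 0
    while spacing(i) <= len(knowns):
        sequence.append(unknown)
        sequence.extend(knowns[:spacing(i)])
        i += 1
    return sequence


# Hypothetical flashcards: one unknown word and eight known words.
known_words = ["K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8"]
ir = rehearsal_sequence("U", known_words, linear_spacing)
ir_exp = rehearsal_sequence("U", known_words, exponential_spacing)
print(len(ir), len(ir_exp))
```

Under these assumptions the exponential schedule reaches a run of eight known items after far fewer presentations of the unknown item than the linear schedule. The sketch only counts presentations and says nothing about the retention or time results the abstract reports.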

20.
This work examines the hypothesis that the arrangement of items according to increasing difficulty is the real source of what is considered the item-position effect. A confusion of the two effects is possible because in achievement measures the items are arranged according to their difficulty. Two item subsets of Raven’s Advanced Progressive Matrices (APM), one following the original item order and the other including randomly ordered items, were applied to a sample of 266 students. Confirmatory factor analysis models including representations of both the item-position effect and a possible effect due to increasing item difficulty were compared. The results provided evidence for both effects. Furthermore, they indicated a substantial relation between the item-position effects of the two APM subsets, whereas no relation was found for item difficulty. This indicates that the item-position effect stands on its own and is not due to increasing item difficulty.
