Similar literature
Found 20 similar documents (search time: 31 ms)
1.
A subset of the items of both forms of the Peabody Picture Vocabulary Test (PPVT) was administered to a sample of 452 fourth-, fifth-, and sixth-grade students. This sample was randomly divided into two equal subgroups. Item difficulty indices were calculated for each of the two subsamples for each of the two forms of the test. Data obtained from the first subsample were used to evaluate the published ordering of items of Forms A and B of the PPVT and to reorder the items according to the empirically derived item difficulties. The second subsample was used as a cross-validation sample to evaluate the empirically derived reordering of items. The results of the cross-validation indicate a substantial and significant increase in the validity of the item orderings for this subset of items on both forms of the PPVT. Therefore, this new ordering may yield a more accurate estimate of the intelligence of average and above-average students in the fourth, fifth, and sixth grades than the present, published ordering of items.
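The reordering step described in this abstract can be illustrated with a minimal sketch. It assumes the classical item difficulty index (the proportion of examinees answering an item correctly); the abstract does not state which index was used, and the function names and the response matrix below are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of examinees answering each item correctly (classical p-value)."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees for j in range(n_items)]


def easiest_first_order(p_values):
    """Item indices sorted from easiest (highest p) to hardest (lowest p)."""
    return sorted(range(len(p_values)), key=lambda j: -p_values[j])


# Hypothetical 0/1 scored responses: rows are students, columns are items.
derivation_sample = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
]

p = item_difficulty(derivation_sample)   # difficulty index per item
order = easiest_first_order(p)           # empirically derived item order
```

In a cross-validation design like the one described, the order derived from one subsample would then be checked against difficulty indices computed independently on the other subsample.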

2.
Incremental rehearsal (IR) is a highly effective intervention that uses high repetition and a high ratio of known to unknown items with linearly spaced known items between the new items. It has been hypothesized that narrowly spaced practice would result in quick learning, whereas items that are widely spaced would result in longer‐term retention. The current study examined the effect of spacing by teaching vocabulary words to 36 fourth‐grade students. Each student was randomly assigned to a widely spaced IR condition (i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, three known items, and an increase in the number of known items presented each time by one) or an IR condition in which spacing increased exponentially (IR‐Exp; i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, four known items, and one unknown item, eight known items). The results indicated that the students in the study retained twice as much information with the widely spaced IR than with the IR‐Exp condition, but the latter required half as much time. IR and IR‐Exp were equally efficient, but IR continues to be superior to all other flashcard approaches in improving retention.
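The two spacing schedules in this abstract differ only in how many known items follow each presentation of an unknown item. The sketch below generates both patterns under that reading; the function names are hypothetical, and the labels "U" (unknown) and "K" (known) are placeholders rather than the study's actual materials.

```python
def known_counts(n_presentations, exponential=False):
    """Number of known items shown after each unknown-item presentation.

    Linear spacing gives 1, 2, 3, ...; exponential spacing gives 1, 2, 4, 8, ...
    """
    if exponential:
        return [2 ** i for i in range(n_presentations)]
    return [i + 1 for i in range(n_presentations)]


def presentation_sequence(counts):
    """Flatten the counts into a sequence of 'U' (unknown) and 'K' (known) cards."""
    sequence = []
    for count in counts:
        sequence.append("U")
        sequence.extend(["K"] * count)
    return sequence


linear = presentation_sequence(known_counts(4))                         # IR
exponential = presentation_sequence(known_counts(4, exponential=True))  # IR-Exp
```

This sketch only models the order of cards. It does not model session time or retention, so it cannot by itself reproduce the study's efficiency comparison.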

3.
One assumption common to all models for determining the optimal number of options per item (e.g., Lord, 1977) is that total testing time is proportional to the number of items and the number of options per item. Therefore, under this assumption given a fixed testing time, the test can be shortened or lengthened by deleting or adding a proportional number of options. The present study examines the validity of this assumption in three tests which were administered with 2, 3, 4, and 5 options per item. The number of items attempted in the first 10 and 15 minutes of the testing session and the time needed to complete the tests were recorded. Thus, the rate of performance for both fixed time and fixed test length was analyzed. A strong and consistently negative relationship between rate of performance and the number of options was detected in all tests. Thus, the empirical results did not support the assumption of proportionality. Furthermore, the data indicated that the method by which options are deleted can play a role in this context. A more realistic assumption of generalized proportionality, proposed by Grier (1976), was supported by the results from a Mathematical Reasoning test, but was only partially supported for a Vocabulary and a Reading Comprehension test.

4.
Numerous writers have suggested that the discrimination index may be helpful in identifying faulty test items. The purpose of this study was to investigate systematically the validity of the index for this purpose. To attain this objective, two forms of an arithmetic-reasoning test were written. In each form, the items were designed to vary in quality with respect to nine item-writing principles, and on the basis of the responses of 364 examinees, a discrimination index was computed for each item. Next, the items were rated independently for quality by three judges who used a check list of the nine item-writing principles. The average of their ratings for each item was used as the criterion for determining the validity of the indices. The results indicate that the discrimination index is a moderately valid measure of item quality. The implications of this finding are discussed.
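The abstract does not say which discrimination index was computed. As an illustration only, the sketch below uses one common choice, the upper-lower group index D (proportion correct in the top-scoring group minus proportion correct in the bottom-scoring group). The function name, the group fraction, and the data are assumptions.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index D for one item.

    responses: list of 0/1 score rows, one row per examinee.
    fraction:  share of examinees placed in each extreme group (27% is a
               conventional choice, assumed here).
    """
    totals = [sum(row) for row in responses]
    # Rank examinees from highest to lowest total score.
    ranked = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
    k = max(1, round(len(responses) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(responses[i][item] for i in upper) / k
    p_lower = sum(responses[i][item] for i in lower) / k
    return p_upper - p_lower


# Hypothetical scored responses: rows are examinees, columns are items.
scored = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

d_item0 = discrimination_index(scored, item=0, fraction=1 / 3)
d_item1 = discrimination_index(scored, item=1, fraction=1 / 3)
```

Values of D near zero or below flag items that fail to separate high and low scorers, which is the sense in which such an index might help identify faulty items.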

5.
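A learning motivation scale for higher vocational college students was developed. The scale items were administered in a survey, and the survey data were factor-analyzed using principal component extraction with varimax rotation. Seven factors influencing higher vocational students' learning motivation were extracted: a positive spirit factor, a positive state factor, a survival motivation factor, a school management factor, a learning ability factor, an emotional factor, and a negative factor. The scale's homogeneity reliability coefficient α was 0.8310, indicating good internal consistency reliability. All validity indicators showed that the scale has satisfactory content validity, criterion-related validity, and construct validity, meeting the standards required by psychometrics.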
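The reliability figure reported in this abstract is an α coefficient. Assuming it is Cronbach's alpha (the abstract does not name the formula), a minimal sketch of the computation is shown here. The function name and the response data are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert-type responses: rows are respondents, columns are items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 1],
]

alpha = cronbach_alpha(responses)
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.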

6.
Senior high school mathematics students were taught computer arithmetic via self-instructional materials. Following instruction they were randomly assigned to one of two groups. One group was tested with a norm-referenced measure made up of items having moderate difficulty and high correlations with each other; the other group was tested with a criterion-referenced measure designed to assess attainment of specific behavioral objectives. Student attitude toward the content of instruction and toward the mode of instruction was assessed immediately following. Significantly more positive attitude toward the subject matter of instruction was associated with the use of the criterion-referenced measure. Differences in attitude toward mode of instruction were not significant.

7.
Applied Measurement in Education, 2013, 26(2): 175-199
This study used three different differential item functioning (DIF) detection procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify factors (e.g., content, cognitive processes, differences in ability distributions, etc.) that may be related to DIF. The QUASAR (Quantitative Understanding: Amplifying Student Achievement and Reasoning) Cognitive Assessment Instrument (QCAI) is designed to measure students' mathematical thinking and reasoning skills and consists of open-ended items that require students to show their solution processes and provide explanations for their answers. In this study, 33 polytomously scored items, which were distributed within four test forms, were evaluated with respect to gender-related DIF. The data source was sixth- and seventh-grade student responses to each of the four test forms administered in the spring of 1992 at all six school sites participating in the QUASAR project. The sample consisted of 1,782 students with approximately equal numbers of female and male students. The results indicated that DIF may not be serious for 31 of the 33 items (94%) in the QCAI. For the two items that were detected as functioning differently for male and female students, several plausible factors for DIF were discussed. The results from the secondary analyses, which removed the mutual influence of the two items, indicated that DIF in one item, PPPl, which favored female students rather than their matched male students, was of particular concern. These secondary analyses suggest that the detection of DIF in the other item in the original analysis may have been due to the influence of Item PPPl because they were both in the same test form.

8.
Two experiments indicated that two approaches to serial learning are too extreme: the classical view that it consists only of interitem associations, and various recent views that it involves no interitem associations. The novel assumption introduced here was that phrasing cues, normally conceptualized as merely segregating long series into smaller units or chunks, may also enter into associations with items, thereby reducing interitem interference and facilitating serial learning. It was found that one item could become a signal for another item, an interitem association, or be overshadowed by a phrasing cue, such as a brightness and temporal cue, also signaling that item. The items were .045-g pellets. Rats traversed a runway for items arranged in ordered series, 14-7-3-1-0 pellets (Experiment 1) or 10-2-0-10 (Experiment 2). Complete tracking of, for example, the 10-2-0-10 series would consist of fastest running to 10 pellets and slowest running to 0 pellets. In both investigations, the interitem association overshadowed was that between 0 pellets and the subsequent rewarded item, 0 → 14 (Experiment 1) or 0 → 10 (Experiment 2). Either repetitions of the 14-7-3-1-0 subpattern (Experiment 1) or merely the terminal 10-pellet item (Experiment 2) were phrased, both methods producing identical results. Overshadowing the 0-pellet item produced superior serial learning, more rapid extinction, and, in Experiment 1, considerable elevation of responding when the brightness phrasing cue was introduced in extinction, an effect said to be conceptually identical to spontaneous recovery and one demonstrating directly that phrasing cues are in reality overshadowing cues. It was suggested that many effects attributed to forgetting may be due to unrecognized overshadowing of memory cues by phrasing cues, giving rise to exaggerated estimates of forgetting.

9.
Applied Measurement in Education, 2013, 26(1): 89-97
Research on the use of multiple-choice tests has presented conflicting evidence about the use of statistical item difficulty as a means of ordering items. An alternate method advocated by many texts is the use of cognitive difficulty. This study examined the effect of using both statistical and cognitive item difficulty in determining item order. Results indicated that those students who received items in an increasing cognitive order, no matter what the order of statistical difficulty, scored higher on hard items. Those students who received the forms with opposing cognitive and statistical difficulty orders scored the highest on medium-level items. The study concludes with a call for more research on the effects of cognitive difficulty and suggests that future studies examine subscores as well as total test results.

10.
This paper describes the development and validation of an item bank designed for students to assess their own achievements across an undergraduate-degree programme in seven generic competences (i.e., problem-solving skills, critical-thinking skills, creative-thinking skills, ethical decision-making skills, effective communication skills, social interaction skills and global perspective). The Rasch modelling approach was adopted for instrument development and validation. A total of 425 items were developed. The content validity of these items was examined via six focus group interviews with target students, and the construct validity was verified against data collected from a large student sample (N = 1151). A matrix design was adopted to assemble the items in 26 test forms, which were distributed at random in each administration session. The results demonstrated that the item bank had high reliability and good construct validity. Cross-sectional comparisons of Years 1–4 students revealed patterns of changes over the years. Correlation analyses shed light on the relationships between the constructs. Implications are drawn to inform future efforts to develop the instrument, and suggestions are made regarding ways to use the instrument to enhance the teaching and learning of generic skills.

11.
In this survey we sought to investigate the extent to which primary school teachers working in Adelaide's northern suburbs (mainly lower SES) would relate to direct instruction as a viable teaching method in their professional work. Through approaches in school staffrooms, 150 questionnaires were distributed and 58 of these were returned via mail. A Likert-scale was used with five positive and six negative items, and a single factor resolution was evident. It was possible to identify 11 (19%) respondents exhibiting varying degrees of negative attitude, and 47 (81%) exhibiting varying degrees of positive attitude. Attitudes to direct instruction correlated positively with teachers' years of experience (r=0.34), and with a checklist measure tapping actual knowledge of the components of direct instruction as described by Rosenshine (r=0.63). Female teachers reported more positive attitudes than male teachers. Item analysis indicated a consistent pattern of generally positive orientation towards direct instruction, except in the case of one item, “Direct instruction is an effective method with all students,” which elicited an agreement level of only 39%.

12.
The definition of what it means to take a test online continues to evolve with the inclusion of a broader range of item types and a wide array of devices used by students to access test content. To assure the validity and reliability of test scores for all students, device comparability research should be conducted to evaluate the impact of testing device on student test performance. The current study looked at the comparability of test scores across tablets and computers for high school students in three commonly assessed content areas and for a variety of different item types. Results indicate no statistically significant differences across device type for any content area or item type. Student survey results suggest that students may have a preference for taking tests on devices with which they have more experience, but that even limited exposure to tablets in this study increased positive responses for testing on tablets.

13.
The use of content validity as the primary assurance of the measurement accuracy for science assessment examinations is questioned. An alternative accuracy measure, item validity, is proposed. Item validity is based on research using qualitative comparisons between (a) student answers to objective items on the examination, (b) clinical interviews with examinees designed to ascertain their knowledge and understanding of the objective examination items, and (c) student answers to essay examination items prepared as an equivalent to the objective examination items. Calculations of item validity are used to show that selected objective items from the science assessment examination overestimated the actual student understanding of science content. Overestimation occurs when a student correctly answers an examination item, but for a reason other than that needed for an understanding of the content in question. There was little evidence that students incorrectly answered the items studied for the wrong reason, resulting in underestimation of the students' knowledge. The equivalent essay items were found to limit the amount of mismeasurement of the students' knowledge. Specific examples are cited and general suggestions are made on how to improve the measurement accuracy of objective examinations.

14.
The Trends in International Mathematics and Science Study (TIMSS) is a comparative assessment of the achievement of students in many countries. In the present study, a rigorous independent evaluation was conducted of a representative sample of TIMSS science test items because item quality influences the validity of the scores used to inform educational policy in those countries. The items had been administered internationally to 16,009 students in their eighth year of formal schooling. The evaluation had three components. First, the Rasch model, which emphasizes high quality items, was used to evaluate the items psychometrically. Second, readability and vocabulary analyses were used to evaluate the wording of the items to ensure they were comprehensible to the students. And third, item development guidelines were used by a focus group of science teachers to evaluate the items in light of the TIMSS assessment framework, which specified the format, content, and cognitive domains of the items. The evaluation components indicated that the majority of the items were of high quality, thereby contributing to the validity of TIMSS scores. These items had good psychometric characteristics, readability, vocabulary, and compliance with the assessment framework. Overall, the items tended to be difficult: constructed response items assessing reasoning or application were the most difficult, and multiple choice items assessing knowledge or application were less difficult. The teachers revised some of the sampled items to improve their clarity of content, conciseness of wording, and fit with format specifications. For TIMSS, the findings imply that some of the non‐sampled items may need revision, too. For researchers and teachers, the findings imply that the TIMSS science items and the Rasch model are valuable resources for assessing the achievement of students. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 1321–1344, 2012

15.
16.
Three Wechsler scales (the Wechsler Adult Intelligence Scale, Wechsler Intelligence Scale for Children, and Wechsler-Bellevue II) were administered in a counterbalanced design to 72 randomly selected 17-year-old high school subjects in order to investigate their comparability by testing the equality of (a) means, (b) variances, (c) reliability coefficients, and (d) validity coefficients based on scaled scores and IQs. Results indicated that the subtest scores and IQs for the given three scales were not equivalent. The present findings conform with most of the previous results regarding the comparability of Wechsler scales. Although the three scales investigated all show high similarity of item content and format, they clearly fail to meet the statistical criteria of equivalence for 17-year-old subjects.

17.
The purpose of this study was to develop a Meta‐Affective Trait Scale (MATS) to measure the meta‐affective inclinations related to emotions that students have while they are studying for their classes. First, a pilot study was performed with 380 10th‐grade students. Results of the exploratory factor analysis supported a two‐factor structure of the MATS, with 17 items and two dimensions (affective awareness and affective regulation). Second, in the validation study, the confirmatory factor analysis was carried out using data from 359 11th‐grade students. Satisfactory fit indices were obtained, providing evidence for the reliability and validity of the scale. Finally, for further evidence, a correlational analysis was run. Results indicated positive and significant correlations between learning strategies and self‐efficacy and the dimensions of the MATS. Consequently, the MATS can be employed by both researchers and teachers to assess students’ meta‐affective inclinations.

18.
Two problems in test development relate to the use of illustrations: (1) Do illustrated items perform better than written items, and (2) does item performance vary as a function of the type and size of the illustration? A sample of 63 tests was drawn from all the Air Force Specialty Knowledge Tests containing illustrations. These 63 tests had been administered to approximately 28,261 airmen under operational conditions. Item statistics between illustrated and written items drawn from the same content areas were compared using F ratios. The results indicated (1) that illustrated items in general performed slightly better than matched written items, and (2) that the best-performing category of illustrated items was tables.

19.
In actual test development practice, the number of test items that must be developed and pretested is typically greater, and sometimes much greater, than the number that is eventually judged suitable for use in operational test forms. This has proven to be especially true for one item type, analytical reasoning, that currently forms the bulk of the analytical ability measure of the GRE General Test. This study involved coding the content characteristics of some 1,400 GRE analytical reasoning items. These characteristics were correlated with indices of item difficulty and discrimination. Several item characteristics were predictive of the difficulty of analytical reasoning items. Generally, these same variables also predicted item discrimination, but to a lesser degree. The results suggest several content characteristics that could be considered in extending the current specifications for analytical reasoning items. The use of these item features may also contribute to greater efficiency in developing such items. Finally, the influence of these various characteristics also provides a better understanding of the construct validity of the analytical reasoning item type.

20.
This study established a Chinese scale for measuring high school students’ ocean literacy. This included testing its reliability, validity, and differential item functioning (DIF) with the aim of compensating for the lack of DIF tests focusing on current scales. The construct validity and reliability were verified and tested by analyzing the established scale’s items using the Rasch model, and a gender DIF test was conducted to ensure the test results’ fairness when distinct groups were compared simultaneously. The results indicated that the scale established in this study is unidimensional and possesses favorable internal consistency and construct validity. The gender DIF test results indicated that several items were difficult for either female or male students to correctly answer; however, the experts and scholars discussed these items individually and suggested retaining them. The final Chinese version of the ocean literacy scale developed here comprises 48 items that can reflect high school students’ understanding of ocean literacy, which helps students understand the topics of marine science encountered in real life.
