Similar Literature
1.
One way to assess the quality of education in post-secondary institutions is through performance indicators. Studies of currently popular process indicators (e.g., library size, percentage of faculty with PhDs) have found that, after controlling for incoming student ability, these indicators tend to be only weakly associated with student outcomes (Pascarella and Terenzini, 2005). In addition, although much research has found that students improve their critical thinking skills as a result of attending college, little is known about which aspects of the college experience contribute to this gain. The purpose of this research was to examine the validity of the frequency of higher-order questions on tests and assignments as a process indicator by comparing it with gains in college students' critical thinking skills as an outcome indicator. The research consisted of three studies that used different designs, samples, and instruments. Overall, the frequency of higher-order questions was found to be a potentially valid process indicator, as it is related to gains in students' critical thinking skills.

5.
The purpose of the study was to examine the effect of item phrasing on the validity of a Likert-type attitude scale. Three scales with similar content were composed of 15 items each, phrased either all positively, all negatively, or with a mixture of positive and negative items. Five hundred twenty-two students in grades 4–6 each responded to one of the three forms. Results from the all-positive and all-negative forms indicated that item means, variances, and factor structures differed significantly. Inspection of the item means suggested that it was difficult for the students to indicate agreement by disagreeing with a negative statement. Analyses of the mixed-phrasing form yielded factors based on item phrasing rather than item content. Taken together, the results suggest that balancing item phrasing, when used with elementary students, appears to adversely affect the validity of attitude measurement.

8.
A student's score on the final examination in a classroom learning situation does not necessarily represent the amount learned during the course. Various measures of gain have been proposed to measure the amount learned, but all have subsequently been found inadequate. It is hypothesized that the relationship between test scores and knowledge is curvilinear. A rationale is presented for the curvilinear nature of the posited relationship and for the fit of the model to classroom learning. Using hypothetical data that conform to the model, which is expressed as a mathematical formula, it is shown that the final examination can be the best indicant of the amount learned even when individuals differ in proficiency at the beginning of the learning task. Based on several considerations, it is concluded that, at present, the final examination is the best indicant of the amount learned in many classroom situations.

9.
This study empirically investigated the effect of choice-weight scoring on predictive validity and reliability. Choice-weight scoring is a procedure in which different weights may be assigned to every option of an item. Four groups of subjects were included in the experiment. Weights derived from each group were used to score the tests of another group in order to assess the cross-validity of the weighted scoring. In no case did the increments in reliability or validity due to weighted scoring exceed .03.

11.
Different instructional programs were developed for three mathematics aptitude item formats to determine the relative susceptibility of each format to special instruction. Subjects were male and female volunteers in the junior year of high school at 12 schools. In the seven weeks between a pretest and a posttest, experimental subjects received 21 hours of instruction on one of the three formats; control subjects received no special instruction. Each of the three formats was found to be susceptible to instruction directed toward it, and the complex formats were the most susceptible. Female subjects were slightly less able mathematically at the outset and benefited less from the instruction than males. Mean gains of nearly a full standard deviation for the groups instructed on the complex formats were considered to be of practical consequence.

12.
There is an urgent need for an overall unit of educational benefit, and it is proposed here that such a unit be established through certain scaling techniques. Groups of judges are asked to render overall evaluations of “students” with specified characteristics. Their evaluations are correlated with the student traits, and the resulting coefficients, loaded with latent and applied values, become defensible weights for describing other students' overall educational advancement. The general unit so developed, when normalized for the target population, may be termed a “benefit T-score”, or more simply a “bentee”. Recursive features of the scaling process, applicable at different levels of generality, should allow a shift from value-space to test-space and from societal to expert opinion. The discovered weightings should illuminate the differing values of lay and professional groups within society. The bentee could also provide an objective function suitable for optimization with management-science techniques, which are now largely neglected in curriculum and administration.

13.
This study describes the effects of four sets of instructions on the observed item intercorrelations of current-events and subtraction items. The four conditions were (a) general objective, (b) behavioral objective, (c) behavioral objective plus test item, and (d) behavioral objective plus item form. Two tests, one in each subject matter, each constructed by selecting four items generated under each experimental condition, were administered to 51 seventh-grade children. The expected tendency toward greater homogeneity among items produced under the three conditions employing behavioral objectives was not found.

14.
It has been argued that item variance and test variance are not necessary characteristics of criterion-referenced tests, although they are necessary for norm-referenced tests. This position is mistaken because it treats sample statistics as the criteria for evaluating items and tests. Within a particular sample, an item or test may have no variance, but in the population of observations for which the test was designed, calibrated, and evaluated, both items and tests must have variance.

15.
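Chinese has a very rich set of classifiers (measure words) based on external-form features. Starting from several classifiers with relatively broad ranges of usage, the author seeks to identify their semantic features and analyzes, from a cognitive perspective, the reasons for the flexibility of classifier use. This rich inventory of form-based classifiers has both positive and negative effects, and its future development deserves adequate attention.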

16.
This paper assesses the validity of comparing mean test scores between two groups of students, and of longitudinal comparisons of means within each group. Using LISREL, confirmatory factor analyses were carried out (a) to test the hypotheses of similar factor patterns, equal units of measurement, and equal accuracy of measurement in the two groups, and (b) to estimate the correlation between the latent traits measured by two successive test administrations in each group. The results indicate (a) that a comparison of the group means may be invalid because, although the factor pattern was the same for both groups, the factors were not measured in the same units, and (b) that longitudinal comparisons within each group are seriously complicated by evidence of structural change.

17.
To assess the concurrent validity of standardized achievement tests using teachers' ratings (and rankings) of pupils' academic achievement as criteria, 42 teachers evaluated each of their students (n = 1,032) in each of five major curricular areas before a battery of standardized achievement tests was administered. The teachers were directed to rate each student's proficiency without regard to attendance, attitude, deportment, and so on. Within-class correlation coefficients were computed to eliminate rater leniency bias. The standardized achievement tests were found to have substantial concurrent validity in reading, math, language arts, science, and social studies. The normalized teacher ranks yielded significantly higher validity coefficients than the ratings, although the difference was small. The concurrent validity coefficients for language arts, reading, and math were significantly higher than those for science and social studies.
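
The abstract does not specify how the within-class coefficients were computed. One way to remove a teacher's overall leniency, sketched here under that assumption, is to express both ratings and test scores as deviations from their own class means and then correlate the pooled deviations (a pooled within-class correlation). A uniformly lenient teacher shifts every rating in the class by the same amount, and centering on the class mean removes that shift. The function name is hypothetical.

```python
from collections import defaultdict
from math import sqrt

def pooled_within_class_r(records):
    """Pooled within-class correlation between teacher ratings and test scores.

    records: iterable of (class_id, teacher_rating, test_score) tuples.
    Each value is centered on its own class mean, so a teacher who rates
    every student higher (leniency) does not affect the result.
    """
    by_class = defaultdict(list)
    for class_id, rating, score in records:
        by_class[class_id].append((rating, score))

    sxy = sxx = syy = 0.0
    for pairs in by_class.values():
        mean_rating = sum(r for r, _ in pairs) / len(pairs)
        mean_score = sum(s for _, s in pairs) / len(pairs)
        for rating, score in pairs:
            dr, ds = rating - mean_rating, score - mean_score
            sxy += dr * ds
            sxx += dr * dr
            syy += ds * ds
    if sxx == 0 or syy == 0:
        raise ValueError("no within-class variance in ratings or scores")
    return sxy / sqrt(sxx * syy)
```

For example, if teacher A rates three students 1, 2, 3 and a more lenient teacher B rates three students with the same test scores (10, 20, 30) as 4, 5, 6, the pooled within-class correlation is 1.0, whereas correlating the raw ratings with the scores across both classes gives only about .48.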

19.
This paper stresses the importance of item discrimination for instruments that measure characteristics by means of group responses. It argues that the percentage of the total sum of squares that is due to groups can appropriately be used as an index of item discrimination.
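
In analysis-of-variance terms, the proposed index is the between-groups sum of squares expressed as a percentage of the total sum of squares, 100 × SS_between / SS_total (equivalently, 100η²). A minimal Python sketch, assuming each group's responses to a single item are supplied as a list of numbers; the function name is hypothetical.

```python
from statistics import mean

def group_ss_percentage(groups):
    """Percentage of an item's total sum of squares that lies between groups.

    groups: list of lists, each inner list holding one group's responses
    to the item. Returns 100 * SS_between / SS_total.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    if ss_total == 0:
        return 0.0  # no variance at all: the item cannot discriminate
    return 100.0 * ss_between / ss_total
```

An item on which the groups respond uniformly within themselves but differently from one another approaches 100; an item whose group means are identical scores 0, however much individual responses vary.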


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), 京ICP备09084417号