Similar Documents
1.
A mental arithmetic test was constructed with items written in Dutch (the subjects' native language), in Spanish, and in Roman numerals. All 286 subjects received some information on Spanish numerals and were then randomly split into a Spanish Group and a Roman Group. The Spanish Group received further instruction on Spanish numerals, while the Roman Group received instruction on Roman numerals. Manipulation checks showed that the Spanish Group knew Spanish numerals better than the Roman Group did, whereas the Roman Group knew Roman numerals better. Two subtests were constructed from the total test: a 30-item Dutch/Spanish subtest (15 Dutch and 15 Spanish items) and a 25-item Dutch/Roman subtest (15 Dutch and 10 Roman items). The Dutch items were unbiased between the two groups, whereas the Spanish items of the Dutch/Spanish subtest were biased against the Roman Group and the Roman items of the Dutch/Roman subtest were biased against the Spanish Group. Applied to both subtests, the iterative logit method performed very well in detecting the biased items.

2.
Biased test items were intentionally embedded in a set of test items, and the resulting instrument was administered to large samples of Black and White examinees. Three popular item bias detection procedures were then applied to the data: (1) the three-parameter item characteristic curve procedure, (2) the chi-square method, and (3) the transformed item difficulty approach. The three-parameter item characteristic curve procedure was the most effective at detecting the intentionally biased items, and the chi-square method was judged the best alternative. The transformed item difficulty approach has certain limitations but remains a practical alternative when insufficient sample size, lack of computer facilities, or similar constraints preclude the other two procedures.
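The transformed item difficulty approach is often implemented as Angoff's delta plot. The sketch below assumes that variant: each group's proportion correct is converted to a delta (13 + 4z), a major-axis line is fitted through the (reference, focal) delta pairs, and items far from the line are candidates for bias. The abstract does not state which transformation or flagging rule the study used, and the function names `delta` and `delta_plot_distances` are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def delta(p: float) -> float:
    """Angoff delta for proportion correct p (0 < p < 1): 13 + 4z, where z is
    the standard-normal deviate with area p above it. Harder items (smaller p)
    get larger deltas."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)


def delta_plot_distances(p_ref, p_focal):
    """Signed perpendicular distance of each item from the major-axis line
    fitted to the (reference delta, focal delta) points. Items far from the
    line are relatively harder or easier for one group than the rest are."""
    x = [delta(p) for p in p_ref]
    y = [delta(p) for p in p_focal]
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Major-axis (not least-squares) slope: treats both groups symmetrically.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return [(slope * xi - yi + intercept) / math.sqrt(slope ** 2 + 1)
            for xi, yi in zip(x, y)]
```

Using perpendicular rather than vertical distances keeps the result the same whichever group is placed on the horizontal axis; the cutoff for "far" is left to the analyst.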

4.
The purpose of the study was to examine the effect of item phrasing on the validity of a Likert-type attitude scale. Three scales with similar content were constructed, each of 15 items that were all positive, all negative, or a mixture of positive and negative statements. Five hundred twenty-two students in grades 4–6 each responded to one of the three forms. Comparison of the all-positive and all-negative forms showed that item means, variances, and factor structures differed significantly. Inspection of the item means suggested that the students found it difficult to express agreement by disagreeing with a negative statement. Analyses of the mixed-phrasing form yielded factors based on item phrasing rather than item content. Taken together, the results suggest that balancing item phrasing with elementary students appears to adversely affect the validity of attitude measurement.

5.
An Iterative Item Bias Detection Method (times cited: 1; self-citations: 0; citations by others: 1)
Two strategies for assessing item bias are discussed: methods that compare (transformed) item difficulties without conditioning on ability level, and methods that compare the probabilities of a correct response conditional on ability level. In the present study, the logit model was used to compare the probabilities of a correct response to an item for members of two groups, conditional on the observed score, which serves as an indicator of ability level. The logit model was applied iteratively: in the T-th iteration, the T items with the highest values of the bias statistic are excluded from the test, and the observed-score indicator of ability for the (T + 1)-th iteration is computed from the remaining items. Applied to simulated data, the iterative logit method proved a substantial improvement over the noniterative one and was very efficient at distinguishing biased from unbiased items.
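The purification loop described above can be sketched as follows. The abstract does not give the formula of its logit-model bias statistic, so this sketch substitutes the Mantel-Haenszel chi-square, which also compares groups conditional on the observed score and is closely related to a score test of the group effect in a logit model with one parameter per score level. The names `mh_chi_square` and `iterative_bias`, the fixed `n_iter` stopping rule, and ranking each pass by the previous pass's statistics are all assumptions for illustration.

```python
from collections import defaultdict


def mh_chi_square(item, group, score):
    """Mantel-Haenszel chi-square (no continuity correction) for one item.
    item: 0/1 responses; group: 0 = reference, 1 = focal; score: matching score."""
    # Per score level: [ref correct, ref wrong, focal correct, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, s in zip(item, group, score):
        tables[s][2 * g + (0 if x else 1)] += 1
    diff = 0.0  # sum over levels of (observed - expected) reference-correct count
    var = 0.0   # sum of hypergeometric variances
    for a, b, c, d in tables.values():
        n_ref, n_foc = a + b, c + d
        n_right, n_wrong = a + c, b + d
        if 0 in (n_ref, n_foc, n_right, n_wrong):
            continue  # level carries no information about a group difference
        n = n_ref + n_foc
        diff += a - n_ref * n_right / n
        var += n_ref * n_foc * n_right * n_wrong / (n * n * (n - 1))
    return diff * diff / var if var > 0 else 0.0


def iterative_bias(responses, group, n_iter):
    """responses: one list of 0/1 item scores per examinee.
    Pass 0 matches on the total score; pass t drops the t items ranked highest
    in pass t-1 and matches on the score computed from the remaining items.
    Returns the final statistics and the items excluded in the final pass."""
    n_items = len(responses[0])
    excluded = []
    for t in range(n_iter + 1):
        kept = [j for j in range(n_items) if j not in excluded]
        score = [sum(r[j] for j in kept) for r in responses]
        stats = [mh_chi_square([r[j] for r in responses], group, score)
                 for j in range(n_items)]
        used = excluded
        ranked = sorted(range(n_items), key=lambda j: stats[j], reverse=True)
        excluded = ranked[: t + 1]
    return stats, used
```

Removing the most suspect items before recomputing the matching score keeps a biased item from contaminating the ability estimate used to judge the others, which is the motivation the abstract gives for iterating.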

6.
Numerous writers have suggested that the discrimination index may help identify faulty test items. This study systematically investigated the validity of the index for that purpose. Two forms of an arithmetic-reasoning test were written, with items in each form designed to vary in quality with respect to nine item-writing principles, and a discrimination index was computed for each item from the responses of 364 examinees. Three judges then independently rated the quality of each item using a checklist of the nine principles, and the average rating for each item served as the criterion for validating the indices. The results indicate that the discrimination index is a moderately valid measure of item quality. The implications of this finding are discussed.
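The abstract does not say which discrimination index was computed. As a hedged illustration, the sketch below uses the classical upper-lower index D (proportion correct in the top-scoring group minus proportion correct in the bottom-scoring group), with the common 27 percent convention as a default; the function name and the tie-breaking behaviour are assumptions.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index D = p(upper) - p(lower).

    responses: one list of 0/1 item scores per examinee. Examinees are ranked
    by total score; the top and bottom `fraction` form the upper and lower
    groups. Ties at the boundary are broken by examinee order (a simplification)."""
    totals = [sum(r) for r in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    k = max(1, round(fraction * len(responses)))
    lower, upper = order[:k], order[-k:]
    p_upper = sum(responses[i][item] for i in upper) / k
    p_lower = sum(responses[i][item] for i in lower) / k
    return p_upper - p_lower
```

An item answered correctly mostly by high scorers gets a D near 1; a D near zero or negative is the kind of signal the study checked against the judges' quality ratings.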

10.
Two problems in the investigation of predictive bias in tests are discussed: (a) the effect of unreliability of the predictors, and (b) the effect of omitting from the regression equation a predictor on which there are preexisting group differences.
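Problem (a) can be made concrete under classical test theory. Assume, hypothetically, that the criterion follows Y = a + bT + e in every group, that the observed predictor is X = T + E, and that true-score variance and reliability are equal across groups. The within-group regression of Y on X then has the attenuated slope b·ρ (ρ = reliability) and intercept a + b·μ(1 − ρ), where μ is the group's true-score mean. The helper below (a hypothetical name) just evaluates those expressions.

```python
def observed_score_regressions(a, b, true_means, reliability):
    """Within-group regression of criterion Y on a fallible predictor X = T + E.

    Assumes Y = a + b*T + e holds identically in every group, errors are
    uncorrelated with T and e, and true-score variance and reliability are
    equal across groups. Returns the common slope and one intercept per group."""
    slope = b * reliability  # attenuated slope: Cov(Y, X) / Var(X)
    # Intercept = mean(Y) - slope * mean(X), with mean(X) = mean(T) = mu
    intercepts = [a + b * mu - slope * mu for mu in true_means]
    return slope, intercepts
```

With reliability below 1, groups that differ in true mean get parallel observed-score regression lines with different intercepts (the higher-mean group's is higher when b > 0), even though no bias exists at the true-score level; with perfect reliability the intercepts coincide.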

11.
A set of techniques is presented for constructing a test or test battery that can be inferred to correlate as highly as possible with a hypothetical construct that is named but not measured directly. The techniques first require the test constructor to describe the nature of the construct indirectly, by estimating the relative sizes of the construct's correlations with several observable variables the constructor has selected. Techniques are also described for estimating the validity of a test constructed by these methods.

12.
This study empirically investigated the effect of choice weight scoring on predictive validity and reliability. Choice weight scoring is a procedure in which a different weight may be assigned to each option of an item. Four groups of subjects took part. Weights derived from each group were used to score the tests of another group in order to assess the cross-validity of the weighted scoring. In no case did weighted scoring raise reliability or validity by more than .03.
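The cross-validation design above can be sketched as: derive option weights in one group, score a different group with them, and correlate those scores with the second group's criterion. The abstract does not say how the weights were derived; this sketch assumes, purely for illustration, that each option's weight is the mean criterion score of the examinees who chose it. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def derive_option_weights(choices, criterion):
    """choices: one list of chosen options per examinee (one entry per item).
    Assumed rule: weight of (item, option) = mean criterion of those choosing it."""
    by_option = defaultdict(list)
    for person, y in zip(choices, criterion):
        for item, option in enumerate(person):
            by_option[(item, option)].append(y)
    return {key: fmean(ys) for key, ys in by_option.items()}


def weighted_scores(choices, weights, default=0.0):
    """Score examinees with weights derived in another group; options never
    chosen in the derivation group receive `default`."""
    return [sum(weights.get((item, option), default)
                for item, option in enumerate(person))
            for person in choices]


def pearson(x, y):
    """Pearson correlation, used here as the cross-validated validity."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Cross-validity is then `pearson(weighted_scores(holdout_choices, weights), holdout_criterion)`, where `holdout_choices` and `holdout_criterion` (hypothetical names) come from a group not used to derive `weights`; comparing it with the same correlation for number-right scores gives the kind of increment the study reports.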

16.
Different instructional programs were developed for three mathematics aptitude item formats to determine how susceptible each format is to special instruction. Subjects were male and female high school junior volunteers in 12 schools. During the seven weeks between the pretest and the posttest, experimental Ss received 21 hours of instruction on one of the three formats; control Ss received no special instruction. Each of the three formats proved susceptible to instruction directed toward it, and the complex formats were the most susceptible. Female Ss were slightly less able mathematically at the outset and benefited less from instruction than males. Mean gains of nearly a full standard deviation for the groups instructed on the complex formats were considered to be of practical consequence.

19.
It has been argued that item variance and test variance are not necessary characteristics of criterion-referenced tests, although they are necessary for norm-referenced tests. This position is in error because it treats sample statistics as the criteria for evaluating items and tests. Within a particular sample, an item or test may have no variance, but in the population of observations for which the test was designed, calibrated, and evaluated, both items and tests must have variance.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417