1.

THE ISSUE OF ITEM AND TEST VARIANCE FOR CRITERION-REFERENCED TESTS: A CLARIFICATION

JASON MILLMAN W. JAMES POPHAM 《Journal of Educational Measurement》1974,11(2):137-138

This note contends that item or score variability is an unnecessary characteristic of criterion-referenced tests as they have been traditionally conceived, namely, as measures of well defined classes of examinee behaviors.

2.

THE ISSUE OF ITEM AND TEST VARIANCE FOR CRITERION-REFERENCED TESTS: A REPLY

M. I. CHAS. E. WOODSON 《Journal of Educational Measurement》1974,11(2):139-140

It is a necessary condition that items and tests have variance and discrimination in the range of interest (population of observations) for which they are calibrated and selected. The basis for selection of the calibration sample determines the kind of scale which will be developed. A random sample from a population of individuals leads to a norm-referenced scale, and a sample representative of abilities of a range of a characteristic leads to a criterion-referenced scale.

3.

EFFECTS OF DIFFERENT SAMPLES ON ITEM AND TEST CHARACTERISTICS OF CRITERION-REFERENCED TESTS

THOMAS MICHAEL HALADYNA 《Journal of Educational Measurement》1974,11(2):93-99

Although many have rejected classical test construction and analysis procedures for criterion-referenced tests, the present study was concerned with the possibility that classical procedures are both applicable and appropriate when samples of both mastery and nonmastery examinees are employed. A rationale for using these samples was presented, and empirical evidence was gathered which supported the practice of combining samples to increase the variance of test scores and thereby permit the proper estimate of reliability and item validities.

4.

APPLICATION OF ITEM RESPONSE MODELS TO CRITERION-REFERENCED TEST ITEM SELECTION

RONALD K. HAMBLETON DATO N. M. DE GRUIJTER 《Journal of Educational Measurement》1983,20(4):355-367

5.

DETERMINING THE LENGTHS FOR CRITERION-REFERENCED TESTS

RONALD K. HAMBLETON CRAIG N. MILLS ROBERT SIMON 《Journal of Educational Measurement》1983,20(1):27-38

6.

TOWARD AN INTEGRATION OF THEORY AND METHOD FOR CRITERION-REFERENCED TESTS

RONALD K. HAMBLETON MELVIN R. NOVICK 《Journal of Educational Measurement》1973,10(3):159-170

In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity estimation consistent with this philosophy are suggested. Also, to improve the decision-making accuracy of criterion-referenced tests, a Bayesian procedure for estimating true mastery scores has been proposed. This Bayesian procedure uses information about other members of a student's group (collateral information), but the resulting estimation is still criterion referenced rather than norm referenced in that the student is compared to a standard rather than to other students. In theory, the Bayesian procedure increases the “effective length” of the test by improving the reliability, the validity, and more importantly, the decision-making accuracy of the criterion-referenced test scores.

7.

THE ROLE OF RELIABILITY IN CRITERION-REFERENCED TESTS

MICHAEL T. KANE 《Journal of Educational Measurement》1986,23(3):221-224

In discussion of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe scores on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5. This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
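
The 0.5 threshold can be recovered with a short sketch in classical test theory terms. The notation is assumed here and does not come from the abstract: X is the observed score, T the universe (true) score, E the error, \rho the reliability, and \mu the mean score. The sketch also ignores sampling error in the mean.

```latex
\begin{align*}
X &= T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad \rho = \sigma_T^2 / \sigma_X^2 \\
\mathrm{E}\bigl[(X - T)^2\bigr] &= \sigma_E^2 = (1 - \rho)\,\sigma_X^2
  && \text{(test-based estimate } \hat{T} = X\text{)} \\
\mathrm{E}\bigl[(\mu - T)^2\bigr] &= \sigma_T^2 = \rho\,\sigma_X^2
  && \text{(a priori estimate } \hat{T} = \mu\text{)} \\
(1 - \rho)\,\sigma_X^2 < \rho\,\sigma_X^2 &\iff \rho > 0.5
\end{align*}
```

Under these assumptions the observed score has the smaller mean squared error only when the reliability exceeds 0.5, which matches the condition stated in the abstract.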

8.

AN INTERPRETATION OF LIVINGSTON'S RELIABILITY COEFFICIENT FOR CRITERION-REFERENCED TESTS

CHESTER W. HARRIS 《Journal of Educational Measurement》1972,9(1):27-29

An alternative interpretation of Livingston's reliability coefficient is based on the notion of the relation of the size of the reliability coefficient to the range of talent. It is shown that the (generally) larger Livingston coefficient does not imply a smaller standard error of measurement and consequently does not imply a more dependable determination of whether or not a true score falls below (or exceeds) a given criterion value.
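
The point about the standard error can be illustrated with a short derivation. It assumes the form in which Livingston's coefficient is commonly stated. The symbols are notation chosen here, not taken from the abstract: \rho is the conventional reliability, \sigma_X^2 the observed-score variance, \mu_X the mean, and C the criterion score.

```latex
\begin{align*}
k^2 &= \frac{\rho\,\sigma_X^2 + (\mu_X - C)^2}{\sigma_X^2 + (\mu_X - C)^2} \\
(1 - k^2)\,\bigl[\sigma_X^2 + (\mu_X - C)^2\bigr]
  &= \sigma_X^2 + (\mu_X - C)^2 - \rho\,\sigma_X^2 - (\mu_X - C)^2 \\
  &= (1 - \rho)\,\sigma_X^2 = \sigma_E^2
\end{align*}
```

In this sketch k^2 is at least as large as \rho and grows as the mean moves away from C, yet the error variance it implies is the same \sigma_E^2 given by the conventional coefficient. The added term (\mu_X - C)^2 acts like an increase in the range of talent and not like a gain in measurement precision.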

9.

ITEM ANALYSIS FOR TEACHER-MADE MASTERY TESTS

KEVIN D. CREHAN 《Journal of Educational Measurement》1974,11(4):255-261

Various item selection techniques are compared on resultant criterion-referenced reliability and validity. Techniques compared include three nominal criterion-referenced methods, a traditional point biserial selection, teacher selection, and random selection. Eighteen volunteer junior and senior high school teachers supplied behavioral objectives and item pools ranging from 26 to 40 items. Each teacher obtained responses from four classes. Pairs of tests of various lengths were developed by each item selection method. Estimates of test reliability and validity were obtained using responses independent of the test construction sample. Resultant reliability and validity estimates were compared across item selection techniques. Two of the criterion-referenced item selection methods resulted in consistently higher observed validity. However, the small magnitude of improvement over teacher or random selection raises a question as to whether the benefit warrants the necessary extra effort on the part of the classroom teacher.

10.

COMPARISON OF TRADITIONAL AND ITEM RESPONSE THEORY METHODS FOR EQUATING TESTS

MICHAEL J. KOLEN 《Journal of Educational Measurement》1981,18(1):1-11

11.

A BAYESIAN DECISION-THEORETIC PROCEDURE FOR USE WITH CRITERION-REFERENCED TESTS

H. SWAMINATHAN RONALD K. HAMBLETON JAMES ALGINA 《Journal of Educational Measurement》1975,12(2):87-98

12.

THE RELATION OF ITEM DISCRIMINATION TO TEST RELIABILITY

Robert L. Ebel 《Journal of Educational Measurement》1967,4(3):125-128

13.

THE RELATIONSHIP BETWEEN VERBAL-MEANING TEST SCORES AND DEGREE OF CONFIDENCE IN ITEM RESPONSES

SHIH-SUNG WEN 《Journal of Educational Measurement》1975,12(3):197-200

14.

TEACHER JUDGMENTS OF TEST ITEM PROPERTIES

James J. Ryan 《Journal of Educational Measurement》1968,5(4):301-306

15.

THE EFFECTS OF MANIPULATED ITEM WRITING CONSTRAINTS ON THE HOMOGENEITY OF TEST ITEMS

EVA L. BAKER 《Journal of Educational Measurement》1971,8(4):305-309

Described are the effects of four sets of instructions on the observed item intercorrelations of current events and subtraction items. The four conditions were: (a) general objective, (b) behavioral objective, (c) behavioral objective plus test item, and (d) behavioral objective plus item-form. Two tests, one in each subject matter, constructed by selecting four items generated from each of the experimental conditions, were administered to 51 seventh grade children. Not found were the expected tendencies toward greater homogeneity among items produced under the three conditions employing behavioral objectives.

16.

A CROSS-VALIDATION STUDY OF THE ITEM ORDERING OF THE PEABODY PICTURE VOCABULARY TEST

JOSEPH S. RENZULLI DIETER H. PAULUS 《Journal of Educational Measurement》1969,6(1):15-20

A subset of the items of both forms of the Peabody Picture Vocabulary Test (PPVT) was administered to a sample of 452 fourth-, fifth-, and sixth-grade students. This sample of students was randomly divided into two equal subgroups. Item difficulty indices were calculated for each of the two subsamples for each of the two forms of the test. Data obtained from the first subsample were used to evaluate the published ordering of items of Forms A and B of the PPVT and to reorder the items according to the empirically derived item difficulties. The second subsample was used as a cross-validation sample to evaluate the empirically derived reordering of items. The results of the cross-validation of the reordering indicate a substantial and significant increase in the validity of the item orderings for this subset of items on both forms of the PPVT. Therefore, this new ordering may yield a more accurate estimate of the intelligence of average and above students in the fourth, fifth, and sixth grades than the present, published ordering of items.
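
The reordering step can be sketched as follows. The abstract does not say which difficulty index was used, so this sketch assumes the common proportion-correct index. The function names and the toy response matrix are hypothetical and are not data from the study.

```python
def item_difficulties(responses):
    """Proportion of examinees answering each item correctly.

    `responses` is a list of examinee rows, each a list of 0/1 item scores.
    Proportion correct is an assumed index; the abstract does not name one.
    """
    if not responses:
        raise ValueError("responses must contain at least one examinee")
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every examinee row must cover the same items")
    n_examinees = len(responses)
    return [sum(row[j] for row in responses) / n_examinees
            for j in range(n_items)]


def order_easiest_first(difficulties):
    """Item indices sorted from easiest (highest proportion correct) to hardest."""
    return sorted(range(len(difficulties)), key=lambda j: -difficulties[j])


# Hypothetical derivation subsample: 4 examinees by 3 items.
derivation_sample = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
p_values = item_difficulties(derivation_sample)
new_order = order_easiest_first(p_values)
```

In a cross-validation design like the one described, the ordering derived from the first subsample would then be checked against difficulties computed separately on the second subsample.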

17.

A PROCEDURE FOR INVESTIGATING THE UNIDIMENSIONALITY OF ACHIEVEMENT TESTS BASED ON ITEM PARAMETER ESTIMATES

ISAAC I. BEJAR 《Journal of Educational Measurement》1980,17(4):283-296

18.

AN EXPERIMENTAL COMPARISON OF ITEM SAMPLING AND EXAMINEE SAMPLING FOR ESTIMATING TEST NORMS

THOMAS R. OWENS DANIEL L. STUFFLEBEAM 《Journal of Educational Measurement》1969,6(2):75-83

This study is an empirical comparison of the accuracy of item sampling and examinee sampling in estimating norm statistics. Item samples were composed of 3, 6, or 12 items selected from a total test of 50 multiple-choice vocabulary questions. Overall, the study findings provided empirical evidence that item sampling is approximately as effective as examinee sampling for estimating the population mean and standard deviation. Contradictory trends occurred for lower ability and higher ability student populations in accuracy of estimated means and standard deviations when the number of items administered increased from 3 to 6 to 12. The findings from this study indicate that the variation of sequences of items occurring in item sampling need not have a significant effect on test performance.

19.

THE PLACE AND VALUE OF ITEM BANKING

R. Wood 《Educational research; a review for teachers and all concerned with progress in education》2013,55(2):114-125

School climate, defined here as the type of mobility system reflected in the school's selection procedures, was shown to interact with ethnic group membership and locus of control (after SES factors were controlled) in affecting student achievement in Israeli schools. Although achievement tended to be highest for all in a competitive and non‐selective environment, the achievement of the socially higher status group was found to be more sensitive to changes in the school atmosphere than that of the lower status group. Students revealing a strong internal locus of control appeared to be less affected by changes in the environment than others.

20.

RELIABILITY OF CRITERION-REFERENCED TESTS: A DECISION-THEORETIC FORMULATION

H. SWAMINATHAN RONALD K. HAMBLETON JAMES ALGINA 《Journal of Educational Measurement》1974,11(4):263-267

It has been suggested that the primary purpose for criterion-referenced testing in objective-based instructional programs is to classify examinees into mastery states or categories on the objectives included in the test. We have proposed that the reliability of the criterion-referenced test scores be defined in terms of the consistency of the decision-making process across repeated administrations of the test. Specifically, reliability is defined as a measure of agreement over and above that which can be expected by chance between the decisions made about examinee mastery states in repeated test administrations for each objective measured by the criterion-referenced test.
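
One standard chance-corrected agreement index is Cohen's kappa, and the sketch below applies it to mastery decisions from two administrations. The abstract does not name a specific coefficient, so kappa is an assumption here. The cutoff, the scores, and the function names are hypothetical illustrations.

```python
from collections import Counter


def classify(scores, cutoff):
    """Assign a mastery state to each score using a fixed cutoff (assumed rule)."""
    return ["master" if s >= cutoff else "nonmaster" for s in scores]


def cohen_kappa(first, second):
    """Agreement between two sets of decisions, corrected for chance agreement."""
    if len(first) != len(second) or not first:
        raise ValueError("decision lists must be non-empty and of equal length")
    n = len(first)
    # Observed proportion of examinees given the same decision both times.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement expected from the marginal proportions alone.
    counts_1, counts_2 = Counter(first), Counter(second)
    p_chance = sum(counts_1[k] * counts_2[k]
                   for k in set(counts_1) | set(counts_2)) / n ** 2
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores for 10 examinees on one objective, two administrations.
admin_1 = [9, 8, 7, 3, 4, 8, 2, 9, 6, 5]
admin_2 = [8, 9, 6, 4, 3, 7, 7, 9, 2, 5]
kappa = cohen_kappa(classify(admin_1, 7), classify(admin_2, 7))
```

With these made-up scores the two administrations agree on 8 of 10 decisions while chance agreement is 0.5, so kappa comes out near 0.6. For a test covering several objectives, the computation would be repeated per objective.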