
1.

Stuart Horne 《European Journal of Psychology of Education - EJPE》1987,2(2):145-156

It is suggested that for criterion-referenced tests to have any educational value, they must be linked to the categories of learning that have been demonstrated in learning theory. These categories form the basis of the test domains. The nature of the two main categories, concepts and rules, is reviewed and it is suggested that the errors produced by pupils that indicate faulty concept learning or rules application should form the basis for the production of tests. Examples of such tests are also discussed. As this approach to testing is markedly different to the current psychometric approach to criterion-referenced testing it is suggested that the form of testing described here be called concept-referenced testing to distinguish it from other forms of criterion-referenced measures.

2.

CRITERION-REFERENCED TESTING: COMMENTS ON RELIABILITY

RICHARD J. SHAVELSON JAMES H. BLOCK MICHAEL M. RAVITCH 《Journal of Educational Measurement》1972,9(2):133-137

Currently there is concern among some educators regarding the reliability of criterion-referenced (CR) measures. In this comment, a recent attempt to develop a theory of reliability for CR measures is examined, and some considerations for determining the reliability of CR measures are discussed. Conventional reliability statistics (e.g., coefficient alpha, standard error of measurement) are found appropriate for CR measures satisfying the assumptions of the measurement model underlying classical test theory. For measures with underlying multidimensional traits, conventional reliability statistics may be used at the homogeneous subscale level. When the confidence interval about a student's “below criterion score” includes the criterion, additional evidence about the student should be obtained. Two-stage sequential testing is suggested as one method for acquiring additional evidence.
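The confidence-interval check described in this abstract can be sketched as follows. This is a minimal illustration, assuming the classical formula SEM = SD × sqrt(1 − reliability) and a normal-theory interval; the function names, the 1.96 multiplier, and the sample numbers are hypothetical and do not come from the paper.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def needs_more_evidence(score: float, criterion: float, sd: float,
                        reliability: float, z: float = 1.96) -> bool:
    """True when a below-criterion score's interval still reaches the criterion.

    The multiplier z = 1.96 (a 95% normal interval) is an assumption made
    for this sketch, not a value given in the abstract.
    """
    if score >= criterion:
        return False  # the rule only concerns below-criterion scores
    upper_bound = score + z * standard_error_of_measurement(sd, reliability)
    return upper_bound >= criterion


# Hypothetical student: score 65, criterion 70, SD 10, reliability 0.84.
print(needs_more_evidence(65, 70, 10, 0.84))
```

A student flagged by this check would be a candidate for the second stage of the two-stage sequential testing the authors suggest.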

3.

COMMENTS ON “IMPLICATIONS OF CRITERION-REFERENCED MEASUREMENT”

GEORGE B. SIMON 《Journal of Educational Measurement》1969,6(4):259-260

Criterion-reference as opposed to norm-reference applies to the scores from a test and not to the content or format of a test, hence it is proper to refer to criterion-referenced scores or measures and not to criterion-referenced tests. This concept of criterion-referenced measures is applicable to formative evaluation generally or whenever the objective is mastery of subject matter rather than discrimination among students.

4.

TOWARD AN INTEGRATION OF THEORY AND METHOD FOR CRITERION-REFERENCED TESTS

RONALD K. HAMBLETON MELVIN R. NOVICK 《Journal of Educational Measurement》1973,10(3):159-170

In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity estimation consistent with this philosophy are suggested. Also, to improve the decision-making accuracy of criterion-referenced tests, a Bayesian procedure for estimating true mastery scores has been proposed. This Bayesian procedure uses information about other members of a student's group (collateral information), but the resulting estimation is still criterion referenced rather than norm referenced in that the student is compared to a standard rather than to other students. In theory, the Bayesian procedure increases the “effective length” of the test by improving the reliability, the validity, and more importantly, the decision-making accuracy of the criterion-referenced test scores.

5.

Estimating parallel form reliability from one administration of a criterion-referenced test: A computer program for practitioners

Robert Saltstone Ken Stange Ted Chase 《Psychology in the schools》1989,26(3):249-253

Reliability of a criterion-referenced test is often viewed as the consistency with which individuals who have taken two strictly parallel forms of a test are classified as being masters or nonmasters. However, in practice, it is rarely possible to retest students, especially with equivalent forms. For this reason, methods for making conservative approximations of alternate form (or test-retest “without the effects of testing”) reliability have been developed. Because these methods are computationally tedious and require some psychometric sophistication, they have rarely been used by teachers and school psychologists. This paper (a) describes one method (Subkoviak's) for estimating alternate-form reliability from one administration of a criterion-referenced test and (b) describes a computer program developed by the authors that will handle tests containing hundreds of items for large numbers of examinees and allow any test user to apply the technique described. The program is a superior alternative to other methods of simplifying this estimation procedure that rely upon tables; a user can check classification consistency estimates for several prospective cut scores directly from a data file, without having to make prior calculations.
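A single-administration consistency estimate of this kind can be sketched as follows. This is not the authors' program. It is a simplified illustration that assumes the commonly described form of Subkoviak's procedure: each examinee's proportion-correct is regressed toward the group mean using a reliability estimate, a binomial model gives the chance of reaching the cut score, and consistency is the average probability of receiving the same classification twice. All function names and the sample data are hypothetical.

```python
from math import comb


def pass_probability(p: float, n_items: int, cut: int) -> float:
    """P(score >= cut) under a binomial model with success probability p."""
    return sum(comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
               for k in range(cut, n_items + 1))


def classification_consistency(scores, n_items: int, cut: int,
                               reliability: float) -> float:
    """Average probability that an examinee is classified the same way twice.

    `reliability` is assumed to be supplied by the caller (for example a
    KR-20 value computed elsewhere); that choice is an assumption here.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    mean_prop = sum(scores) / (len(scores) * n_items)
    total = 0.0
    for x in scores:
        # Regressed estimate of the examinee's true proportion correct.
        p_hat = reliability * (x / n_items) + (1 - reliability) * mean_prop
        p_pass = pass_probability(p_hat, n_items, cut)
        # Same decision on both forms: pass twice or fail twice.
        total += p_pass**2 + (1 - p_pass) ** 2
    return total / len(scores)


# Hypothetical data: four examinees on a 10-item test, trying several cuts.
for cut in (5, 6, 7):
    print(cut, classification_consistency([3, 5, 7, 9], 10, cut, 0.7))
```

Looping over candidate cut scores, as in the last lines, mirrors the use case the abstract mentions: comparing consistency estimates for several prospective cut scores from the same data.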

6.

THE ROLE OF RELIABILITY IN CRITERION-REFERENCED TESTS

MICHAEL T. KANE 《Journal of Educational Measurement》1986,23(3):221-224

In discussion of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe scores on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5. This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
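The 0.5 threshold can be reconstructed with a short derivation. The sketch below assumes classical-test-theory notation that the abstract does not spell out: observed score X = T + E with uncorrelated universe score T and error E, reliability ρ, and a population mean μ that is treated as known.

```latex
\begin{align*}
X &= T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
  \rho = \frac{\sigma_T^2}{\sigma_X^2} \\
\text{test-based error:}\quad E\big[(X - T)^2\big] &= \sigma_E^2 = (1 - \rho)\,\sigma_X^2 \\
\text{a priori error:}\quad E\big[(\mu - T)^2\big] &= \sigma_T^2 = \rho\,\sigma_X^2 \\
(1 - \rho)\,\sigma_X^2 < \rho\,\sigma_X^2 &\iff \rho > 0.5
\end{align*}
```

Under these assumptions the observed score beats the "assign everyone the mean" rule exactly when the reliability exceeds 0.5, which matches the condition stated in the abstract.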

7.

Some problems in interpreting criterion-referenced test results in a program evaluation

Mary Ann B. Barta Unhai R. Ahn Joseph F. Gastright 《Studies in Educational Evaluation》1976,2(3):193-202

The purpose of this paper is to delineate several problems which arise when criterion-referenced test results are used to evaluate the effects of a specific educational treatment. Specifically, the paper deals with: (1) alternative methods of aggregating individual student and group data on objectives, (2) the sensitivity of the instrument to program outcomes, and (3) the comparisons of criterion-referenced test data and standardized (norm-referenced) achievement test data.

8.

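An Analysis of a Mathematics Academic Achievement Test Based on IRT Models

Shen Nanshan (沈南山) 《Journal of Anhui Normal University (Humanities and Social Sciences Edition)》2012,40(1):67-73

A mathematics academic achievement test was administered to 774 valid participants from different types of schools and analyzed with IRT parameter models, leading to four conclusions: (1) test scores and optimal scores follow a negatively skewed distribution; (2) the test information function is shifted in the negative direction and is roughly bimodal; (3) subjective items fit the logistic model poorly; (4) students from different types of schools differ significantly in mathematics academic achievement.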

9.

Development and Implementation of a Food Safety Knowledge Instrument

Carol Byrd-Bredbenner Virginia Wheatley Donald Schaffner Christine Bruhn Lydia Blalock Jaclyn Maurer 《Journal of Food Science Education》2007,6(3):46-55

Little is known about the food safety knowledge of young adults. In addition, few knowledge questionnaires and no comprehensive, criterion-referenced measure that assesses the full range of food safety knowledge could be identified. Without appropriate, valid, and reliable measures and baseline data, it is difficult to develop and implement effective education efforts. Thus, the purpose of this study was to develop a comprehensive, valid, reliable food safety knowledge questionnaire. Questionnaire development followed this process: 1) use of published reports and input from experts in food safety and sanitation (n = 7) to identify key food safety concepts; 2) development of a question bank (n = 101) assessing knowledge of key concepts (i.e., cross contamination prevention/disinfection procedures; safe times/temperatures for cooking/storing foods; groups at greatest risk for foodborne disease; foods that increase risk of foodborne disease; and foodborne disease pathogens); 3) refinement of initial questions by experts; 4) questionnaire pretest with young adults (n = 180) and refinement; 5) questionnaire pilot test (n = 126) and refinement; 6) final expert review and refinement; and 7) conversion into an online survey. Young adults (n = 4343, mean age 19.9 ± 1.7 SD years) from 21 universities and colleges across the country completed the questionnaire. Item analysis was used to determine the overall quality of the test and identify improvements needed. Livingston's coefficient of reliability for criterion-referenced tests was 0.92. The questionnaire met or exceeded generally recognized standards of reliability and validity. This questionnaire could be useful in baseline assessment of food safety knowledge and measurement of knowledge gained after an educational intervention in adults.

10.

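Application of the Binomial Model to Research on the Length of Criterion-Referenced Language Tests

Chai Xingsan (柴省三) 《Examinations Research (考试研究)》2013,(4):51-59

As ideas about educational measurement change in China and abroad, the relative evaluative information provided by traditional norm-referenced tests can no longer meet the needs of test users and examinees, and the social value of the criterion-referenced test (CRT) is receiving growing attention. In CRTs that make classification decisions about examinees' degree of mastery, how to determine an appropriate test length and passing score is an important factor affecting classification error. After reviewing the current state, principles, and uses of CRT research, this paper presents the theory and procedure of applying the binomial probability model to decisions about CRT length and, with error control as the guiding principle, studies how the binomial model is applied to test length and passing score decisions for a comprehensive criterion-referenced language test.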
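The kind of calculation the binomial model supports can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: it assumes one hypothetical true proportion-correct for a minimal master and one for a typical nonmaster, treats each item as an independent Bernoulli trial, and searches for the shortest test and cut score that keep both classification error rates under a chosen bound. All names and numbers are assumptions made for illustration.

```python
from math import comb


def prob_reaching_cut(true_p: float, n_items: int, cut: int) -> float:
    """P(score >= cut) for an examinee with true proportion-correct true_p."""
    return sum(comb(n_items, k) * true_p**k * (1 - true_p) ** (n_items - k)
               for k in range(cut, n_items + 1))


def error_rates(n_items: int, cut: int, p_master: float, p_nonmaster: float):
    """Return (false_positive, false_negative) for a given length and cut."""
    false_positive = prob_reaching_cut(p_nonmaster, n_items, cut)
    false_negative = 1.0 - prob_reaching_cut(p_master, n_items, cut)
    return false_positive, false_negative


def shortest_test(p_master: float, p_nonmaster: float, max_error: float,
                  max_items: int = 200):
    """Smallest (n_items, cut) keeping both error rates <= max_error, or None."""
    for n_items in range(1, max_items + 1):
        for cut in range(1, n_items + 1):
            fp, fn = error_rates(n_items, cut, p_master, p_nonmaster)
            if fp <= max_error and fn <= max_error:
                return n_items, cut
    return None


# Hypothetical setting: masters answer 80% correctly, nonmasters 60%.
print(shortest_test(0.8, 0.6, 0.10))
```

The closer the two assumed proportions are, the longer the test must be to hold both errors under the bound, which is why test length and passing score have to be chosen together.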

11.

Criterion Referencing and the Meaning of National Curriculum Assessment

Steve Sizmur & Marian Sainsbury 《British Journal of Educational Studies》1997,45(2):123-140


12.

A State Perspective on Multiple Measures in School Accountability

William D. Schafer 《Educational Measurement》2003,22(2):27-31

Several meanings of the term multiple measures exist. One of these is the use of assessments from different sources, such as an external test, along with a state-developed test. The use of multiple sources is increasing, especially due to increased federal Title I requirements for state accountability programs and associated increases in the amount and costs of mandated testing. Several issues seem pertinent for states considering combining assessments from internal sources (usually criterion-referenced tests) and external sources (usually norm-referenced tests) into their accountability programs. These are explored from the standpoint of the impact of federally required decision making for schools based on test data. Other possible uses are mentioned briefly.

13.

RELIABILITY OF CRITERION-REFERENCED TESTS: A DECISION-THEORETIC FORMULATION

H. SWAMINATHAN RONALD K. HAMBLETON JAMES ALGINA 《Journal of Educational Measurement》1974,11(4):263-267

It has been suggested that the primary purpose for criterion-referenced testing in objective-based instructional programs is to classify examinees into mastery states or categories on the objectives included in the test. We have proposed that the reliability of the criterion-referenced test scores be defined in terms of the consistency of the decision-making process across repeated administrations of the test. Specifically, reliability is defined as a measure of agreement over and above that which can be expected by chance between the decisions made about examinee mastery states in repeated test administrations for each objective measured by the criterion-referenced test.
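A chance-corrected agreement index of the kind described here can be sketched as follows. The abstract does not name the statistic; the sketch assumes Cohen's kappa, a standard measure of agreement beyond chance, applied to master/nonmaster decisions from two administrations. The function name and the sample decisions are hypothetical.

```python
def decision_kappa(first, second) -> float:
    """Cohen's kappa for two lists of mastery decisions (True = master)."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two non-empty lists of equal length")
    # Observed proportion of examinees given the same decision both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal mastery rates.
    rate_first = sum(bool(a) for a in first) / n
    rate_second = sum(bool(b) for b in second) / n
    chance = rate_first * rate_second + (1 - rate_first) * (1 - rate_second)
    if chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - chance) / (1 - chance)


# Hypothetical decisions for four examinees on one objective.
administration_1 = [True, True, True, False]
administration_2 = [True, True, False, False]
print(decision_kappa(administration_1, administration_2))
```

Because the abstract defines reliability per objective, a function like this would be applied separately to the decisions for each objective the test measures.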

14.

EFFECTS OF DIFFERENT SAMPLES ON ITEM AND TEST CHARACTERISTICS OF CRITERION-REFERENCED TESTS

THOMAS MICHAEL HALADYNA 《Journal of Educational Measurement》1974,11(2):93-99

Although many have rejected classical test construction and analysis procedures for criterion-referenced tests, the present study was concerned with the possibility that classical procedures are both applicable and appropriate when samples of both mastery and nonmastery examinees are employed. A rationale for using these samples was presented, and empirical evidence was gathered which supported the practice of combining samples to increase the variance of test scores and thereby permit the proper estimate of reliability and item validities.

15.

Validation of Student, Principal, and Self-Ratings in 360° Feedback® for Teacher Evaluation

David J. Wilkerson Richard P. Manatt Mary Ann Rogers Ron Maughan 《Journal of Personnel Evaluation in Education》2000,14(2):179-192


16.

A Criterion-Referenced Approach to Student Ratings of Instruction

J. Patrick Meyer Justin B. Doromal Xiaoxin Wei Shi Zhu 《Research in higher education》2017,58(5):545-567

We developed a criterion-referenced student rating of instruction (SRI) to facilitate formative assessment of teaching. It involves four dimensions of teaching quality that are grounded in current instructional design principles: Organization and structure, Assessment and feedback, Personal interactions, and Academic rigor. Using item response theory and Wright mapping methods, we describe teaching characteristics at various points along the latent continuum for each scale. These maps enable criterion-referenced score interpretation by making an explicit connection between test performance and the theoretical framework. We explain the way our Wright maps can be used to enhance an instructor’s ability to interpret scores and identify ways to refine teaching. Although our work is aimed at improving score interpretation, a criterion-referenced test is not immune to factors that may bias test scores. The literature on SRIs is filled with research on factors unrelated to teaching that may bias scores. Therefore, we also used multilevel models to evaluate the extent to which student and course characteristics may affect scores and compromise score interpretation. Results indicated that student anger and the interaction between student gender and instructor gender are significant effects that account for a small amount of variance in SRI scores. All things considered, our criterion-referenced approach to SRIs is a viable way to describe teaching quality and help instructors refine pedagogy and facilitate course development.

17.

CRITERION-REFERENCED APPLICATIONS OF CLASSICAL TEST THEORY

SAMUEL A. LIVINGSTON 《Journal of Educational Measurement》1972,9(1):13-26

A reliability coefficient for criterion-referenced tests is developed from the assumptions of classical test theory. This coefficient is based on deviations of scores from the criterion score, rather than from the mean. The coefficient is shown to have several of the important properties of the conventional norm-referenced reliability coefficient, including its interpretation as a ratio of variances and as a correlation between parallel forms, its relationship to test length, its estimation from a single form of a test, and its use in correcting for attenuation due to measurement error. Norm-referenced measurement is considered as a special case of criterion-referenced measurement.
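The coefficient can be sketched as follows. The abstract does not print the formula; the sketch assumes the form usually given for Livingston's coefficient, in which the squared distance between the mean and the criterion is added to both the true-score and observed-score variance. The function and parameter names are hypothetical.

```python
def livingston_k2(reliability: float, variance: float,
                  mean: float, criterion: float) -> float:
    """Livingston-style coefficient based on deviations from the criterion.

    Assumed formula:
        (reliability * variance + (mean - criterion)**2)
        / (variance + (mean - criterion)**2)
    where `reliability` is the conventional norm-referenced coefficient.
    """
    squared_distance = (mean - criterion) ** 2
    denominator = variance + squared_distance
    if denominator <= 0:
        raise ValueError("variance and distance from criterion are both zero")
    return (reliability * variance + squared_distance) / denominator


# Hypothetical test: reliability 0.8, score variance 25, mean 70.
print(livingston_k2(0.8, 25.0, 70.0, 70.0))  # criterion at the mean
print(livingston_k2(0.8, 25.0, 70.0, 75.0))  # criterion 5 points above the mean
```

With the criterion set equal to the mean, the value reduces to the conventional reliability, which is the sense in which the abstract treats norm-referenced measurement as a special case. Moving the criterion away from the mean raises the coefficient.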

18.

A classification of the ISIS program using Bloom's cognitive taxonomy

Richard F. Clevenstine 《Journal of Research in Science Teaching》1987,24(8):699-712

This article focuses on the practical use of Bloom's Taxonomy of Educational Objectives. The current status of analyzing and classifying test items and behavioral objectives was examined in this study. Specifically, the purpose of this study was to analyze and classify the ISIS minicourse performance objectives and criterion-referenced test items according to Bloom's cognitive Taxonomy in order to determine the levels of cognition at which the ISIS instructional materials are directed. The performance objectives and test items of thirty-three ISIS minicourses and criterion-referenced tests were collected and classified. Four research questions were posed in the study. The findings indicate that ISIS minicourse test items and performance objectives are written primarily at the Knowledge and Comprehension levels. The ISIS instructional materials reflect low percentages of upper cognitive level test items and performance objectives. Based upon the use of a chi-square analysis, twenty-four of the ISIS minicourses and tests demonstrate a positive congruence between their performance objectives and criterion-referenced test items. Nine ISIS minicourses were found to demonstrate a negative relationship between their performance objectives and test items. Implications and recommendations based on the findings of the study are provided.

19.

How do teacher education faculty members define desirable teacher beliefs?

《Teaching and Teacher Education》1988,4(3):267-273

Fifty-seven teacher educators described (a) how graduates of their programs should respond to each item in an educational beliefs inventory, and (b) the extent of coverage they provided for each belief in their courses. Desired beliefs were also compared with measures of educational beliefs from 896 entry-level teacher candidates. Major findings include: Although faculty members said most beliefs should be shaped in a particular direction, they often disagreed on the desired direction. Faculty members were more likely to reinforce prevailing beliefs they judged as appropriate than to challenge inappropriate beliefs or to encourage students to develop their own informed positions regarding open-ended educational issues.

20.

Differences in personalized learning practice and technology use in high- and low-performing learner-centered schools in the United States

Lee Dabae Huh Yeol Lin Chun-Yi Reigeluth Charles M. Lee Eunbae 《Educational technology research and development : ETR & D》2021,69(2):1221-1245

The Every Student Succeeds Act supports personalized learning (PL) to close achievement gaps of diverse K-12 learners in the United States. Implementing PL into a classroom entails a paradigm change of the educational system. However, it is demanding to transform traditional practice into a personalized one under the pressure of the annual standardized testing while it is unclear which PL approaches are more likely to result in better academic outcomes than others. Using national survey data of ELA teachers in identified learner-centered schools, this study compared high and low-performing learner-centered schools (determined by their standardized test results) in terms of their use of five PL features (personalized learning plan, competency-based student progress, criterion-referenced assessment, project- or problem-based learning, and multi-year mentoring) and their use of technology for the four functions of planning, learning, assessment, and recordkeeping. Generally, teachers in high-performing schools implemented PL more thoroughly and utilized technology for more functions than those in low-performing schools. Teachers in high-performing schools more frequently considered career goals when creating personal learning plans, shared the project outcomes with the community, and assessed non-academic outcomes. They stayed longer with the same students and developed close relationships with more students. Also, they more frequently used technology for sharing resources and reported having a more powerful technology system than those in low-performing schools. This study informs educators, administrators, and researchers of which PL approaches and technology uses are more likely to result in better academic outcomes measured by standardized assessments.
