Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
A reliability coefficient for criterion-referenced tests is developed from the assumptions of classical test theory. This coefficient is based on deviations of scores from the criterion score, rather than from the mean. The coefficient is shown to have several of the important properties of the conventional norm-referenced reliability coefficient, including its interpretation as a ratio of variances and as a correlation between parallel forms, its relationship to test length, its estimation from a single form of a test, and its use in correcting for attenuation due to measurement error. Norm-referenced measurement is considered as a special case of criterion-referenced measurement.
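
The abstract does not print the coefficient itself. As a hedged illustration, a coefficient built on deviations from a criterion score is commonly written as below. The symbols are assumptions introduced here: \rho_{XX'} is the conventional reliability, \sigma_X^2 the observed-score variance, \mu_X the mean score, and C the criterion score.

```latex
k^2(X, T) = \frac{\rho_{XX'}\,\sigma_X^2 + (\mu_X - C)^2}{\sigma_X^2 + (\mu_X - C)^2}
```

Setting C = \mu_X removes the squared-deviation terms and leaves k^2 = \rho_{XX'}, which is the sense in which norm-referenced measurement appears as a special case.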

2.
It has been suggested that the primary purpose for criterion-referenced testing in objective-based instructional programs is to classify examinees into mastery states or categories on the objectives included in the test. We have proposed that the reliability of the criterion-referenced test scores be defined in terms of the consistency of the decision-making process across repeated administrations of the test. Specifically, reliability is defined as a measure of agreement over and above that which can be expected by chance between the decisions made about examinee mastery states in repeated test administrations for each objective measured by the criterion-referenced test.
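
The abstract does not name the agreement index. One widely used chance-corrected measure is Cohen's kappa, and the sketch below assumes that choice. The function name `decision_kappa` and the 0/1 coding of nonmaster/master decisions are illustrative, not taken from the source.

```python
from collections import Counter


def decision_kappa(first, second):
    """Cohen's kappa for mastery decisions from two test administrations.

    `first` and `second` hold one decision per examinee (e.g. 1 = master,
    0 = nonmaster), listed in the same examinee order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty decision lists of equal length")
    n = len(first)
    # Observed proportion of examinees classified the same way both times.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from the marginal category proportions.
    count_first = Counter(first)
    count_second = Counter(second)
    p_chance = sum(count_first[c] * count_second[c] for c in count_first) / n**2
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)
```

In an objective-by-objective analysis, the function would be called once per objective, with the decisions for that objective from each administration.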

3.
Reliability of a criterion-referenced test is often viewed as the consistency with which individuals who have taken two strictly parallel forms of a test are classified as being masters or nonmasters. However, in practice, it is rarely possible to retest students, especially with equivalent forms. For this reason, methods for making conservative approximations of alternate form (or test-retest “without the effects of testing”) reliability have been developed. Because these methods are computationally tedious and require some psychometric sophistication, they have rarely been used by teachers and school psychologists. This paper (a) describes one method (Subkoviak's) for estimating alternate-form reliability from one administration of a criterion-referenced test and (b) describes a computer program developed by the authors that will handle tests containing hundreds of items for large numbers of examinees and allow any test user to apply the technique described. The program is a superior alternative to other methods of simplifying this estimation procedure that rely upon tables; a user can check classification consistency estimates for several prospective cut scores directly from a data file, without having to make prior calculations.

4.
In discussions of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe score on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5. This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
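
The 0.5 threshold can be recovered with a short argument. The version below is a sketch under classical test theory assumptions (observed score X = T + E, with error uncorrelated with the universe score T), and it ignores sampling error in the group mean. The notation is introduced here and is not taken from the source.

```latex
\begin{aligned}
\sigma_T^2 &= \rho\,\sigma_X^2, \qquad \sigma_E^2 = (1-\rho)\,\sigma_X^2 \\
\text{test-based estimate } X:\quad & E\big[(X - T)^2\big] = \sigma_E^2 = (1-\rho)\,\sigma_X^2 \\
\text{a priori estimate } \mu:\quad & E\big[(\mu - T)^2\big] = \sigma_T^2 = \rho\,\sigma_X^2 \\
(1-\rho)\,\sigma_X^2 < \rho\,\sigma_X^2 \;&\Longleftrightarrow\; \rho > \tfrac{1}{2}
\end{aligned}
```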

5.
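With the shift in educational measurement thinking in China and abroad, the relative evaluation information provided by traditional norm-referenced tests can no longer meet the needs of test users and examinees, and the social value of the criterion-referenced test (CRT) is receiving increasing attention. In CRTs that make classification decisions about examinees' level of mastery, choosing an appropriate test length and passing score is an important factor affecting classification error. After reviewing the current state, principles, and uses of CRT research, this paper introduces the theory and procedure of the binomial probability model in research on CRT test-length decisions and, taking error control as the guiding principle, studies how the binomial model can be applied to decisions about test length and passing score for a comprehensive criterion-referenced language test.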
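
The abstract gives no formulas. The sketch below shows one way a binomial model can link test length, passing score, and classification error. It assumes an indifference-zone setup that the abstract does not state: `p_nonmaster` and `p_master` are hypothetical true proportions correct for the weakest master and the strongest nonmaster, and `max_error` is the tolerated rate for either kind of error.

```python
from math import comb


def pass_probability(n_items, cut_score, true_proportion):
    """P(X >= cut_score) for X ~ Binomial(n_items, true_proportion)."""
    return sum(
        comb(n_items, k)
        * true_proportion**k
        * (1 - true_proportion) ** (n_items - k)
        for k in range(cut_score, n_items + 1)
    )


def error_rates(n_items, cut_score, p_nonmaster, p_master):
    """Return (false-pass rate, false-fail rate) for one test design."""
    false_pass = pass_probability(n_items, cut_score, p_nonmaster)
    false_fail = 1 - pass_probability(n_items, cut_score, p_master)
    return false_pass, false_fail


def shortest_test(p_nonmaster, p_master, max_error, max_items=200):
    """Smallest (n_items, cut_score) keeping both error rates <= max_error.

    Returns None if no design with at most max_items items qualifies.
    """
    for n_items in range(1, max_items + 1):
        for cut_score in range(1, n_items + 1):
            false_pass, false_fail = error_rates(
                n_items, cut_score, p_nonmaster, p_master
            )
            if false_pass <= max_error and false_fail <= max_error:
                return n_items, cut_score
    return None
```

Calling `shortest_test(0.5, 0.8, 0.1)` searches for the shortest design that holds both error rates at or below 10 percent for those assumed proportions. Widening the gap between the two proportions, or relaxing `max_error`, shortens the required test.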

6.
Criterion reference, as opposed to norm reference, applies to the scores from a test and not to the content or format of a test; hence it is proper to refer to criterion-referenced scores or measures and not to criterion-referenced tests. This concept of criterion-referenced measures is applicable to formative evaluation generally or whenever the objective is mastery of subject matter rather than discrimination among students.

7.
It is suggested that for criterion-referenced tests to have any educational value, they must be linked to the categories of learning that have been demonstrated in learning theory. These categories form the basis of the test domains. The nature of the two main categories, concepts and rules, is reviewed, and it is suggested that the errors produced by pupils that indicate faulty concept learning or rule application should form the basis for the production of tests. Examples of such tests are also discussed. As this approach to testing is markedly different from the current psychometric approach to criterion-referenced testing, it is suggested that the form of testing described here be called concept-referenced testing to distinguish it from other forms of criterion-referenced measures.

8.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(3): 181-193
Lewis and Sheehan (1990) developed a computerized sequential mastery test procedure that utilizes Bayesian decision theory to make "master," "nonmaster," or "continue testing" decisions. While maintaining their general framework of administering sequential testlets, a fuzzy set approach was used to develop an alternative computerized mastery test. This new procedure differs from Lewis and Sheehan's in that the decision rule is determined using fuzzy set decision theory, and ability estimates are obtained using the Rasch model rather than a three-parameter logistic model. This article describes this new approach and illustrates the differences between the fuzzy set and Bayesian methods by way of an example.

9.
Norm-referenced measurement tools, such as reliability, validity, and item analysis, are commonly used to reach and verify conclusions about criteria. Similar tools for criterion-referenced testing situations are scant. This study examined faculty planning and testing decisions and applied formulas to arrive at numerical indices that serve as analytical tools for use with criterion-referenced tests. The research documents the effects of applying the concept of platform unity, which has its roots in curriculum alignment theory. Alignment of curriculum occurs if the planned, the delivered, and the tested curricula are congruent. Specifically, platform unity aligns planned, domain-referenced content with appropriate test types. Mathematical formulas were created to determine numerically if planned and tested content were congruent. In addition, four other constructs were examined. They included effectiveness and efficiency of test-item type selection and overtesting and undertesting of course content. A chi-square goodness-of-fit test was used to compare faculty planning and testing decisions. Data indicated significant differences (p < .01) between content plans and the test types used to test content. On the basis of the analysis, it was determined that faculty do not plan and test content congruently across three levels of cognitive content. Also, faculty tended to overtest content; they were effective in their selection of test types, but not efficient.

10.
Various item selection techniques are compared on resultant criterion-referenced reliability and validity. Techniques compared include three nominal criterion-referenced methods, a traditional point biserial selection, teacher selection, and random selection. Eighteen volunteer junior and senior high school teachers supplied behavioral objectives and item pools ranging from 26 to 40 items. Each teacher obtained responses from four classes. Pairs of tests of various lengths were developed by each item selection method. Estimates of test reliability and validity were obtained using responses independent of the test construction sample. Resultant reliability and validity estimates were compared across item selection techniques. Two of the criterion-referenced item selection methods resulted in consistently higher observed validity. However, the small magnitude of improvement over teacher or random selection raises a question as to whether the benefit warrants the necessary extra effort on the part of the classroom teacher.

11.
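Cloze tests are widely used in foreign language teaching and testing because they are objective and convenient to write, administer, score, and analyse. However, the great majority of cloze tests now flooding the market have low validity, mainly because the points they test sit at a low level. Following Li Xiaoju's theory of testing-point levels in cloze tests, a cloze test was designed and trialled with students at a university. The analysis focused on the proportion of correct answers and the reasons for lost marks, and it yielded empirical evidence for a way to improve the validity of cloze testing points by raising their level. Teaching should emphasise students' ability on higher-level testing points, so as to raise learners' overall English proficiency.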

12.
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several estimation methods for different measurement models using simulation techniques. Three types of estimation approach were conceptualized for generalizability theory (GT) and item response theory (IRT): item score approach (ISA), testlet score approach (TSA), and item-nested-testlet approach (INTA). The magnitudes of overestimation when applying item-based methods ranged from 0.02 to 0.06 and were related to the degrees of dependence among within-testlet items. Reliability estimates from TSA were lower than those from INTA due to the loss of information with IRT approaches. However, this could not be applied in GT. Specified methods in IRT produced higher reliability estimates than those in GT using the same approach. Relatively smaller magnitudes of error in reliability estimates were observed for ISA and for methods in IRT. Thus, it seems reasonable to use TSA as well as INTA for both GT and IRT. However, if there is a relatively large dependence among within-testlet items, INTA should be considered for IRT due to nonnegligible loss of information.

13.
This study investigated two procedures for estimating the population standard deviation of nonnormed tests. Two normed tests, both of whose population standard deviations were known, were administered to 272 students in grades 3–6. One of the normed tests was treated as a criterion-referenced test; the two variance estimation procedures were applied to the scores from this test. Substantial differences were found between both estimated statistics and the actual standard deviation. The first procedure systematically overestimated the standard deviation, whereas the second systematically underestimated it. These results are discussed in terms of using such procedures for program evaluation.

14.
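Yang Chun, 《河西学院学报》 (Journal of Hexi University), 2004, 20(6): 103-106
Achievement testing is an important part of secondary school English teaching and matters greatly for the accurate evaluation of both student learning and teaching. In practice, because teachers and teaching administrators often have only a vague understanding of language testing, the reliability and validity of tests are not properly ensured, and the tests fail to give a full picture of teaching. Starting from item writing, the article analyses nine kinds of problems that tend to arise across the four stages of item writing, test administration, marking, and score analysis. It argues that, under the new curriculum standards, teachers who want to keep developing professionally must take English language testing seriously.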

15.
Test reliability is a concept central to classical test theory and it is commonly stated as a requirement that a test attain a certain level of reliability before it be considered of sufficient quality for practical use. This article discusses the role of reliability in item response theory, and in particular the role of reliability in contexts where matrix sampling designs are used and concern is with the estimation of population parameters rather than the measurement of individuals. The concept of a measurement design effect is introduced. This concept parallels the concept of sampling design effects, in that it describes the impact of measurement error at the individual level (described through a reliability index) on the accuracy with which population parameters are estimated.

16.
This article treats various procedures for examining the reliability of group mean difference scores, with particular emphasis on procedures from univariate and multivariate generalizability theory. Attention is given to both traditional norm-referenced perspectives on reliability as well as criterion-referenced perspectives that focus on error-tolerance ratios and functions of them. The procedures discussed are illustrated using three cohorts of data for third- and fourth-grade students in Iowa who took the Iowa Tests of Basic Skills in recent years. For these data, estimates of reliability for norm-referenced decisions tend to be relatively low. By contrast, for criterion-referenced decisions, estimates of reliability-like coefficients based on error-tolerance ratios tend to be noticeably larger.

17.
A misconception exists that validity may refer only to the interpretation of test scores and not to the uses of those scores. The development and evolution of validity theory illustrate that test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test scores remains essential today. However, test scores are not interpreted and then ignored; rather, their interpretations lead to actions. Thus, a modern definition of validity needs to describe the validation of test score interpretations as a necessary, but insufficient, step en route to validating the uses of test scores for their intended purposes. To ignore test use in defining validity is tantamount to defining validity for ‘useless’ tests. The current definition of validity stipulated in the 2014 version of the Standards for Educational and Psychological Testing properly describes validity in terms of both interpretations and uses, and provides a sufficient starting point for validation.

18.
The tension between criterion-referenced and norm-referenced assessment is examined in the context of curriculum planning and assessment in outcomes-based approaches to higher education. This paper argues the importance of a criterion-referenced assessment approach once an outcomes-based approach has been adopted. It further discusses the implementation of criterion-referenced assessment, considering to what extent the criteria and standards adopted are implicitly norm referenced. It introduces a compatible interpretation of criterion-referenced and norm-referenced assessments in higher education, and illustrates how their combined use can avoid grade inflation and also provide useful information to educators, employers and learners. Instead of seeing criterion referencing and norm referencing as a dichotomy, assessment in higher education benefits from their synthesis through a feedback loop that emphasises alignment between learning and assessment; such feedback and alignment are essential features of quality assurance and enhancement.

19.
Currently there is concern among some educators regarding the reliability of criterion-referenced (CR) measures. In this comment, a recent attempt to develop a theory of reliability for CR measures is examined, and some considerations for determining the reliability of CR measures are discussed. Conventional reliability statistics (e.g., coefficient alpha, standard error of measurement) are found appropriate for CR measures satisfying the assumptions of the measurement model underlying classical test theory. For measures with underlying multidimensional traits, conventional reliability statistics may be used at the homogeneous subscale level. When the confidence interval about a student's “below criterion score” includes the criterion, additional evidence about the student should be obtained. Two-stage sequential testing is suggested as one method for acquiring additional evidence.
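
The confidence-interval rule in this abstract can be sketched with the classical standard error of measurement, SEM = SD × sqrt(1 − reliability). The function names, the symmetric normal-theory interval, and the default multiplier z = 1.96 are assumptions made for illustration. The abstract does not specify how the interval is built.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical SEM from the score standard deviation and reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def needs_more_evidence(observed, criterion, sd, reliability, z=1.96):
    """True when a below-criterion score has an interval reaching the criterion.

    The interval is observed +/- z * SEM. Only its upper end matters here,
    because the rule applies to scores that fall below the criterion.
    """
    if observed >= criterion:
        return False
    half_width = z * standard_error_of_measurement(sd, reliability)
    return observed + half_width >= criterion
```

A student flagged by `needs_more_evidence` would be a candidate for the second stage of the two-stage sequential testing that the abstract suggests.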

20.
Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores, and scores based on multidimensional item response theory (IRT) models were compared and found to improve the precision of the subscale scores. However, the augmented subscale scores were found to be more highly correlated and less variable than unaugmented scores. The meaningfulness of reporting such augmented scores as well as the implications for validity and test development are discussed.
