Similar Literature
1.
2.
For the purpose of obtaining data to use in test development, multiple matrix sampling (MMS) plans were compared to examinee sampling plans. Data were simulated for examinees, sampled from a population with a normal distribution of ability, responding to items selected from an item universe. Three item universes were considered: one that would produce a normal distribution of test scores, one a moderately platykurtic distribution, and one a very platykurtic distribution. When comparing sampling plans, total numbers of observations were held constant. No differences were found among plans in estimating item difficulty. Examinee sampling produced better estimates of item discrimination, test reliability, and test validity. As total number of observations increased, estimates improved considerably, especially for those MMS plans with larger subtest sizes. Larger numbers of observations were needed for tests designed to produce a normal distribution of test scores. With an adequate number of observations, MMS is seen as an alternative to examinee sampling in test development.
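
The comparison described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the logistic response model, the difficulty range, the sample sizes, and the function names `simulate_responses`, `mms_plan`, and `item_difficulty`. It shows only how an MMS plan and an examinee sampling plan can be given the same total number of observations and then used to estimate item difficulty.

```python
import math
import random


def simulate_responses(n_examinees, n_items, seed=0):
    """Simulate 0/1 responses; abilities are normal, the model is logistic (assumed)."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    difficulties = [rng.uniform(-1.5, 1.5) for _ in range(n_items)]
    responses = []
    for theta in abilities:
        row = []
        for b in difficulties:
            p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
            row.append(1 if rng.random() < p_correct else 0)
        responses.append(row)
    return responses


def mms_plan(n_examinees, n_items, n_subtests, seed=0):
    """Randomly split items into subtests and examinees into matching groups."""
    rng = random.Random(seed)
    items = list(range(n_items))
    people = list(range(n_examinees))
    rng.shuffle(items)
    rng.shuffle(people)
    item_sets = [items[k::n_subtests] for k in range(n_subtests)]
    groups = [people[k::n_subtests] for k in range(n_subtests)]
    return list(zip(groups, item_sets))


def item_difficulty(responses, plan):
    """Proportion correct per item, using only the cells the plan observes."""
    estimates = {}
    for group, item_set in plan:
        for j in item_set:
            estimates[j] = sum(responses[i][j] for i in group) / len(group)
    return estimates


responses = simulate_responses(n_examinees=600, n_items=30, seed=1)

# MMS: 3 subtests of 10 items, each taken by a different group of 200 examinees.
plan = mms_plan(600, 30, n_subtests=3, seed=2)
n_observations = sum(len(group) * len(item_set) for group, item_set in plan)
p_mms = item_difficulty(responses, plan)

# Examinee sampling with the same number of observations: 200 examinees x 30 items.
examinee_plan = [(list(range(200)), list(range(30)))]
p_exam = item_difficulty(responses, examinee_plan)
```

Both plans here use 6,000 observations, so `p_mms` and `p_exam` can be compared item by item against the proportions computed from the full 600 x 30 matrix.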

3.
Item sampling and/or multiple matrix sampling techniques have been recommended for a variety of purposes. For some of these purposes, it must be assumed that examinee performance on a set of items is unaffected by the conditions under which the items are taken (i.e., no context effect exists). In this paper factors that may lead to a context effect among high school students are discussed. The net effect of such factors on examinee scores for an English test and a mathematics test is investigated empirically. For the English test there was little support for the existence of a context effect. However, a definite context effect was found for the mathematics test.

4.
The sampling procedures were designed so that the full matrix of item variances and covariances could be estimated. Three subtest sizes were investigated: subtests of five, nine, and sixteen items. In each of these implementations a double cross validation was used, yielding two predicted scores for each individual. Discrepancy measures were also computed showing the difference between the observed and the predicted scores. The prediction of individual scores was accomplished within various ranges of error. The correlations between predicted scores and observed scores ranged from the .70s to the .90s, depending on the number of predictor variables used. The procedure is applicable in situations in which large numbers of individuals are tested or in situations where multiple measures are taken.

5.
6.
It has been argued that item variance and test variance are not necessary characteristics for criterion-referenced tests, although they are necessary for norm-referenced tests. This position is in error because it considers sample statistics as the criteria for evaluating items and tests. Within a particular sample, an item or test may have no variance, but in the population of observations for which the test was designed, calibrated, and evaluated, both items and tests must have variance.

7.
Selected parameters for a negatively skewed and a normally distributed normative distribution were estimated in a post mortem item-examinee sampling investigation. Manipulated systematically were number of subtests, number of items per subtest, and number of examinees responding to each subtest. Each item-examinee sampling procedure was replicated five times. Defining one observation as the score received by one examinee on one item, the results of this investigation support the conclusion that, in estimating parameters by item-examinee sampling, the variable of importance is not the item-examinee sampling procedure but is instead the number of observations obtained by that procedure. Degree of skewness in the normative distribution and failure to distribute all items among subtests were found to be relatively unimportant variables.

8.
It is a necessary condition that items and tests have variance and discrimination in the range of interest (population of observations) for which they are calibrated and selected. The basis for selection of the calibration sample determines the kind of scale which will be developed. A random sample from a population of individuals leads to a norm-referenced scale, and a sample representative of abilities of a range of a characteristic leads to a criterion-referenced scale.

9.
This note contends that item or score variability is an unnecessary characteristic of criterion-referenced tests as they have been traditionally conceived, namely, as measures of well defined classes of examinee behaviors.

10.
11.
Using a computer-based model of an item trace line, a random sampling experiment concerned with comparing item sample estimates to traditional (examinee) sample estimates of the mean and variance of a distribution of test scores was conducted. The results indicated that the optimal method for estimating a test's parameters may depend on several conditions. As expected, item sampling proved superior to traditional sampling in estimating test means under all conditions. However, with certain test lengths, ranges of item difficulty, and discrimination, traditional sampling provided better estimates of test variance than did item sampling.

12.
This is an empirical investigation of the effect of choice weight scoring on predictive validity and reliability. Choice weight scoring refers to the procedure whereby different weights may be assigned to all the options of an item. Four groups of subjects were included in the experiment. Weights derived from each group were used to score tests for another group in order to assess the cross-validity of the weighted scoring. In no case did the increments in reliability and validity due to the weighted scoring exceed .03.

13.
14.
15.
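This paper presents a generalization of the comparison test for positive-term series. On this basis, the remainder can be estimated relatively easily, and other convergence tests can then be derived.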

16.
Although many have rejected classical test construction and analysis procedures for criterion-referenced tests, the present study was concerned with the possibility that classical procedures are both applicable and appropriate when samples of both mastery and nonmastery examinees are employed. A rationale for using these samples was presented, and empirical evidence was gathered which supported the practice of combining samples to increase the variance of test scores and thereby permit the proper estimate of reliability and item validities.
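
The effect of combining samples on a classical reliability estimate can be sketched as follows. The abstract does not say which reliability coefficient was used, so the choice of KR-20 here is an assumption, and the tiny response matrices are invented for illustration, not data from the study. The helper name `kr20` is likewise hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson 20 for a list of 0/1 response rows (assumed coefficient).

    Returns None when the total scores have no variance, because the
    coefficient is undefined in that case.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        return None
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Invented data: masters answer almost everything correctly,
# nonmasters almost nothing.
mastery = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1]]
nonmastery = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
combined = mastery + nonmastery

r_mastery = kr20(mastery)    # little score variance, so the estimate is poor
r_combined = kr20(combined)  # pooled groups give the scores more variance
r_no_variance = kr20([[1, 1, 1, 1], [1, 1, 1, 1]])  # undefined, returns None
```

With these invented rows the mastery-only estimate is negative, while the pooled estimate is about .70. A sample in which every examinee earns the same score gives no estimate at all.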

17.
18.
19.
20.