首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Research on equating with small samples has shown that methods with stronger assumptions and fewer statistical estimates can lead to decreased error in the estimated equating function. This article introduces a new approach to linear observed‐score equating, one which provides flexible control over how form difficulty is assumed versus estimated to change across the score scale. A general linear method is presented as an extension of traditional linear methods. The general method is then compared to other linear and nonlinear methods in terms of accuracy in estimating a criterion equating function. Results from two parametric bootstrapping studies based on real data demonstrate the usefulness of the general linear method.  相似文献   

2.
This study investigated the effectiveness of equating with very small samples using the random groups design. Of particular interest was equating accuracy at specific scores where performance standards might be set. Two sets of simulations were carried out, one in which the two forms were identical and one in which they differed by a tenth of a standard deviation in overall difficulty. These forms were equated using mean equating, linear equating, unsmoothed equipercentile equating, and equipercentile equating using two through six moments of log-linear presmoothing with samples of 25, 50, 75, 100, 150, and 200. The results indicated that identity equating was preferable to any equating method when samples were as small as 25. For samples of 50 and above, the choice of an equating method over identity equating depended on the location of the passing score relative to examinee performance. If passing scores were located below the mean, where data were sparser, mean equating produced the smallest percentage of misclassified examinees. For passing scores near the mean, all methods produced similar results with linear equating being the most accurate. For passing scores above the mean, equipercentile equating with 2- and 3-moment presmoothing were the best equating methods. Higher levels of presmoothing did not improve the results.  相似文献   

3.
Two methods of local linear observed‐score equating for use with anchor‐test and single‐group designs are introduced. In an empirical study, the two methods were compared with the current traditional linear methods for observed‐score equating. As a criterion, the bias in the equated scores relative to true equating based on Lord's (1980) definition of equity was used. The local method for the anchor‐test design yielded minimum bias, even for considerable variation of the relative difficulties of the two test forms and the length of the anchor test. Among the traditional methods, the method of chain equating performed best. The local method for single‐group designs yielded equated scores with bias comparable to the traditional methods. This method, however, appears to be of theoretical interest because it forces us to rethink the relationship between score equating and regression.  相似文献   

4.
This article suggests a method for estimating a test-score equating relationship from small samples of test takers. The method does not require the estimated equating transformation to be linear. Instead, it constrains the estimated equating curve to pass through two pre-specified end points and a middle point determined from the data. In a resampling study with two test forms that differed substantially in difficulty, the proposed method compared favorably with other equating methods, especially for equating scores below the 10th percentile and above the 90th percentile.  相似文献   

5.
In this article, linear item response theory (IRT) observed‐score equating is compared under a generalized kernel equating framework with Levine observed‐score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when using data from IRT models, linear IRT observed‐score equating is virtually identical to Levine observed‐score equating. This leads to the conclusion that poststratification equating based on true anchor scores can be viewed as the curvilinear Levine observed‐score equating.  相似文献   

6.
This article describes a preliminary investigation of an empirical Bayes (EB) procedure for using collateral information to improve equating of scores on test forms taken by small numbers of examinees. Resampling studies were done on two different forms of the same test. In each study, EB and non-EB versions of two equating methods—chained linear and chained mean—were applied to repeated small samples drawn from a large data set collected for a common-item equating. The criterion equating was the chained linear equating in the large data set. Equatings of other forms of the same test provided the collateral information. New-form sample size was varied from 10 to 200; reference-form sample size was constant at 200. One of the two new forms did not differ greatly in difficulty from its reference form, as was the case for the equatings used as collateral information. For this form, the EB procedure improved the accuracy of equating with new-form samples of 50 or fewer. The other new form was much more difficult than its reference form; for this form, the EB procedure made the equating less accurate.  相似文献   

7.
This paper examined observed score linear equating in two different data collection designs, the equivalent groups design and the nonequivalent groups design, when information from covariates (i.e., background variables correlated with the test scores) was included. The main purpose of the study was to examine the effect (i.e., bias, variance, and mean squared error) on the estimators of including this additional information. A model for observed score linear equating with covariates first was suggested. As a second step, the model was used in a simulation study to show that the use of covariates such as gender and education can increase the accuracy of an equating by reducing the mean squared error of the estimators. Finally, data from two administrations of the Swedish Scholastic Assessment Test were used to illustrate the use of the model.  相似文献   

8.
In equating, when common items are internal and scoring is conducted in terms of the number of correct items, some pairs of total scores (X) and common‐item scores (V) can never be observed in a bivariate distribution of X and V; these pairs are called structural zeros. This simulation study examines how equating results compare for different approaches to handling structural zeros. The study considers four approaches: the no‐smoothing, unique‐common, total‐common, and adjusted total‐common approaches. This study led to four main findings: (1) the total‐common approach generally had the worst results; (2) for relatively small effect sizes, the unique‐common approach generally had the smallest overall error; (3) for relatively large effect sizes, the adjusted total‐common approach generally had the smallest overall error; and, (4) if sole interest focuses on reducing bias only, the adjusted total‐common approach was generally preferable. These results suggest that, when common items are internal and log‐linear bivariate presmoothing is performed, structural zeros should be maintained, even if there is some loss in the moment preservation property.  相似文献   

9.
Five methods for equating in a random groups design were investigated in a series of resampling studies with samples of 400, 200, 100, and 50 test takers. Six operational test forms, each taken by 9,000 or more test takers, were used as item pools to construct pairs of forms to be equated. The criterion equating was the direct equipercentile equating in the group of all test takers. Equating accuracy was indicated by the root-mean-squared deviation, over 1,000 replications, of the sample equatings from the criterion equating. The methods investigated were equipercentile equating of smoothed distributions, linear equating, mean equating, symmetric circle-arc equating, and simplified circle-arc equating. The circle-arc methods produced the most accurate results for all sample sizes investigated, particularly in the upper half of the score distribution. The difference in equating accuracy between the two circle-arc methods was negligible.  相似文献   

10.
Using a complex simulation study we investigated parameter recovery, classification accuracy, and performance of two item‐fit statistics for correct and misspecified diagnostic classification models within a log‐linear modeling framework. The basic manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3 vs. 5), and items (25 vs. 50) as well as different attribute correlations (.50 vs. .80) and marginal attribute difficulties (equal vs. different). We investigated misspecifications of interaction effect parameters under correct Q‐matrix specification and two types of Q‐matrix misspecification. While the misspecification of interaction effects had little impact on classification accuracy, invalid Q‐matrix specifications led to notably decreased classification accuracy. Two proposed item‐fit indexes were more strongly sensitive to overspecification of Q‐matrix entries for items than to underspecification. Information‐based fit indexes AIC and BIC were sensitive to both over‐ and underspecification.  相似文献   

11.
In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed‐score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the normality assumption, would be preferred, because it is asymptotically accurate regardless of the distribution of the data. In this article, an analytical formula for the standard error of linear observed‐score equating, which characterizes the effect of nonnormality, is obtained under elliptical distributions. Using three large‐scale real data sets as the populations, resampling studies are conducted to empirically evaluate the normal and general estimators of the standard error of linear observed‐score equating. The effect of sample size (50, 100, 250, or 500) and equating method (chained linear, Tucker, or Levine observed‐score equating) are examined. Results suggest that the general estimator has smaller bias than the normal estimator in all 36 conditions; it has larger standard error when the sample size is at least 100; and it has smaller root mean squared error in all but one condition. An R program is also provided to facilitate the use of the general estimator.  相似文献   

12.
This article is a response to the commentaries on the position paper on observed‐score equating by van der Linden (this issue). The response focuses on the more general issues in these commentaries, such as the nature of the observed scores that are equated, the importance of test‐theory assumptions in equating, the necessity to use multiple equating transformations, and the choice of conditioning variables in equating.  相似文献   

13.
This study investigated differences between two approaches to chained equipercentile (CE) equating (one‐ and bi‐direction CE equating) in nearly equal groups and relatively unequal groups. In one‐direction CE equating, the new form is linked to the anchor in one sample of examinees and the anchor is linked to the reference form in the other sample. In bi‐direction CE equating, the anchor is linked to the new form in one sample of examinees and to the reference form in the other sample. The two approaches were evaluated in comparison to a criterion equating function (i.e., equivalent groups equating) using indexes such as root expected squared difference, bias, standard error of equating, root mean squared error, and number of gaps and bumps. The overall results across the equating situations suggested that the two CE equating approaches produced very similar results, whereas the bi‐direction results were slightly less erratic, smoother (i.e., fewer gaps and bumps), usually closer to the criterion function, and also less variable.  相似文献   

14.
The selection of bandwidth in kernel equating is important because it has a direct impact on the equated test scores. The aim of this article is to examine the use of double smoothing when selecting bandwidths in kernel equating and to compare double smoothing with the commonly used penalty method. This comparison was made using both an equivalent groups design and a nonequivalent group with anchor test design. The performance of the methods was evaluated through simulation studies using both symmetric and skewed score distributions. In addition, the bandwidth selection methods were applied to real data from a college admissions test. The results show that the traditional penalty method works well although double smoothing is a viable alternative because it performs reasonably well compared to the traditional method.  相似文献   

15.
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically, we compared the synthetic, identity, and chained linear functions for various‐sized samples from two types of national assessments. One design used a highly reliable test and an external anchor, and the other used a relatively low‐reliability test and an internal anchor. The results from each of these methods were compared to the criterion equating function derived from the total samples with respect to linking bias and error. The study indicated that the synthetic functions might be a better choice than the chained linear equating method when samples are not large and, as a result, unrepresentative.  相似文献   

16.
This study applied kernel equating (KE) in two scenarios: equating to a very similar population and equating to a very different population, referred to as a distant population, using SAT® data. The KE results were compared to the results obtained from analogous traditional equating methods in both scenarios. The results indicate that KE results are comparable to the results of other methods. Further, the results show that when the two populations taking the two tests are similar on the anchor score distributions, different equating methods yield the same or very similar results, even though they have different assumptions.  相似文献   

17.
In this study, eight statistical strategies were evaluated for selecting the parameterizations of loglinear models for smoothing the bivariate test score distributions used in nonequivalent groups with anchor test (NEAT) equating. Four of the strategies were based on significance tests of chi-square statistics (Likelihood Ratio, Pearson, Freeman-Tukey, and Cressie-Read) and four additional strategies were based on different evaluations of the Likelihood Ratio Chi-Square statistic (Akaike Information Criterion, Bayesian Information Criterion, Consistent Akaike Information Criterion, and an index traced to Goodman). The focus was the implications of the selection strategies' selection tendencies for the accuracy of chained and poststratification equating functions. The results differentiated the strategies in terms of their tendencies to select models with particular bivariate parameterizations and the implications of these tendencies for equating bias and variability .  相似文献   

18.
Equating methods make use of an appropriate transformation function to map the scores of one test form into the scale of another so that scores are comparable and can be used interchangeably. The equating literature shows that the ways of judging the success of an equating (i.e., the score transformation) might differ depending on the adopted framework. Rather than targeting different parts of the equating process and aiming to evaluate the process from different aspects, this article views the equating transformation as a standard statistical estimator and discusses how this estimator should be assessed in an equating framework. For the kernel equating framework, a numerical illustration shows the potentials of viewing the equating transformation as a statistical estimator as opposed to assessing it using equating‐specific criteria. A discussion on how this approach can be used to compare other equating estimators from different frameworks is also included.  相似文献   

19.
一次指数平滑模型预测法有易掌握、易操作、计算方法呈规律性等特点,在日常生活、生产及统计工作的预测研究中具有极其重要的作用,用Turbo c等计算机语言设计成程序后,可广泛应用于研究工厂生产的发展趋势,并对今后产值进行预测。  相似文献   

20.
In observed‐score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response theory (IRT) model. The parameters from such a model can be utilized to derive the score probabilities for the tests and these score probabilities may then be used in observed‐score equating. In this study, the asymptotic standard errors of observed‐score equating using score probability vectors from polytomous IRT models are derived using the delta method. The results are applied to the equivalent groups design and the nonequivalent groups design with either chain equating or poststratification equating within the framework of kernel equating. The derivations are presented in a general form and specific formulas for the graded response model and the generalized partial credit model are provided. The asymptotic standard errors are accurate under several simulation conditions relating to sample size, distributional misspecification and, for the nonequivalent groups design, anchor test length.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号