Similar literature
20 similar records found
1.
Two methods of local linear observed‐score equating for use with anchor‐test and single‐group designs are introduced. In an empirical study, the two methods were compared with the current traditional linear methods for observed‐score equating. As a criterion, the bias in the equated scores relative to true equating based on Lord's (1980) definition of equity was used. The local method for the anchor‐test design yielded minimum bias, even for considerable variation of the relative difficulties of the two test forms and the length of the anchor test. Among the traditional methods, the method of chain equating performed best. The local method for single‐group designs yielded equated scores with bias comparable to the traditional methods. This method, however, appears to be of theoretical interest because it forces us to rethink the relationship between score equating and regression.

2.
In this article, linear item response theory (IRT) observed‐score equating is compared under a generalized kernel equating framework with Levine observed‐score equating for the nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when using data from IRT models, linear IRT observed‐score equating is virtually identical to Levine observed‐score equating. This leads to the conclusion that poststratification equating based on true anchor scores can be viewed as a curvilinear version of Levine observed‐score equating.

3.
The selection of bandwidth in kernel equating is important because it has a direct impact on the equated test scores. The aim of this article is to examine the use of double smoothing when selecting bandwidths in kernel equating and to compare double smoothing with the commonly used penalty method. This comparison was made using both an equivalent groups design and a nonequivalent groups with anchor test design. The performance of the methods was evaluated through simulation studies using both symmetric and skewed score distributions. In addition, the bandwidth selection methods were applied to real data from a college admissions test. The results show that the traditional penalty method works well, and that double smoothing is a viable alternative because it performs comparably to the penalty method.
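To make the comparison concrete, the sketch below shows a simplified version of the penalty-based bandwidth selection referred to above: a Gaussian-kernel continuization of the discrete score distribution and a PEN1-style criterion minimized over a grid of candidate bandwidths. It is an assumed illustration only; it omits the mean- and variance-preserving adjustment and the extra smoothness penalty used in operational kernel equating, and the grid search stands in for a numerical optimizer.

```python
import numpy as np
from scipy.stats import norm

def continuized_density(x, scores, probs, h):
    # Gaussian-kernel continuization of a discrete score distribution
    # (simplified: the variance-preserving adjustment of kernel equating is omitted)
    return np.sum(probs * norm.pdf((x - scores) / h)) / h

def pen1(h, scores, probs):
    # PEN1-style criterion: squared distance between the score probabilities
    # and the continuized density evaluated at the score points
    return sum((p - continuized_density(x, scores, probs, h)) ** 2
               for x, p in zip(scores, probs))

def select_bandwidth(scores, probs, grid=np.linspace(0.3, 3.0, 28)):
    # pick the candidate bandwidth with the smallest penalty value
    return grid[int(np.argmin([pen1(h, scores, probs) for h in grid]))]
```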

4.
This article is a response to the commentaries on the position paper on observed‐score equating by van der Linden (this issue). The response focuses on the more general issues in these commentaries, such as the nature of the observed scores that are equated, the importance of test‐theory assumptions in equating, the necessity to use multiple equating transformations, and the choice of conditioning variables in equating.

5.
In spite of all of the technical progress in observed‐score equating, several of the more conceptual aspects of the process are still not well understood. As a result, the equating literature struggles with rather complex criteria of equating, a lack of a test‐theoretic foundation, confusing terminology, and ad hoc analyses. A return to Lord's foundational criterion of equity of equating, a derivation of the true equating transformation from it, and a mainstream statistical treatment of the problem of estimating the transformation for various data‐collection designs are proposed as a solution to these problems.

6.
Equating methods make use of an appropriate transformation function to map the scores of one test form into the scale of another so that scores are comparable and can be used interchangeably. The equating literature shows that the ways of judging the success of an equating (i.e., the score transformation) might differ depending on the adopted framework. Rather than targeting different parts of the equating process and aiming to evaluate the process from different aspects, this article views the equating transformation as a standard statistical estimator and discusses how this estimator should be assessed in an equating framework. For the kernel equating framework, a numerical illustration shows the potential of viewing the equating transformation as a statistical estimator as opposed to assessing it using equating‐specific criteria. A discussion of how this approach can be used to compare other equating estimators from different frameworks is also included.

7.
In observed‐score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response theory (IRT) model. The parameters from such a model can be utilized to derive the score probabilities for the tests and these score probabilities may then be used in observed‐score equating. In this study, the asymptotic standard errors of observed‐score equating using score probability vectors from polytomous IRT models are derived using the delta method. The results are applied to the equivalent groups design and the nonequivalent groups design with either chain equating or poststratification equating within the framework of kernel equating. The derivations are presented in a general form and specific formulas for the graded response model and the generalized partial credit model are provided. The asymptotic standard errors are accurate under several simulation conditions relating to sample size, distributional misspecification and, for the nonequivalent groups design, anchor test length.
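The delta method underlying such derivations can be illustrated generically: if the estimated parameters (here, the polytomous IRT item parameters or the score probabilities derived from them) have asymptotic covariance matrix Sigma, then the standard error of any smooth function of them, such as the equated score at a given raw score, is the square root of the quadratic form of Sigma with the gradient of that function. The sketch below uses a numerical gradient and is not the article's closed-form expressions for the graded response or generalized partial credit model; the function name in the usage comment is a hypothetical stand-in.

```python
import numpy as np

def delta_method_se(g, theta_hat, cov_theta, eps=1e-6):
    # Asymptotic SE of g(theta_hat) via the delta method, using a
    # central-difference approximation to the gradient of g.
    theta_hat = np.asarray(theta_hat, dtype=float)
    grad = np.zeros_like(theta_hat)
    for i in range(theta_hat.size):
        step = np.zeros_like(theta_hat)
        step[i] = eps
        grad[i] = (g(theta_hat + step) - g(theta_hat - step)) / (2 * eps)
    return float(np.sqrt(grad @ cov_theta @ grad))

# usage sketch: g maps parameter estimates to the equated score of interest,
# e.g. g = lambda theta: kernel_equating_function(theta, raw_score=23)
# (kernel_equating_function is a hypothetical stand-in for the full equating pipeline)
```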

8.
In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed‐score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the normality assumption, would be preferred, because it is asymptotically accurate regardless of the distribution of the data. In this article, an analytical formula for the standard error of linear observed‐score equating, which characterizes the effect of nonnormality, is obtained under elliptical distributions. Using three large‐scale real data sets as the populations, resampling studies are conducted to empirically evaluate the normal and general estimators of the standard error of linear observed‐score equating. The effect of sample size (50, 100, 250, or 500) and equating method (chained linear, Tucker, or Levine observed‐score equating) are examined. Results suggest that the general estimator has smaller bias than the normal estimator in all 36 conditions; it has larger standard error when the sample size is at least 100; and it has smaller root mean squared error in all but one condition. An R program is also provided to facilitate the use of the general estimator.
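For reference, the sketch below implements one of the linear methods examined, Tucker observed‐score equating under the NEAT design, in its standard textbook form (synthetic-population moments combined with weights w1 and w2). A resampling standard error, like those evaluated empirically in the article, can then be obtained by recomputing this function over bootstrap samples. This is only an illustration of the equating function, not of the article's normal or general SE estimators.

```python
import numpy as np

def tucker_linear_equating(x, X1, V1, Y2, V2, w1=0.5):
    # Tucker linear observed-score equating for the NEAT design
    # (standard textbook formulation with synthetic-population weights w1, w2).
    w2 = 1.0 - w1
    g1 = np.cov(X1, V1)[0, 1] / np.var(V1, ddof=1)   # slope of X on anchor V, population 1
    g2 = np.cov(Y2, V2)[0, 1] / np.var(V2, ddof=1)   # slope of Y on anchor V, population 2
    dmu = V1.mean() - V2.mean()
    dvar = np.var(V1, ddof=1) - np.var(V2, ddof=1)
    mu_x = X1.mean() - w2 * g1 * dmu                  # synthetic-population moments
    mu_y = Y2.mean() + w1 * g2 * dmu
    var_x = np.var(X1, ddof=1) - w2 * g1**2 * dvar + w1 * w2 * g1**2 * dmu**2
    var_y = np.var(Y2, ddof=1) + w1 * g2**2 * dvar + w1 * w2 * g2**2 * dmu**2
    return mu_y + np.sqrt(var_y / var_x) * (x - mu_x)
```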

9.
Preequating is in demand because it reduces score reporting time. In this article, we evaluated an observed‐score preequating method: the empirical item characteristic curve (EICC) method, which makes preequating without item response theory (IRT) possible. EICC preequating results were compared with a criterion equating and with IRT true‐score preequating conversions. Results suggested that the EICC preequating method worked well under the conditions considered in this study. The difference between the EICC preequating conversion and the criterion equating was smaller than .5 raw‐score points (a practical criterion often used to evaluate equating quality) between the 5th and 95th percentiles of the new form total score distribution. EICC preequating also performed similarly to, or slightly better than, IRT true‐score preequating.

10.
The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed‐score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor variable(s). These measures are demonstrated for regression situations reflecting a range of true score correlations and reliabilities and using one and two predictors. Simulation results are also presented which show that the measures of prediction error variance and its parts are generally well estimated for the considered ranges of true score correlations and reliabilities and for homoscedastic and heteroscedastic data. The final discussion considers how the decomposition might be useful for addressing additional questions about regression functions' prediction error variances.
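The general phenomenon being quantified can be reproduced in a small simulation under classical test theory: correlated true scores plus independent measurement errors sized to target reliabilities, with the observed-score regression's prediction error variance compared against the error-free benchmark. This is an assumed illustrative setup, not the paper's specific decomposition measures; the reliabilities and true-score correlation below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho_t, rel_x, rel_y = 100_000, 0.7, 0.80, 0.85

# correlated true scores, then measurement errors sized to hit the target reliabilities
T = rng.multivariate_normal([0.0, 0.0], [[1.0, rho_t], [rho_t, 1.0]], size=n)
tx, ty = T[:, 0], T[:, 1]
x = tx + rng.normal(0.0, np.sqrt((1 - rel_x) / rel_x), n)   # Var(E_X) = (1 - rel) / rel
y = ty + rng.normal(0.0, np.sqrt((1 - rel_y) / rel_y), n)

# observed-score regression of Y on X and its prediction error variance
b = np.cov(x, y)[0, 1] / np.var(x)
pe_observed = np.var(y - (y.mean() + b * (x - x.mean())))

# error-free benchmark: regression of the true score of Y on the true score of X
b_true = np.cov(tx, ty)[0, 1] / np.var(tx)
pe_error_free = np.var(ty - b_true * tx)

print(f"prediction error variance: {pe_observed:.3f} observed vs. {pe_error_free:.3f} error-free")
```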

11.
This study applied kernel equating (KE) in two scenarios: equating to a very similar population and equating to a very different population, referred to as a distant population, using SAT® data. The KE results were compared to the results obtained from analogous traditional equating methods in both scenarios. The results indicate that KE results are comparable to the results of other methods. Further, the results show that when the two populations taking the two tests are similar on the anchor score distributions, different equating methods yield the same or very similar results, even though they have different assumptions.

12.
This paper examined observed‐score linear equating in two different data collection designs, the equivalent groups design and the nonequivalent groups design, when information from covariates (i.e., background variables correlated with the test scores) was included. The main purpose of the study was to examine the effect (i.e., bias, variance, and mean squared error) on the estimators of including this additional information. A model for observed‐score linear equating with covariates was first proposed. As a second step, the model was used in a simulation study to show that the use of covariates such as gender and education can increase the accuracy of an equating by reducing the mean squared error of the estimators. Finally, data from two administrations of the Swedish Scholastic Assessment Test were used to illustrate the use of the model.
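A minimal sketch of how covariate information can enter a linear equating, assuming a single categorical covariate such as gender: within-group moments for each form are combined with a common set of target group weights before the usual linear transformation is applied. This illustrates the spirit of using covariates, not the authors' exact model; the helper names are illustrative.

```python
import numpy as np

def covariate_adjusted_moments(scores, groups, target_weights):
    # Mean and variance of a form's score distribution, reweighted so that the
    # covariate groups (e.g., gender) have the same target proportions for both forms.
    scores, groups = np.asarray(scores, float), np.asarray(groups)
    means = {g: scores[groups == g].mean() for g in target_weights}
    variances = {g: scores[groups == g].var(ddof=1) for g in target_weights}
    mu = sum(w * means[g] for g, w in target_weights.items())
    var = sum(w * variances[g] for g, w in target_weights.items()) \
        + sum(w * (means[g] - mu) ** 2 for g, w in target_weights.items())
    return mu, var

def linear_equating(x, mu_x, var_x, mu_y, var_y):
    # linear observed-score equating line built from the adjusted moments
    return mu_y + np.sqrt(var_y / var_x) * (x - mu_x)
```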

13.
This article presents a method for evaluating equating results. Within the kernel equating framework, the percent relative error (PRE) for chained equipercentile equating was computed under the nonequivalent groups with anchor test (NEAT) design. The method was applied to two data sets to obtain the PRE, which can be used to measure equating effectiveness. The study compared the PRE results for chained and poststratification equating. The results indicated that the chained method transformed the new form score distribution to the reference form scale more effectively than the poststratification method. In addition, the study found that in chained equating, the population weights had an impact on the score distributions over the target population but not on the equating or the PRE results.
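The PRE compares the first several moments of the equated new-form score distribution with those of the reference-form distribution; a moment that is reproduced exactly gives a PRE of zero. A minimal computation, assuming the equated scores e_Y(x_j), the new-form score probabilities r_j, and the reference-form scores y_k with probabilities s_k are already available from the kernel equating:

```python
import numpy as np

def percent_relative_error(equated_x, r, y, s, max_moment=10):
    # PRE(p) = 100 * (p-th moment of the equated scores under the new-form
    # probabilities minus the p-th moment of the reference form) / reference moment
    equated_x, r, y, s = map(np.asarray, (equated_x, r, y, s))
    pre = []
    for p in range(1, max_moment + 1):
        moment_equated = np.sum(equated_x ** p * r)
        moment_reference = np.sum(y ** p * s)
        pre.append(100.0 * (moment_equated - moment_reference) / moment_reference)
    return np.array(pre)
```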

14.
We investigate the current bandwidth selection methods in kernel equating and propose a method based on Silverman's rule of thumb for selecting the bandwidth parameters. In kernel equating, the bandwidth parameters have previously been obtained by minimizing a penalty function. This minimization process has been criticized by practitioners for being too complex and for not offering sufficient smoothing in certain cases. In addition, the bandwidth parameters have been treated as constants in the derivation of the standard error of equating even when they were selected by considering the observed data. Here, the bandwidth selection is simplified, and modified standard errors of equating (SEEs) that reflect the bandwidth selection method are derived. The method is illustrated with real data examples and simulated data.
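For orientation, Silverman's rule of thumb in its textbook form for a Gaussian kernel is shown below. The article adapts this idea to the continuization step in kernel equating, so the exact constant and scaling it uses may differ from this generic version.

```python
import numpy as np

def silverman_bandwidth(scores):
    # Textbook Silverman rule of thumb for a Gaussian kernel:
    # h = 0.9 * min(sample SD, IQR / 1.34) * n^(-1/5)
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    sd = scores.std(ddof=1)
    iqr = np.subtract(*np.percentile(scores, [75, 25]))
    return 0.9 * min(sd, iqr / 1.34) * n ** (-0.2)
```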

15.
Equating of tests composed of both discrete and passage-based multiple choice items using the nonequivalent groups with anchor test design is popular in practice. In this study, we compared the effect of discrete and passage-based anchor items on observed score equating via simulation. Results suggested that an anchor with a larger proportion of passage-based items, more items in each passage, and/or a larger degree of local dependence among items within one passage produces larger equating errors, especially when the groups taking the new form and the reference form differ in ability. Our findings challenge the common belief that an anchor should be a miniature version of the tests to be equated. Suggestions to practitioners regarding anchor design are also given.

16.
The van der Linden article (this issue) provides a roadmap for future research in equating. My belief is that the roadmap begins and ends with collecting auxiliary data that can be utilized to provide improved equating, especially when data are sparse or equating beyond simple moments is desired.

17.
This study explores an anchor that is different from the traditional miniature anchor in test score equating. In contrast to a traditional “mini” anchor that has the same spread of item difficulties as the tests to be equated, the studied anchor, referred to as a “midi” anchor (Sinharay & Holland), has a smaller spread of item difficulties than the tests to be equated. Both anchors were administered in an operational SAT administration and the impact of anchor type on equating was evaluated with respect to systematic error or equating bias. Contradicting the popular belief that the mini anchor is best, the results showed that the mini anchor does not always produce more accurate equating functions than the midi anchor; the midi anchor was found to perform as well as or even better than the mini anchor. Because testing programs usually have more middle-difficulty items and few very hard or very easy items, midi external anchors are operationally easier to build. Therefore, the results of our study provide evidence in favor of the midi anchor, the use of which will lead to cost savings with no reduction in equating quality.

18.
Based on Lord's criterion of equity of equating, van der Linden (this issue) revisits the so‐called local equating method and offers alternative as well as new thoughts on several topics, including the types of transformations, symmetry, reliability, and population invariance appropriate for equating. A remarkable aspect is the definition of equating as a standard statistical inference problem in which the true equating transformation is the parameter of interest, to be estimated and assessed like any other estimator of an unknown parameter in statistics. We believe that putting equating methods in a general statistical model framework would be an interesting and useful next step in the area. van der Linden's conceptual article on equating is certainly an important contribution to this task.

19.
This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test scores, both in overall score distributions and at specific test scores. In addition to detecting item order effects, the integrated procedures also suggest the equating function that most adequately adjusts the scores to mitigate the effects. To demonstrate, the statistical equivalences of alternate versions of two large-volume Advanced Placement exams were assessed.

20.
This study investigated differences between two approaches to chained equipercentile (CE) equating (one‐ and bi‐direction CE equating) in nearly equal groups and relatively unequal groups. In one‐direction CE equating, the new form is linked to the anchor in one sample of examinees and the anchor is linked to the reference form in the other sample. In bi‐direction CE equating, the anchor is linked to the new form in one sample of examinees and to the reference form in the other sample. The two approaches were evaluated in comparison to a criterion equating function (i.e., equivalent groups equating) using indexes such as root expected squared difference, bias, standard error of equating, root mean squared error, and number of gaps and bumps. The overall results across the equating situations suggested that the two CE equating approaches produced very similar results, although the bi‐direction results were slightly less erratic, smoother (i.e., fewer gaps and bumps), usually closer to the criterion function, and also less variable.
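A minimal sketch of the one-direction chain described above, using raw empirical distributions (no presmoothing or continuization): the new-form score is mapped to a percentile rank on the anchor in sample 1, and that anchor score is then mapped through the anchor-to-reference link in sample 2. The bi-direction variant would instead build both links from the anchor to each form. Function names here are illustrative.

```python
import numpy as np

def percentile_rank(sample, value):
    # empirical CDF of the sample evaluated at value (proportion of scores <= value)
    return np.searchsorted(np.sort(sample), value, side="right") / len(sample)

def score_at_rank(sample, p):
    # inverse step: the sample quantile corresponding to percentile rank p
    return np.quantile(sample, np.clip(p, 0.0, 1.0))

def chained_equipercentile(x, new_form_s1, anchor_s1, anchor_s2, ref_form_s2):
    # one-direction chain: new form -> anchor in sample 1, anchor -> reference form in sample 2
    p = percentile_rank(new_form_s1, x)
    a = score_at_rank(anchor_s1, p)
    q = percentile_rank(anchor_s2, a)
    return score_at_rank(ref_form_s2, q)
```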
