Similar literature
 20 similar documents found.
1.
Prior use of the equipercentile method of test equating was based on a graphic procedure that is tedious, subject to smoothing errors, and non-analytical. Recognition of the equipercentile method as a curve-fitting procedure for two cumulative percentage distributions leads to a proposed analytical solution to the problem through use of linear estimates for successive "missing" cumulative percentage points. A complete equipercentile procedure which uses the proposed method and provides linear and quadratic functions for goodness-of-fit and extrapolation is discussed and illustrated with data from a test equating project.

2.
A simultaneous equating of four new test forms to each other and to one previous form was accomplished through a complex design incorporating seven separate equating links. Each new form was linked to the reference form by four different paths, and each path produced a different score conversion. The procedure used to resolve these inconsistencies was applied separately at each score level. Considering each equating (at a given score level) as a simple additive increment and imposing constraints on those increments led to a system of seven equations in seven unknowns. The solution produced a set of adjusted increments, so that the linking of any new form to the reference form was the same by all four possible paths.

3.
We investigate the current bandwidth selection methods in kernel equating and propose a method based on Silverman's rule of thumb for selecting the bandwidth parameters. In kernel equating, the bandwidth parameters have previously been obtained by minimizing a penalty function. Practitioners have criticized this minimization process for being too complex and for not offering sufficient smoothing in certain cases. In addition, the bandwidth parameters have been treated as constants in the derivation of the standard error of equating even when they were selected by considering the observed data. Here, the bandwidth selection is simplified, and modified standard errors of equating (SEEs) that reflect the bandwidth selection method are derived. The method is illustrated with real data examples and simulated data.
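For orientation, the general-purpose form of Silverman's rule of thumb for a Gaussian kernel can be sketched as follows. This is the textbook density-estimation rule, not the kernel-equating-specific variant the abstract refers to, which is not given here; the function name `silverman_bandwidth` and the sample scores are hypothetical.

```python
from statistics import quantiles, stdev


def silverman_bandwidth(data):
    """Textbook Silverman rule of thumb for a Gaussian kernel.

    h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)

    Assumption: this is the generic rule, not the adjusted version
    derived for kernel equating in the abstract above.
    """
    n = len(data)
    sd = stdev(data)                      # sample standard deviation
    q1, _, q3 = quantiles(data, n=4)      # quartiles
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR is degenerate.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-1 / 5)


# Hypothetical test scores, for illustration only.
scores = [float(s) for s in range(1, 41)]
h = silverman_bandwidth(scores)
```

Because the rule depends only on the spread of the data and the sample size, it needs no iterative minimization, which is the simplification the abstract emphasizes.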

4.
Applied Measurement in Education (《教育实用测度》), 2013, 26(3): 245-254
A procedure for checking the score equivalence of nearly identical editions of a test is described. This procedure is used early in the score equating process to help determine whether it is necessary to conduct separate equating analyses (using a variety of equating methods) for the two nearly identical versions of the test. The procedure employs the standard error of equating and utilizes graphical representation of score conversion deviation from the identity function in standard error units. Two illustrations of the procedure involving Scholastic Aptitude Test (SAT) data are presented. Advice about what to do if statistical equivalence does not obtain is given in the discussion section. Alternative strategies for assessing score equivalence are also discussed.

5.
This instructional module is intended to promote a conceptual understanding of test form equating using traditional methods. The purpose of equating and the context in which equating occurs are described. The process of equating is distinguished from the related process of scaling to achieve comparability. Three equating designs are considered, and three equating methods (mean, linear, and equipercentile) are described and illustrated. Special attention is given to equating with nonequivalent groups, and to sources of equating error.
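As a rough illustration of two of the traditional methods named above, the sketch below implements mean equating and linear equating for a random-groups setting. The function names and the score lists are hypothetical, and population (rather than sample) standard deviations are used as a simplifying assumption.

```python
from statistics import mean, pstdev


def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift x by the difference between the form means."""
    return x - mean(scores_x) + mean(scores_y)


def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the mean and the standard deviation.

    l(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical score samples on form X and form Y.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]

shifted = mean_equate(16, form_x, form_y)     # 21
rescaled = linear_equate(16, form_x, form_y)  # approximately 22
```

Mean equating assumes the forms differ by a constant amount across the score scale; linear equating relaxes this by also allowing the spread of scores to differ.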

6.
Three local observed-score kernel equating methods that integrate methods from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts with respect to such measures as bias (as defined by Lord's criterion of equity) and percent relative error. The local kernel item response theory observed-score equating method, which can be used for any of the common equating designs, had a small amount of bias, a low percent relative error, and a relatively low kernel standard error of equating, even when the accuracy of the test was reduced. The local kernel equating methods for the nonequivalent groups with anchor test design generally had low bias and were quite stable against changes in the accuracy or length of the anchor test. Although all proposed methods showed small percent relative errors, the local kernel equating methods for the nonequivalent groups with anchor test design had a somewhat larger standard error of equating than their kernel method counterparts.

7.
A single-group (SG) equating with nearly equivalent test forms (SiGNET) design was developed by Grant to equate small-volume tests. Under this design, the scored items for the operational form are divided into testlets or mini tests. An additional testlet is created but not scored for the first form. If the scored testlets are testlets 1–6 and the unscored testlet is testlet 7, then the first form is composed of testlets 1–6 and the second form is composed of testlets 2–7. The seven testlets are administered as a single form, and when a sufficient number of examinees have taken it, the second form (testlets 2–7) is equated to the first form (testlets 1–6) using an SG equating design. This design thus facilitates the use of an SG equating and allows for the accumulation of data, both of which may reduce equating error. This study compared equatings under the SiGNET and common-item equating designs and found lower equating error for the SiGNET design in very small sample size conditions (e.g., N = 10).

8.
The impact of log-linear presmoothing on the accuracy of small sample chained equipercentile equating was evaluated under two conditions. In the first condition the small samples differed randomly in ability from the target population. In the second condition the small samples were systematically different from the target population. Results showed that equating with small samples (e.g., N < 25 or 50) using either raw or smoothed score distributions led to considerably large random equating error (although smoothing reduced random equating error). Moreover, when the small samples were not representative of the target population, the amount of equating bias also was quite large. It is concluded that although presmoothing can reduce random equating error, it is not likely to reduce equating bias caused by using an unrepresentative sample. Other alternatives to the small sample equating problem (e.g., the SiGNET design) which focus more on improving data collection are discussed.

9.
Smoothing is designed to yield smoother equating results that can reduce random equating error without introducing very much systematic error. The main objective of this study is to propose a new statistic and to compare its performance to the performance of the Akaike information criterion and likelihood ratio chi-square difference statistics in selecting the smoothing parameter for polynomial loglinear equating under the random groups design. These model selection statistics were compared for four sample sizes (500, 1,000, 2,000, and 3,000) and eight simulated equating conditions, including both conditions where equating is not needed and conditions where equating is needed. The results suggest that all model selection statistics tend to improve the equating accuracy by reducing the total equating error. The new statistic tended to have less overall error than the other two methods.

10.
This article presents a method for evaluating equating results. Within the kernel equating framework, the percent relative error (PRE) for chained equipercentile equating was computed under the nonequivalent groups with anchor test (NEAT) design. The method was applied to two data sets to obtain the PRE, which can be used to measure equating effectiveness. The study compared the PRE results for chained and poststratification equating. The results indicated that the chained method transformed the new form score distribution to the reference form scale more effectively than the poststratification method. In addition, the study found that in chained equating, the population weight had an impact on score distributions over the target population but not on the equating and PRE results.
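A minimal sketch of the PRE idea, assuming the definition commonly used in the kernel equating literature: the relative difference, in percent, between the p-th moment of the equated scores and the p-th moment of the reference form scores. The article's specific computation for chained equating is not reproduced; the function names and the score distributions are hypothetical.

```python
def moment(values, probs, p):
    """p-th raw moment of a discrete score distribution."""
    return sum(pr * v ** p for v, pr in zip(values, probs))


def percent_relative_error(equated_x, probs_x, scores_y, probs_y, p):
    """PRE(p) = 100 * (mu_p(e(X)) - mu_p(Y)) / mu_p(Y).

    Assumption: this is the usual kernel-equating definition; the
    chained-equating version described in the abstract is not shown.
    """
    mu_eq = moment(equated_x, probs_x, p)
    mu_y = moment(scores_y, probs_y, p)
    return 100.0 * (mu_eq - mu_y) / mu_y


# Hypothetical equated X scores and reference form Y scores.
equated = [1.0, 2.0, 3.0]
probs_x = [0.2, 0.5, 0.3]
y_scores = [1.0, 2.0, 3.0]
probs_y = [0.25, 0.5, 0.25]

pre_1 = percent_relative_error(equated, probs_x, y_scores, probs_y, p=1)
```

Values of PRE near zero across the first several moments suggest that the equated score distribution closely matches the reference form distribution.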

11.
Educational measurement specialists undertaking test equating in applied settings have been plagued by the absence of a logically or mathematically compelling rationale for their test equating efforts. Classical test theory and other test theories based on the assumption of identically distributed true scores are tautological in terms of test equating. The present study examined (by means of a Monte Carlo procedure) the effects of four parameters on the accuracy of test equating under a relaxed definition of test form equivalence. The four parameters studied were sample size, test form length, test form reliability, and the correlation between the true scores of the test forms to be equated. Significant interactions involving sample size and the other parameters indicated that smaller samples of observations yielded disproportionately larger errors in test equating for fixed values of the test form parameters. In terms of main effects, sample size emerged as most important in controlling equating error. Taken together, the results suggest that when test equating is carried out on larger samples of observations, errors of equating will tend to be relatively small even though the test forms are not strictly parallel. For arbitrarily small samples, however, errors of equating will tend to be larger regardless of how equivalent the test forms are.

12.
This paper examined observed score linear equating in two different data collection designs, the equivalent groups design and the nonequivalent groups design, when information from covariates (i.e., background variables correlated with the test scores) was included. The main purpose of the study was to examine the effect (i.e., bias, variance, and mean squared error) on the estimators of including this additional information. A model for observed score linear equating with covariates first was suggested. As a second step, the model was used in a simulation study to show that the use of covariates such as gender and education can increase the accuracy of an equating by reducing the mean squared error of the estimators. Finally, data from two administrations of the Swedish Scholastic Assessment Test were used to illustrate the use of the model.

13.
Four types of comparisons for evaluation of mnemonics are identified, two based on equating exposure to the critical to-be-learned associations and two based on total study time. Two experiments on a new mnemonic, the cueword method, are reported, each of which incorporated three of the four comparison procedures. In general, the cueword method proved useful only in comparisons involving control of exposures to the critical to-be-learned associations. The experiments illustrated how highly analytical experiments on mnemonics can be conducted, and the results illustrated that choice of comparison in mnemonics research can greatly affect the evaluation of a mnemonic. General design suggestions for mnemonics research are made.

14.
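A Review of Research on Test Equating   Cited 1 time in total (self-citations: 0; citations by others: 1)
This study reviews the literature on test equating from five aspects: research history; definition of concepts; data collection designs; equating models and equating methods; and equating error together with the criteria for evaluating different equating methods. The aim is to provide a theoretical foundation for further equating research.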

15.
The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct sets of data were used to demonstrate feasibility of proposed equating methods for two different equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated some notable differences between the multidimensional and unidimensional approaches when data exhibited evidence for multidimensionality. In addition, some of the proposed methods were successful in providing equating results for both section-level and composite-level scores, which has not been achieved by most of the existing methodologies. The traditional method of using a set of quadrature points and weights for equating turned out to be computationally intensive, particularly for the data with higher dimensions. The study suggested an alternative way of using the Monte-Carlo approach for such data. This study also proposed a simple-structure true-score equating procedure that does not rely on a multivariate observed-score distribution.

16.
The purpose of this study was to determine if a linear procedure, typically applied to an entire examination when equating scores and rescaling judges' standards, could be used with individual item data gathered through Angoff's (1971) standard-setting method. Specifically, experts' estimates of borderline group performance on one form of a test were transformed to be on the same scale as experts' estimates of borderline group performance on another form of the test. The transformations were based on examinees' responses to the items and on judges' estimates of borderline group performance. The transformed values were compared to the actual estimates provided by a group of judges. The equated and rescaled values were reasonably close to those actually assigned by the experts. Bias in the estimates was also relatively small. In general, the rescaling procedure was more accurate than the equating procedure, especially when the examinee sample size for equating was small.

17.
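To examine how different ratios of objective-item to subjective-item score points in the full test and in the anchor test affect equating error, this study designed two item-type score ratios each for the full test and the anchor test, took the standard error of equating as the dependent variable, and ran an analysis of variance on equating error under a 2 × 2 two-factor completely randomized design. The results show that the main effects of the two factors, the full-test item-type score ratio and the anchor-test item-type score ratio, were significant (p < 0.001), that their interaction was significant (p < 0.01), and that simple-effects tests found significant differences for both factors at each level (p < 0.01). Both ratios affect equating error to some extent and should be taken into account during equating. To reduce equating error, the anchor-test item-type score ratio should match the full-test ratio as closely as possible.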

18.
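A New Exploration of Test Equating Design: The ETP Design   Cited 1 time in total (self-citations: 1; citations by others: 0)
The equating-to-the-pool (ETP) design is a new item-bank-based equating design for large-scale tests within the item response theory framework. Compared with traditional equating designs, it avoids some drawbacks of the traditional common-group and common-item designs and can equate tests while maintaining equating precision. Because many large-scale examinations already have item banks, the ETP design has considerable room for development.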

19.
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically, we compared the synthetic, identity, and chained linear functions for various-sized samples from two types of national assessments. One design used a highly reliable test and an external anchor, and the other used a relatively low-reliability test and an internal anchor. The results from each of these methods were compared to the criterion equating function derived from the total samples with respect to linking bias and error. The study indicated that the synthetic functions might be a better choice than the chained linear equating method when samples are not large and, as a result, unrepresentative.
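The weighted average described above can be sketched in a few lines. Which component receives the weight `w`, the weight value, and the coefficients of the stand-in chained linear function are all assumptions made for illustration; none are taken from the study.

```python
def synthetic_function(x, equating_fn, w):
    """Weighted average of an equating function and the identity function.

    Assumption: w is the weight on the equating function and (1 - w)
    the weight on the identity function x -> x.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * equating_fn(x) + (1.0 - w) * x


def chained_linear(x):
    """Hypothetical stand-in for an estimated chained linear function."""
    return 1.1 * x + 2.0


linked = synthetic_function(20.0, chained_linear, w=0.5)  # approximately 22
```

With `w = 0` the function reduces to the identity (no equating), and with `w = 1` it reduces to the chained linear function, so `w` trades the bias of not equating against the sampling error of an equating estimated from a small sample.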

20.
In this article, linear item response theory (IRT) observed-score equating is compared under a generalized kernel equating framework with Levine observed-score equating for the nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when using data from IRT models, linear IRT observed-score equating is virtually identical to Levine observed-score equating. This leads to the conclusion that poststratification equating based on true anchor scores can be viewed as the curvilinear Levine observed-score equating.
