Similar literature
20 similar articles found (search time: 0 ms)
1.
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias, root mean-square error (RMSE), and the percentage of times the 95% confidence interval covered the true parameter. The simulation results suggest that item parameters were not recovered well when IPD was ignored, especially if there was a larger number of IPD conditions. In addition, coverage was not accurate in all IPD conditions when IPD was ignored. The results also suggest that the accuracy of person scores (measured by bias) is potentially problematic when a larger number of IPD items is ignored. However, the overall accuracy (measured by RMSE) and coverage were unexpectedly acceptable in the presence of IPD as defined in this study.
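
The three recovery criteria named in this abstract (bias, RMSE, and 95% confidence-interval coverage) could be computed from replication-level estimates roughly as follows. This is a minimal sketch, not the study's code: the helper name `recovery_summary`, the Wald-type interval `estimate ± 1.96 × SE`, and the toy numbers are all assumptions made for illustration.

```python
import math


def recovery_summary(true_value, estimates, std_errors, z=1.96):
    """Summarize recovery of one parameter across Monte Carlo replications.

    Hypothetical helper: assumes Wald-type intervals, estimate +/- z * SE.
    """
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n                              # mean signed error
    rmse = math.sqrt(sum(e * e for e in errors) / n)    # root mean-square error
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se   # interval contains truth
    )
    return {"bias": bias, "rmse": rmse, "coverage": covered / n}


# Toy replications (made-up numbers, not results from the study).
summary = recovery_summary(
    true_value=1.0,
    estimates=[0.9, 1.1, 1.3, 0.7],
    std_errors=[0.1, 0.1, 0.1, 0.1],
)
```

The toy data show how bias can be near zero while RMSE and coverage are poor, because positive and negative errors cancel in the bias but not in the RMSE.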

2.
This study examined the effects of ignoring multilevel data structures in nonhierarchical covariance modeling using a Monte Carlo simulation. Multilevel sample data were generated with respect to 3 design factors: (a) intraclass correlation, (b) group and member configuration, and (c) the models that underlie the between-group and within-group variance components associated with multilevel data. Covariance models that ignored the multilevel structure were then fit to the data. Results indicated that when variables exhibit minimal levels of intraclass correlation, the chi-square model/data fit statistic, the parameter estimators, and the standard error estimators are relatively unbiased. However, as the level of intraclass correlation increases, the chi-square statistic, the parameters, and their standard errors all exhibit estimation problems. The specific group/member configurations as well as the underlying between-group and within-group model structures further exacerbate the estimation problems encountered in the nonhierarchical analysis of multilevel data.

3.
4.
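On Item Response Theory   Total citations: 2 (self-citations: 0; citations by others: 2)
This article discusses the historical background of item response theory, its development, its characteristics, and its applications in educational and psychological measurement, and it raises theoretical issues concerning reliability and presents several of its models.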

5.
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.

6.
Several recent papers have argued for the usefulness of item response theory (IRT) methods of assessing item discrimination power for criterion-referenced tests (CRTs). Conventional methods continue to be used more widely, however, for reasons that include some practical constraints associated with the use of IRT methods. To provide users with information that may help them decide which conventional indices to employ in evaluating CRT items, Spearman rank-order correlations were computed between IRT-derived item information functions (IIFs) and four conventional discrimination indices: the phi-coefficient, the B-index, phi/phi max, and the agreement statistic. The rank-order correlations between the phi-coefficient and the IIFs were very high, with a median of .96. The remaining conventional indices, with the exception of phi/phi max, also correlated well with the IIFs. Theoretical explanations for these relationships are offered.

7.
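The modeling of item response data falls within item response theory, which is known as modern measurement theory. As the breadth and complexity of measurement demands in society grow, and as the functions of measurement are required to keep expanding, increasingly complex item response models are needed to carry out measurement tasks in psychology, education, sociology, and other fields. This article discusses complex item response models that are currently common and developing rapidly, such as higher-order, multidimensional, and multilevel models, and describes parameter estimation techniques for these models. In light of how such models have been applied, it expresses the hope that domestic measurement techniques will develop toward greater objectivity and sophistication.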

8.
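Using Monte Carlo simulation, this study examined the Type I error rates and detection rates (power) of the MH and LR methods in detecting DIF. The results show that both methods kept the Type I error rate at about 0.05 (α = 0.05), with the LR method's Type I error rate being more stable. For uniform DIF, the detection rate of the MH method was slightly higher than that of the LR method; for nonuniform DIF, the detection rate of the LR method was much higher than that of the MH method, which was insensitive to nonuniform DIF. In addition, for both methods the detection rate for uniform DIF increased as the proportion of DIF items increased, whereas the detection rate for nonuniform DIF decreased somewhat as that proportion increased.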
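
The abstract does not expand "MH"; assuming it refers to the Mantel-Haenszel procedure, the statistic for one studied item can be sketched as below. Examinees are matched on total score, and each score level contributes a 2×2 table of group by correct/incorrect. The function name `mantel_haenszel_dif`, the table layout, and the toy counts are illustrative assumptions, not the study's code. The last returned value uses the widely cited −2.35·ln(α) rescaling of the common odds ratio.

```python
import math


def mantel_haenszel_dif(tables):
    """Mantel-Haenszel DIF statistics for one item (hypothetical helper).

    tables: one (a, b, c, d) tuple per matched score level, where
      a, b = reference group correct / incorrect
      c, d = focal group correct / incorrect
    Assumes at least one level has both groups and both outcomes present,
    so that the denominators below are nonzero.
    """
    num = den = sum_a = sum_exp = sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a level this sparse carries no information
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d            # total correct / incorrect
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_exp += n_ref * m1 / n        # E(a) under no DIF
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    odds_ratio = num / den               # common odds ratio, alpha_MH
    # Chi-square with continuity correction, 1 degree of freedom.
    chi2 = (abs(sum_a - sum_exp) - 0.5) ** 2 / sum_var
    delta = -2.35 * math.log(odds_ratio)
    return odds_ratio, chi2, delta


# Toy counts for two score levels (made up for illustration).
odds_ratio, chi2, delta = mantel_haenszel_dif([(20, 10, 10, 20), (30, 10, 20, 20)])
```

Because the statistic pools a single odds ratio across score levels, group differences that change sign along the score scale tend to cancel, which is in line with the abstract's finding that MH is insensitive to nonuniform DIF.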

9.
Ignoring a level can have a substantial impact on the conclusions of a multilevel analysis. For intercept-only models and for balanced data, we derive these effects analytically. For more complex random intercept models or for unbalanced data, a simulation study is performed. Most important effects concern estimates and corresponding standard errors of the variance parameters at adjacent levels and of the coefficients of the predictors at the ignored and bordering levels. Therefore, we conclude that if the researcher is interested in a specific level, she/he should account for both the upper and lower level. Conclusions are illustrated using empirical data from educational research.

10.
This article demonstrates the utility of restricted item response models for examining item difficulty ordering and slope uniformity for an item set that reflects varying cognitive processes. Twelve sets of paired algebra word problems were developed to systematically reflect various types of cognitive processes required for successful performance. This resulted in a total of 24 items. They reflected distance-rate-time (DRT), interest, and area problems. Hypotheses concerning difficulty ordering and slope uniformity for the items were tested by constraining item difficulty and discrimination parameters in hierarchical item response models. The first set of model comparisons tested the equality of the discrimination and difficulty parameters for each set of paired items. The second set of model comparisons examined slope uniformity within the complex DRT problems. The third set of model comparisons examined whether the familiarity of the story context affected item difficulty for two types of complex DRT problems. The last set of model comparisons tested the hypothesized difficulty ordering of the items.

11.
A computer simulation study was conducted to determine the feasibility of using logistic regression procedures to detect differential item functioning (DIF) in polytomous items. One item in a simulated test of 25 items contained DIF; parameters for that item were varied to create three conditions of nonuniform DIF and one of uniform DIF. Item scores were generated using a generalized partial credit model, and the data were recoded into multiple dichotomies in order to use logistic regression procedures. Results indicate that logistic regression is powerful in detecting most forms of DIF; however, it required large amounts of data manipulation, and interpretation of the results was sometimes difficult. Some logistic regression procedures may be useful in the post hoc analysis of DIF for polytomous items.

12.
13.
How has Item Response Theory helped solve problems in the development and use of computer-adaptive tests? Do we need to balance item content with computer-adaptive tests? Could we use IRT to evaluate unusual responses to computer-delivered tests?

14.
The information matrix can equivalently be determined via the expectation of the Hessian matrix or the expectation of the outer product of the score vector. The identity of these two matrices, however, is only valid in the case of a correctly specified model. Therefore, differences between the two versions of the observed information matrix indicate model misfit. The equality of both matrices can be tested with the so‐called information matrix test as a general test of misspecification. This test can be adapted to item response models in order to evaluate the fit of single items and the fit of the whole scale. The performance of different versions of the test is compared in a simulation study with existing tests of model fit, among them the test of Orlando and Thissen, the score test of local independence due to Glas and Suarez‐Falcon, and the limited information approach of Maydeu‐Olivares and Joe. In general, the different versions of the information matrix test adhere to the nominal Type I error rate and have high power for detecting misspecified item characteristic curves. Additionally, some versions of the test can be used in order to detect violations of the local independence assumption.
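
The identity this abstract relies on can be written out as follows. The notation is assumed for illustration and is not taken from the article: ℓ is the log-likelihood of one response pattern x, θ the parameter vector, and N the number of respondents.

```latex
% Information matrix equality (holds when the model is correctly specified)
\mathcal{I}(\theta)
  = -\,\mathbb{E}\!\left[\frac{\partial^{2}\ell(\theta; x)}
                              {\partial\theta\,\partial\theta^{\top}}\right]
  = \mathbb{E}\!\left[\frac{\partial \ell(\theta; x)}{\partial\theta}\,
                      \frac{\partial \ell(\theta; x)}{\partial\theta^{\top}}\right]

% Sample analogues evaluated at the maximum likelihood estimate \hat{\theta}
\hat{H} = -\frac{1}{N}\sum_{i=1}^{N}
          \frac{\partial^{2}\ell(\hat{\theta}; x_i)}
               {\partial\theta\,\partial\theta^{\top}},
\qquad
\hat{J} = \frac{1}{N}\sum_{i=1}^{N}
          \frac{\partial \ell(\hat{\theta}; x_i)}{\partial\theta}\,
          \frac{\partial \ell(\hat{\theta}; x_i)}{\partial\theta^{\top}}
```

Under a correctly specified model the difference between the outer-product version and the Hessian version should be close to the zero matrix, so a test statistic can be built from the distinct elements of that difference. How the article scales and selects those elements is not stated in the abstract.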

15.
Applied Measurement in Education (《教育实用测度》), 2013, 26(2): 125-141
Item parameter instability can threaten the validity of inferences about changes in student achievement when using Item Response Theory (IRT)-based test scores obtained on different occasions. This article illustrates a model-testing approach for evaluating the stability of IRT item parameter estimates in a pretest-posttest design. Stability of item parameter estimates was assessed for a random sample of pretest and posttest responses to a 19-item math test. Using MULTILOG (Thissen, 1986), IRT models were estimated in which item parameter estimates were constrained to be equal across samples (reflecting stability) and in which item parameter estimates were free to vary across samples (reflecting instability). These competing models were then compared statistically in order to test the invariance assumption. The results indicated a moderately high degree of stability in the item parameter estimates for a group of children assessed on two different occasions.

16.
The purpose of this study was to investigate the power and Type I error rate of the likelihood ratio goodness-of-fit (LR) statistic in detecting differential item functioning (DIF) under Samejima's (1969, 1972) graded response model. A multiple-replication Monte Carlo study was utilized in which DIF was modeled in simulated data sets which were then calibrated with MULTILOG (Thissen, 1991) using hierarchically nested item response models. In addition, the power and Type I error rate of the Mantel (1963) approach for detecting DIF in ordered response categories were investigated using the same simulated data, for comparative purposes. The power of both the Mantel and LR procedures was affected by sample size, as expected. The LR procedure lacked the power to consistently detect DIF when it existed in reference/focal groups with sample sizes as small as 500/500. The Mantel procedure maintained control of its Type I error rate and was more powerful than the LR procedure when the comparison group ability distributions were identical and there was a constant DIF pattern. On the other hand, the Mantel procedure lost control of its Type I error rate, whereas the LR procedure did not, when the comparison groups differed in mean ability; and the LR procedure demonstrated a profound power advantage over the Mantel procedure under conditions of balanced DIF in which the comparison group ability distributions were identical. The choice and subsequent use of any procedure requires a thorough understanding of the power and Type I error rates of the procedure under varying conditions of DIF pattern, comparison group ability distributions (or, as a surrogate, observed score distributions), and item characteristics.

17.
The relations among several alternative parameterizations of the binary factor analysis model and the 2-parameter item response theory model are discussed. It is pointed out that different parameterizations of factor analysis model parameters can be transformed into item response theory model parameters, and general formulas are provided. Illustrative data analysis is provided to demonstrate the transformations.
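
The abstract does not reproduce its formulas. As an illustration of the kind of transformation it describes, the sketch below uses one commonly cited special case, which may differ from the parameterizations treated in the article. It assumes a single factor with unit variance and a standardized latent response y* = λθ + ε, with the item scored 1 when y* exceeds the threshold τ, so the residual variance is 1 − λ². Under that assumption the normal-ogive 2-parameter model Φ(a(θ − b)) has a = λ/√(1 − λ²) and b = τ/λ. The function names and the use of the conventional scaling constant 1.702 for the logistic metric are likewise assumptions.

```python
import math

D = 1.702  # conventional constant relating normal-ogive and logistic slopes


def fa_to_irt(loading, threshold, logistic=False):
    """Convert a standardized loading and threshold to 2PL (a, b).

    Hypothetical helper; valid only for 0 < |loading| < 1 under the
    unit-variance latent response assumption described above.
    """
    a = loading / math.sqrt(1.0 - loading ** 2)  # discrimination
    b = threshold / loading                      # difficulty
    if logistic:
        a *= D                                   # rescale to the logistic metric
    return a, b


def irt_to_fa(a, b, logistic=False):
    """Inverse conversion: 2PL (a, b) back to loading and threshold."""
    if logistic:
        a /= D                                   # back to the normal-ogive metric
    loading = a / math.sqrt(1.0 + a ** 2)
    threshold = b * loading
    return loading, threshold


# Toy item: loading 0.6 and threshold 0.3 (made-up values).
a, b = fa_to_irt(0.6, 0.3)
loading, threshold = irt_to_fa(a, b)
```

The round trip in the last two lines is a quick sanity check: converting to the IRT metric and back should return the original loading and threshold.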

18.
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting model misfit. The purpose of this study was to extend the use of RISE to more general and comprehensive applications by manipulating a variety of factors (e.g., test length, sample size, IRT models, ability distribution). The results from the simulation study demonstrated that RISE outperformed G² and S‐X² in that it controlled Type I error rates and provided adequate power under the studied conditions. In the empirical study, RISE detected reasonable numbers of misfitting items compared to G² and S‐X², and RISE gave a much clearer picture of the location and magnitude of misfit for each misfitting item. In addition, there was no practical consequence to classification before and after replacement of misfitting items detected by the three fit statistics.

19.
Correlational evidence suggests that high school GPA is better than admission test scores in predicting first-year college GPA, although test scores have incremental predictive validity. The usefulness of a selection variable in making admission decisions depends in part on its predictive validity, but also on institutions’ selectivity and definition of success. Analyses of data from 192 institutions suggest that high school GPA is more useful than admission test scores in situations involving low selectivity in admissions and minimal to average academic performance in college. In contrast, test scores are more useful than high school GPA in situations involving high selectivity and high academic performance. In nearly all contexts, test scores have incremental usefulness beyond high school GPA. Moreover, high school GPA by test score interactions are important in predicting academic success.

20.
Even though Bayesian estimation has recently become quite popular in item response theory (IRT), there is a lack of work on model checking from a Bayesian perspective. This paper applies the posterior predictive model checking (PPMC) method (Guttman, 1967; Rubin, 1984), a popular Bayesian model checking tool, to a number of real applications of unidimensional IRT models. The applications demonstrate how to exploit the flexibility of the posterior predictive checks to meet the needs of the researcher. This paper also examines practical consequences of misfit, an area often ignored in educational measurement literature while assessing model fit.
