Similar Documents
20 similar documents retrieved
1.
The posterior predictive model checking method is a flexible Bayesian model‐checking tool and has recently been used to assess fit of dichotomous IRT models. This paper extended previous research to polytomous IRT models. A simulation study was conducted to explore the performance of posterior predictive model checking in evaluating different aspects of fit for unidimensional graded response models. A variety of discrepancy measures (test‐level, item‐level, and pair‐wise measures) that reflected different threats to applications of graded IRT models to performance assessments were considered. Results showed that posterior predictive model checking exhibited adequate power in detecting different aspects of misfit for graded IRT models when appropriate discrepancy measures were used. Pair‐wise measures were found to be more powerful in detecting violations of the unidimensionality and local independence assumptions.
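A minimal sketch of the PPMC logic for a graded response model, not the cited study's code: replicated data sets are simulated from (stand-in) posterior draws, a pair-wise discrepancy is computed for observed and replicated data, and the proportion of replications exceeding the observed value gives a posterior predictive p-value. All parameter values and the perturbation-based "posterior draws" are fabricated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def grm_simulate(theta, a, b, rng):
    """Simulate graded responses. theta: (N,) abilities; a: (J,) discriminations;
    b: (J, K-1) ordered thresholds."""
    N, J = theta.size, a.size
    z = a[None, :, None] * (theta[:, None, None] - b[None, :, :])
    p_ge = 1.0 / (1.0 + np.exp(-z))                        # P(X >= k), k = 1..K-1
    p_cum = np.concatenate([np.ones((N, J, 1)), p_ge, np.zeros((N, J, 1))], axis=2)
    probs = p_cum[:, :, :-1] - p_cum[:, :, 1:]             # category probabilities
    u = rng.random((N, J, 1))
    return (u > np.cumsum(probs, axis=2)).sum(axis=2)      # scores 0..K-1

def pairwise_discrepancy(X):
    """Item-pair covariances: a simple pair-wise measure sensitive to local dependence."""
    return np.cov(X, rowvar=False)[np.triu_indices(X.shape[1], k=1)]

# Placeholder "observed" data; in practice this is the real response matrix.
N, J = 500, 6
a_true = rng.uniform(1.0, 2.0, J)
b_true = np.sort(rng.normal(0.0, 1.0, (J, 3)), axis=1)
X_obs = grm_simulate(rng.normal(size=N), a_true, b_true, rng)
D_obs = pairwise_discrepancy(X_obs)

R, exceed = 200, 0
for _ in range(R):
    # In a real PPMC these would be MCMC draws of (theta, a, b); small
    # perturbations of the generating values stand in for posterior draws here.
    theta_r = rng.normal(size=N)
    a_r = a_true * np.exp(rng.normal(0.0, 0.05, J))
    b_r = np.sort(b_true + rng.normal(0.0, 0.05, b_true.shape), axis=1)
    exceed += pairwise_discrepancy(grm_simulate(theta_r, a_r, b_r, rng)) >= D_obs

ppp = exceed / R                       # one predictive p-value per item pair
print(np.round(ppp, 2))                # values near 0 or 1 flag pair-wise misfit
```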

2.
Posterior predictive model checking (PPMC) is a Bayesian model checking method that compares the observed data to (plausible) future observations from the posterior predictive distribution. We propose an alternative to PPMC in the context of structural equation modeling, which we term the poor person’s PPMC (PP-PPMC), for the situation wherein one cannot afford (or is unwilling) to draw samples from the full posterior. Using only by-products of likelihood-based estimation (maximum likelihood estimate and information matrix), the PP-PPMC offers a natural method to handle parameter uncertainty in model fit assessment. In particular, a coupling relationship between the classical p values from the model fit chi-square test and the predictive p values from the PP-PPMC method is carefully examined, suggesting that PP-PPMC might offer an alternative, principled approach for model fit assessment. We also illustrate the flexibility of the PP-PPMC approach by applying it to case-influence diagnostics.
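A toy illustration of the PP-PPMC idea on a stand-in model (a Poisson rate, not the article's SEM setting): the parameter is drawn from the asymptotic normal approximation N(MLE, I⁻¹) instead of a full MCMC posterior, replicated data are simulated, and a predictive p-value is computed for a discrepancy. All data and the discrepancy choice are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdispersed "observed" counts; a Poisson model will fit them poorly.
y_obs = rng.negative_binomial(5, 0.5, size=200)

lam_hat = y_obs.mean()                 # ML estimate of the Poisson rate
info = len(y_obs) / lam_hat            # Fisher information for the Poisson mean
se = np.sqrt(1.0 / info)

def discrepancy(y):
    return y.var() / y.mean()          # variance/mean ratio, ~1 under a Poisson model

D_obs, exceed, R = discrepancy(y_obs), 0, 1000
for _ in range(R):
    # PP-PPMC step: parameter uncertainty comes from N(MLE, I^-1),
    # a by-product of likelihood estimation, rather than from MCMC.
    lam_r = max(rng.normal(lam_hat, se), 1e-6)
    y_rep = rng.poisson(lam_r, size=len(y_obs))
    exceed += discrepancy(y_rep) >= D_obs

print("predictive p-value:", exceed / R)   # near 0 here, flagging overdispersion
```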

3.
Proper model specification is an issue for researchers, regardless of the estimation framework being utilized. Typically, indexes are used to compare the fit of one model to the fit of an alternate model. These indexes only provide an indication of relative fit and do not necessarily point toward proper model specification. There is a procedure in the Bayesian framework called posterior predictive checking that is designed theoretically to detect model misspecification for observed data. However, the performance of the posterior predictive check procedure has thus far not been directly examined under different conditions of mixture model misspecification. This article addresses this task and aims to provide additional insight into whether or not posterior predictive checks can detect model misspecification within the context of Bayesian growth mixture modeling. Results indicate that this procedure can only identify mixture model misspecification under very extreme cases of misspecification.

4.
Factors Which Influence Precision of School-Level IRT Ability Estimates
The precision of the group-level IRT model applied to school ability estimation is described, assuming use of Bayesian estimation with precision represented by the standard deviation of the posterior distribution. Similarities and differences between the school-level model and the familiar individual-level IRT model are considered. School size and between-school variability, two factors not relevant at the student level, are dominant determinants of school-level precision. Under the multiple-matrix sampling design required for the school-level IRT, the number of items associated with a scale does not influence the precision at the school level. Also, the effects of school ability and item quality on school-level precision are often relatively weak. It was found that the use of Bayesian estimation could result in a systematic distortion of the true ranking of schools based on ability because of an estimation bias which is a function of school size.

5.
This article examines Bayesian model averaging as a means of addressing predictive performance in Bayesian structural equation models. The current approach to addressing the problem of model uncertainty lies in the method of Bayesian model averaging. We expand the work of Madigan and his colleagues by considering a structural equation model as a special case of a directed acyclic graph. We then provide an algorithm that searches the model space for submodels and obtains a weighted average of the submodels using posterior model probabilities as weights. Our simulation study provides a frequentist evaluation of our Bayesian model averaging approach and indicates that when the true model is known, Bayesian model averaging does not necessarily yield better predictive performance compared to nonaveraged models. However, our case study using data from an international large-scale assessment reveals that the model-averaged submodels provide better posterior predictive performance compared to the initially specified model.
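A minimal sketch of the averaging step only, not the article's graph-based submodel search: approximate posterior model probabilities are formed from BIC values and used as weights for a model-averaged prediction. Simple polynomial regressions stand in for SEM submodels; all data are fabricated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=1.0, size=n)

def fit_poly(deg):
    """Fit a polynomial regression of degree `deg` and return (coefficients, BIC)."""
    X = np.vander(x, deg + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = deg + 2                                  # coefficients + error variance
    return beta, -2 * loglik + k * np.log(n)

degrees = [1, 2, 3]
fits = [fit_poly(d) for d in degrees]
bics = np.array([bic for _, bic in fits])

# Approximate posterior model probabilities: proportional to exp(-BIC/2).
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()

# Model-averaged prediction at a new point = probability-weighted average of
# the submodels' predictions.
x_new = 0.5
preds = [np.vander([x_new], d + 1, increasing=True)[0] @ beta
         for d, (beta, _) in zip(degrees, fits)]
print("weights:", np.round(w, 3))
print("averaged prediction:", float(np.dot(w, preds)))
```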

6.
Appropriate model specification is fundamental to unbiased parameter estimates and accurate model interpretations in structural equation modeling. Thus detecting potential model misspecification has drawn the attention of many researchers. This simulation study evaluates the efficacy of the Bayesian approach (the posterior predictive checking, or PPC procedure) under multilevel bifactor model misspecification (i.e., ignoring a specific factor at the within level). The impact of model misspecification on structural coefficients was also examined in terms of bias and power. Results showed that the PPC procedure performed better in detecting multilevel bifactor model misspecification when the misspecification was more severe and the sample size was larger. Structural coefficients were increasingly negatively biased at the within level as model misspecification became more severe. Model misspecification at the within level affected the between-level structural coefficient estimates more when data dependency was lower and the number of clusters was smaller. Implications for researchers are discussed.

7.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.

8.
9.
Bayesian methods incorporate model parameter information prior to data collection. Eliciting information from content experts is an option, but has seen little implementation in Bayesian item response theory (IRT) modeling. This study aims to use ethical reasoning content experts to elicit prior information and incorporate this information into Markov Chain Monte Carlo (MCMC) estimation. A six‐step elicitation approach is followed, with relevant details at each stage for two IRT item parameters: difficulty and guessing. Results indicate that using content experts is the preferred approach, rather than noninformative priors, for both parameter types. The use of a noninformative prior for small samples provided dramatically different results when compared to results from content expert–elicited priors. The WAMBS (When to worry and how to Avoid the Misuse of Bayesian Statistics) checklist is used to aid in comparisons.
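A rough sketch of one way elicited expert judgments could be turned into prior hyperparameters for a 3PL item's difficulty and guessing parameters; this is not the article's six-step protocol, and all elicited numbers are made up for illustration.

```python
import numpy as np

# Difficulty: expert gives a best guess and a 90% plausible range.
b_mode, b_low, b_high = 0.4, -0.5, 1.3
b_mu = b_mode
b_sigma = (b_high - b_low) / (2 * 1.645)        # range treated as +/- 1.645 SD
print(f"difficulty prior: Normal(mu={b_mu:.2f}, sd={b_sigma:.2f})")

# Guessing: expert gives a mean and an "effective sample size" for their belief.
c_mean, c_strength = 0.22, 30.0                 # e.g., 4-option MC item, moderate confidence
alpha = c_mean * c_strength
beta = (1 - c_mean) * c_strength
print(f"guessing prior: Beta(alpha={alpha:.1f}, beta={beta:.1f})")

# A noninformative alternative for contrast would be something like Beta(1, 1)
# for guessing; with small samples the two can yield very different posteriors,
# which is the comparison the study makes.
rng = np.random.default_rng(3)
print("prior draws for c:", np.round(rng.beta(alpha, beta, 5), 3))
```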

10.
In psychological research, available data are often insufficient to estimate item factor analysis (IFA) models using traditional estimation methods, such as maximum likelihood (ML) or limited information estimators. Bayesian estimation with common-sense, moderately informative priors can greatly improve efficiency of parameter estimates and stabilize estimation. There are a variety of methods available to evaluate model fit in a Bayesian framework; however, past work investigating Bayesian model fit assessment for IFA models has assumed flat priors, which have no advantage over ML in limited data settings. In this paper, we evaluated the impact of moderately informative priors on the ability to detect model misfit for several candidate indices: posterior predictive checks based on the observed score distribution, leave-one-out cross-validation, and the widely applicable information criterion (WAIC). We found that although Bayesian estimation with moderately informative priors is an excellent aid for estimating challenging IFA models, methods for testing model fit in these circumstances are inadequate.
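For reference, a minimal sketch of how WAIC is computed from pointwise log-likelihoods; the draws below are fabricated placeholders standing in for real MCMC output, and the formula (lppd minus the sum of pointwise variances) is the standard one rather than anything specific to this study.

```python
import numpy as np

# log_lik[s, i] = log p(y_i | theta^(s)) for posterior draw s and observation i.
rng = np.random.default_rng(4)
S, n = 1000, 200
log_lik = rng.normal(loc=-1.3, scale=0.1, size=(S, n))   # placeholder values

# lppd: log pointwise predictive density, averaging likelihoods over draws.
lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))

# p_waic: effective number of parameters, the sum of pointwise variances.
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))

waic = -2 * (lppd - p_waic)
print(f"lppd={lppd:.1f}, p_waic={p_waic:.1f}, WAIC={waic:.1f}")
```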

11.
Despite its importance to structural equation modeling, model evaluation remains underdeveloped in the Bayesian SEM framework. Posterior predictive p-values (PPP) and the deviance information criterion (DIC) are now available in popular software for Bayesian model evaluation, but they remain underutilized. This is largely due to the lack of recommendations for their use. To address this problem, PPP and DIC were evaluated in a series of Monte Carlo simulation studies. The results show that both PPP and DIC are influenced by severity of model misspecification, sample size, model size, and choice of prior. The cutoffs PPP < 0.10 and ΔDIC > 7 work best in the conditions and models tested here to maintain low false detection rates and misspecified model selection rates, respectively. The recommendations provided in this study will help researchers evaluate their models in a Bayesian SEM analysis and set the stage for future development and evaluation of Bayesian SEM fit indices.
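A minimal sketch of the DIC bookkeeping behind a decision rule such as "treat |ΔDIC| > 7 as a meaningful difference": compute the posterior mean deviance, the deviance at the posterior mean, their difference pD, and DIC. A toy normal-mean model with fabricated posterior draws stands in for a Bayesian SEM.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(0.3, 1.0, 150)

def deviance(mu):
    # -2 * log-likelihood of y under Normal(mu, 1)
    return np.sum((y - mu) ** 2) + len(y) * np.log(2 * np.pi)

# Stand-in posterior draws for mu (these would come from MCMC in practice).
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), 2000)

d_bar = np.mean([deviance(m) for m in mu_draws])   # posterior mean deviance
d_hat = deviance(mu_draws.mean())                  # deviance at the posterior mean
p_d = d_bar - d_hat                                # effective number of parameters
dic = d_bar + p_d
print(f"pD={p_d:.2f}, DIC={dic:.1f}")

# To compare two models, compute DIC for each and examine delta_DIC = DIC_A - DIC_B.
```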

12.
An Extension of Four IRT Linking Methods for Mixed-Format Tests
Under item response theory (IRT), linking proficiency scales from separate calibrations of multiple forms of a test to achieve a common scale is required in many applications. Four IRT linking methods including the mean/mean, mean/sigma, Haebara, and Stocking-Lord methods have been presented for use with single-format tests. This study extends the four linking methods to a mixture of unidimensional IRT models for mixed-format tests. Each linking method extended is intended to handle mixed-format tests using any mixture of the following five IRT models: the three-parameter logistic, graded response, generalized partial credit, nominal response (NR), and multiple-choice (MC) models. A simulation study is conducted to investigate the performance of the four linking methods extended to mixed-format tests. Overall, the Haebara and Stocking-Lord methods yield more accurate linking results than the mean/mean and mean/sigma methods. When the NR model or the MC model is used to analyze data from mixed-format tests, limitations of the mean/mean, mean/sigma, and Stocking-Lord methods are described.
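To make the linking idea concrete, here is a minimal sketch of the simplest of the four methods, mean/sigma: the slope and intercept of the scale transformation theta_Y = A * theta_X + B are taken from the means and standard deviations of the common items' difficulty estimates on the two forms. (The Haebara and Stocking-Lord methods instead minimize criterion functions over A and B.) All parameter values below are fabricated.

```python
import numpy as np

# Common-item difficulty estimates from two separate calibrations.
b_form_X = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # on the Form X scale
b_form_Y = np.array([-0.9, -0.1, 0.5, 1.1, 1.9])   # same items on the Form Y scale

A = b_form_Y.std(ddof=1) / b_form_X.std(ddof=1)
B = b_form_Y.mean() - A * b_form_X.mean()
print(f"A={A:.3f}, B={B:.3f}")

# Put Form X item parameters and abilities onto the Form Y scale.
a_X = np.array([1.1, 0.9, 1.4])
b_X = np.array([-0.5, 0.2, 1.0])
theta_X = np.array([-1.0, 0.0, 1.0])
a_on_Y = a_X / A
b_on_Y = A * b_X + B
theta_on_Y = A * theta_X + B
print(b_on_Y, theta_on_Y)
```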

13.
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test characteristic curve (i.e., the IRT true‐score (TS) estimator). The five methods are compared using a simulation study and a real data example. Results indicated that the application of different methods can sometimes lead to different estimated cut scores, and that there can be some key differences in impact data when using the IRT TS estimator compared to other methods. It is suggested that practitioners carefully consider their choice of methods for estimating ability and cut scores, because different methods have distinct features and properties. An important consideration in the application of Bayesian methods relates to the choice of the prior and the potential bias that priors may introduce into estimates.
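A minimal sketch of the true-score (TCC) approach mentioned above: the panelists' mean Angoff ratings are summed to a target raw score, and the test characteristic curve is inverted to find the corresponding theta cut score. The 3PL parameters, the D = 1.7 scaling choice, and the ratings are fabricated for illustration.

```python
import numpy as np

a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-0.8, -0.2, 0.3, 0.9, 1.4])
c = np.array([0.20, 0.25, 0.20, 0.20, 0.25])
angoff_ratings = np.array([0.80, 0.70, 0.60, 0.50, 0.40])   # mean panel rating per item

def tcc(theta):
    """Test characteristic curve: expected raw score under the 3PL (D = 1.7)."""
    p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))
    return p.sum()

target = angoff_ratings.sum()          # target true score on the raw-score scale

# Invert the TCC by bisection (it is monotone increasing in theta).
lo, hi = -4.0, 4.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if tcc(mid) < target:
        lo = mid
    else:
        hi = mid
theta_cut = 0.5 * (lo + hi)
print(f"target raw score {target:.2f} -> theta cut {theta_cut:.3f}")
```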

14.
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA and CC nonparametrically by replacing the role of the parametric IRT model in Lee's classification indices with a modified version of Ramsay's kernel‐smoothed item response functions. The performance of the nonparametric CA and CC indices is tested in simulation studies in various conditions with different generating IRT models, test lengths, and ability distributions. The nonparametric approach to CA often outperforms Lee's method and Livingston and Lewis's method, showing robustness to nonnormality in the simulated ability. The nonparametric CC index performs similarly to Lee's method and outperforms Livingston and Lewis's method when the ability distributions are nonnormal.
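A minimal sketch of a kernel-smoothed item response function: one item's responses are smoothed against a simple ability proxy with a Gaussian (Nadaraya-Watson) kernel. Ramsay's TestGraf-style smoothing uses rank-based normal quantiles; standardized rest scores are used here only to keep the sketch short, and all data are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(6)
N, J = 1000, 20
theta = rng.normal(size=N)
a = rng.uniform(0.8, 1.6, J)
b = rng.normal(0, 1, J)
X = (rng.random((N, J)) < 1 / (1 + np.exp(-a * (theta[:, None] - b)))).astype(int)

item = 0
rest = X[:, np.arange(J) != item].sum(axis=1)          # rest score as ability proxy
z = (rest - rest.mean()) / rest.std()

def smoothed_irf(eval_points, z, y, h=0.3):
    """Nadaraya-Watson estimate of P(correct | ability proxy)."""
    w = np.exp(-0.5 * ((eval_points[:, None] - z[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(-2.5, 2.5, 11)
print(np.round(smoothed_irf(grid, z, X[:, item]), 3))   # nonparametric IRF for item 0
# Substituting such smoothed IRFs for the parametric IRT model inside Lee's
# classification indices is the replacement the article investigates.
```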

15.
This article summarizes the continuous latent trait IRT approach to skills diagnosis as particularized by a representative variety of continuous latent trait models using item response functions (IRFs). First, several basic IRT-based continuous latent trait approaches are presented in some detail. Then a brief summary of estimation, model checking, and assessment scoring aspects is provided. Finally, the University of California at Berkeley multidimensional Rasch-model-grounded SEPUP middle school science-focused embedded assessment project is briefly described as one significant illustrative application.

16.
Assessing the correspondence between model predictions and observed data is a recommended procedure for justifying the application of an IRT model. However, with shorter tests, current goodness-of-fit procedures, which assume precise point estimates of ability, are inappropriate. The present paper describes a goodness-of-fit statistic that considers the imprecision with which ability is estimated and involves constructing item fit tables based on each examinee's posterior distribution of ability, given the likelihood of their response pattern and an assumed marginal ability distribution. However, the posterior expectations that are computed are dependent and the distribution of the goodness-of-fit statistic is unknown. The present paper also describes a Monte Carlo resampling procedure that can be used to assess the significance of the fit statistic and compares this method with a previously used method. The results indicate that the method described herein is an effective and reasonably simple procedure for assessing the validity of applying IRT models when ability estimates are imprecise.
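A minimal sketch of the key ingredient described above, not the full fit statistic: one examinee's posterior distribution of ability over a quadrature grid is formed from the likelihood of the response pattern and a standard normal prior, and posterior-expected item scores are computed from it. The 2PL parameters and the response pattern are fabricated placeholders.

```python
import numpy as np

a = np.array([1.2, 0.9, 1.5, 1.1, 0.8, 1.3])
b = np.array([-1.0, -0.4, 0.0, 0.4, 0.9, 1.5])
x = np.array([1, 1, 1, 0, 1, 0])              # one examinee's responses

grid = np.linspace(-4, 4, 61)                 # quadrature points
prior = np.exp(-0.5 * grid**2)                # standard normal prior (unnormalized)
p = 1 / (1 + np.exp(-a[None, :] * (grid[:, None] - b[None, :])))   # P(correct | theta)
lik = np.prod(p**x * (1 - p)**(1 - x), axis=1)
post = prior * lik
post /= post.sum()                            # posterior weights over the grid

# Posterior expected score on each item: item-fit tables are built by
# accumulating these expectations across examinees (rather than predictions at
# a point estimate) and comparing them with the observed responses.
expected = post @ p
print(np.round(expected, 3))
```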

17.
This inquiry is an investigation of item response theory (IRT) proficiency estimators’ accuracy under multistage testing (MST). We chose a two‐stage MST design that includes four modules (one at Stage 1, three at Stage 2) and three difficulty paths (low, middle, high). We assembled various two‐stage MST panels (i.e., forms) by manipulating two assembly conditions in each module, such as difficulty level and module length. For each panel, we investigated the accuracy of examinees’ proficiency levels derived from seven IRT proficiency estimators. The choice of Bayesian (prior) versus non‐Bayesian (no prior) estimators was of more practical significance than the choice of number‐correct versus item‐pattern scoring estimators. The Bayesian estimators were slightly more efficient than the non‐Bayesian estimators, resulting in smaller overall error. Possible score changes caused by the use of different proficiency estimators would be nonnegligible, particularly for low‐ and high‐performing examinees.
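To illustrate the Bayesian versus non-Bayesian distinction the study emphasizes, here is a minimal sketch contrasting just two of the estimators, ML (no prior) and EAP (standard normal prior), for a single response pattern on a 2PL module; the item parameters and responses are fabricated, not taken from the study's MST panels.

```python
import numpy as np

a = np.array([1.0, 1.3, 0.8, 1.1, 1.4])
b = np.array([-0.6, 0.0, 0.5, 1.0, 1.6])
x = np.array([1, 1, 0, 0, 0])

grid = np.linspace(-4, 4, 401)
p = 1 / (1 + np.exp(-a[None, :] * (grid[:, None] - b[None, :])))
loglik = (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)

theta_ml = grid[np.argmax(loglik)]                  # ML: no prior
w = np.exp(loglik) * np.exp(-0.5 * grid**2)         # posterior with a N(0, 1) prior
theta_eap = (grid * w).sum() / w.sum()              # EAP: posterior mean

print(f"ML: {theta_ml:.2f}  EAP: {theta_eap:.2f}")  # EAP is pulled toward the prior mean
```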

18.
This study applied the family of Bayesian IRT guessing models of Caojing et al. to analyze the guessing behavior of second-year junior high school students on a Chinese vocabulary test, using the DIC3 index to evaluate model fit and comparing the parameter estimates with those from the two-parameter logistic model. The findings were: (1) the guessing models fit better than the two-parameter logistic model; (2) the second-year junior high test data were best fit by the threshold guessing model (IRT-TG), with about 3.5% of students exhibiting TG-type guessing behavior; (3) the presence of guessers clearly affected the ability estimates of the guessers themselves and the item difficulty estimates, but had little effect on the ability and discrimination parameter estimates of non-guessers.

19.
According to item response theory (IRT), examinee ability estimation is independent of the particular set of test items administered from a calibrated pool. Although the most popular application of this feature of IRT is computerized adaptive (CA) testing, a recently proposed alternative is self-adapted (SA) testing, in which examinees choose the difficulty level of each of their test items. This study compared examinee performance under SA and CA tests, finding that examinees taking the SA test (a) obtained significantly higher ability scores and (b) reported significantly lower posttest state anxiety. The results of this study suggest that SA testing is a desirable format for computer-based testing.

20.
Educational Assessment, 2013, 18(2), 125-146
States are implementing statewide assessment programs that classify students into proficiency levels that reflect state-defined performance standards. In an effort to provide support for score interpretations, this study examined the consistency of classifications based on competing item response theory (IRT) models for data from a state assessment program. Classification of students into proficiency levels was compared based on a 1-parameter vs. a 3-parameter IRT model. Despite an overall high level of agreement between classifications based on the two models, systematic differences were observed. Under the 1-parameter model, proficiency was underestimated for low proficiency classifications but overestimated for upper proficiency classifications. This resulted in higher "Below Basic" and "Advanced" classifications under 1-parameter vs. 3-parameter IRT applications. Implications of these differences are discussed.
