Similar documents
20 similar documents found (search time: 15 ms)
1.
Item response theory “dual” models (DMs), in which both items and individuals are viewed as sources of differential measurement error, have so far been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, based on the concept of the multidimensional location index, is first proposed and discussed. Then the models are described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. Simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is provided. The proposals should be of particular interest for multidimensional questionnaires in which the number of items per scale would not be enough to arrive at stable estimates if the existing unidimensional DMs were fitted on a separate-scale basis.

2.
To compare the roles of structural equation modeling (SEM) and the IRT graded response model (GRM) in item screening for personality scales, parameter estimation and model fitting were carried out for Factor 2 ("Straightforwardness") of the Chinese College Student Personality Scale, based on 7,229 actual measurements, using Lisrel 8.70 for the SEM and Multilog 7.03 for the GRM, and the item-screening results of the two methods were compared. Both sets of statistics identified items 5, 6, 7, and 8 as poorly fitting: under the SEM this appeared as low factor loadings and unsatisfactory overall fit indices; under the GRM it appeared as unsatisfactory discrimination and location parameters, with poorly shaped item characteristic and information curves. However, the SEM suggested that items 6 and 8 were the worst, whereas the GRM pointed to items 5 and 6. Overall, the statistical inferences of SEM and the IRT graded response model about personality-scale items are consistent, with slight differences on individual items. Each method has its own advantages, and the two can be used in combination.

3.
The high school grade point average (GPA) is often adjusted to account for nominal indicators of course rigor, such as “honors” or “advanced placement.” Adjusted GPAs—also known as weighted GPAs—are frequently used for computing students’ rank in class and in the college admission process. Despite the high stakes attached to GPA, weighting policies vary considerably across states and high schools. Previous methods of estimating weighting parameters have used regression models with college course performance as the dependent variable. We discuss and demonstrate the suitability of the graded response model for estimating GPA weighting parameters and evaluating traditional weighting schemes. In our sample, which was limited to self‐reported performance in high school mathematics courses, we found that commonly used policies award more than twice the bonus points necessary to create parity for standard and advanced courses.
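Several of the abstracts in this listing rely on Samejima's graded response model. As background, a minimal sketch of its category probabilities, built from cumulative 2PL curves; all parameter values below are hypothetical and not taken from any of the studies:

```python
import numpy as np

def grm_probs(theta, a, b):
    """Category probabilities under Samejima's graded response model.

    theta : latent trait value
    a     : item discrimination
    b     : increasing array of K-1 between-category threshold parameters
    Returns the K category probabilities P(X = k), k = 0..K-1.
    """
    b = np.asarray(b, dtype=float)
    # cumulative probabilities P(X >= k) for k = 1..K-1
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # bracket with P(X >= 0) = 1 and P(X >= K) = 0, then difference
    star = np.concatenate(([1.0], cum, [0.0]))
    return star[:-1] - star[1:]

# hypothetical 4-category course-grade item (e.g. D < C < B < A)
p = grm_probs(theta=0.5, a=1.2, b=[-1.5, 0.0, 1.0])
```

Because the category probabilities are differences of adjacent cumulative curves, they always sum to one by construction.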

4.
Students' mathematical literacy has a multidimensional structure, so literacy-oriented assessment of mathematics achievement needs to report examinees' performance on each dimension rather than a single total score. Taking the PISA mathematical literacy framework as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, this study used the mirt package in R to process and analyze item data from an eighth-grade mathematical literacy assessment in one region, investigating multidimensional measurement of mathematical literacy. The results show that MIRT combines the advantages of unidimensional item response theory and factor analysis: it can be used to analyze the structural validity of the test and the quality of the test items, and to provide multidimensional cognitive diagnosis of examinees' abilities.

5.
Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, although most items developed historically follow dominance response models, research on FC CAT using dominance items is limited; existing research is heavily dominated by simulations and lacks empirical deployment. This empirical study trialed an FC CAT with dominance items described by the Thurstonian Item Response Theory model with research participants. The study investigated important practical issues such as the implications of adaptive item selection and social desirability balancing criteria for score distributions, measurement accuracy, and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage over optimal static tests. Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.

6.
We examined summary indices of high school performance (coursework, grades, and test scores) based on the graded response model (GRM). The indices varied by inclusion of ACT test scores and whether high school courses were constrained to have the same difficulty and discrimination across groups of schools. The indices were examined with respect to skewness, incremental prediction of college degree attainment, and differences across racial/ethnic and socioeconomic subgroups. The most difficult high school courses to earn an “A” grade included calculus, chemistry, trigonometry, other advanced math, physics, algebra 2, and geometry. The GRM‐based indices were less skewed than simple high school grade point average (HSGPA) and had higher correlations with ACT Composite score. The index that included ACT test scores and allowed item parameters to vary by school group was most predictive of college degree attainment, but had larger subgroup differences. Implications for implementing multiple measure models for college readiness are discussed.

7.
Given the relationships of item response theory (IRT) models to confirmatory factor analysis (CFA) models, IRT model misspecifications might be detectable through model fit indexes commonly used in categorical CFA. The purpose of this study is to investigate the sensitivity of weighted least squares with adjusted means and variance (WLSMV)-based root mean square error of approximation, comparative fit index, and Tucker–Lewis Index model fit indexes to IRT models that are misspecified due to local dependence (LD). It was found that WLSMV-based fit indexes have some functional relationships to parameter estimate bias in 2-parameter logistic models caused by violations of local independence. Continued exploration into these functional relationships and development of LD-detection methods based on such relationships could hold much promise for providing IRT practitioners with global information on violations of local independence.

8.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.

9.
Nambury S. Raju (1937–2005) developed two model‐based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures (Raju, 1988) and Raju's DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach provides an intuitive and flexible methodology for assessing DIF. The purpose of this tutorial is to explain DFIT and show how this methodology can be utilized in a variety of DIF applications.
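The "gap between item characteristic functions" that both of Raju's methods quantify can be made concrete with a small numerical sketch. The group parameters below are invented for illustration; with equal discriminations, the unsigned area between two 2PL curves reduces exactly to the difference in difficulties, which the numerical approximation recovers:

```python
import numpy as np

def icc(theta, a, b):
    """2PL item characteristic function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# hypothetical reference- and focal-group parameters for one item
theta = np.linspace(-6.0, 6.0, 4001)
ref = icc(theta, a=1.2, b=0.0)
foc = icc(theta, a=1.2, b=0.5)

# unsigned area between the two ICCs via the trapezoidal rule;
# with equal a parameters this equals |b_ref - b_foc| = 0.5
gap = np.abs(ref - foc)
area = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(theta)))
```

Truncating the integral at ±6 loses only a negligible amount of tail area, so the numerical result is very close to the analytic value of 0.5.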

10.
This article proposes a model-based procedure, intended for personality measures, for exploiting the auxiliary information provided by the certainty with which individuals answer every item (response certainty). This information is used to (a) obtain more accurate estimates of individual trait levels, and (b) provide a more detailed assessment of the consistency with which the individual responds to the test. The basis model consists of 2 submodels: an item response theory submodel for the responses, and a linear-in-the-coefficients submodel that describes the response certainties. The latter is based on the distance-difficulty hypothesis, and is parameterized as a factor-analytic model. Procedures for (a) estimating the structural parameters, (b) assessing model–data fit, (c) estimating the individual parameters, and (d) assessing individual fit are discussed. The proposal was used in an empirical study. Model–data fit was acceptable and estimates were meaningful. Furthermore, the precision of the individual trait estimates and the assessment of the individual consistency improved noticeably.

11.

The Defining Issues Test (DIT) has been the dominant measure of moral development. The DIT has its roots in Kohlberg’s original stage theory of moral judgment development and asks respondents to rank a set of stage-typed statements in order of importance on six stories. However, the question of how well DIT data match the underlying stage model has never been addressed with a statistical model. We therefore applied item response theory (IRT) to a large data set (55,319 cases). We found that the ordering of the stages extracted from the raw data fit the ordering in the underlying stage model well. Furthermore, difficulty differences of stages across the stories were found, and their magnitude and location were visualized. These findings are compatible with the notion of one latent moral developmental dimension and lend support to the hundreds of studies that have used the DIT-1 and, by implication, to the renewed DIT-2.

12.
13.
This paper presents a mixture item response tree (IRTree) model for extreme response style. Unlike traditional applications of single IRTree models, a mixture approach provides a way of representing the mixture of respondents following different underlying response processes (between individuals), as well as the uncertainty present at the individual level (within an individual). Simulation analyses reveal the potential of the mixture approach in identifying subgroups of respondents exhibiting response behavior reflective of different underlying response processes. Application to real data from the Students Like Learning Mathematics (SLM) scale of Trends in International Mathematics and Science Study (TIMSS) 2015 demonstrates the superior comparative fit of the mixture representation, as well as the consequences of applying the mixture on the estimation of content and response style traits. We argue that methodology applied to investigate response styles should attend to the inherent uncertainty of response style influence due to the likely influence of both response styles and the content trait on the selection of extreme response categories.

14.
Book reviews
Statistical Models for Ordinal Variables. C. C. Clogg and E. S. Shihadeh. Thousand Oaks, CA: Sage, 1994, 192 pages.

Graphical Multivariate Analysis with AMOS, EQS and LISREL: A Visual Approach to Covariance Structure Analysis (in Japanese). Yutaka Kano. Kyoto, Japan: Gendai-Sugakusha, 1997, 235 pages.

15.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation is becoming more common, understanding the Bayesian approaches for evaluating model‐data fit is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model‐data fit approach. Specifically, they review the conceptual foundation of Bayesian inference as well as PPMC and walk through the computational steps of PPMC using real‐life data examples from simple linear regression and item response theory analysis. They provide guidance for how to interpret PPMC results and discuss how to implement PPMC for other models and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
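As a rough illustration of the PPMC logic this module teaches (this is not the module's own code or data), here is a sketch for simple linear regression with a known error SD and a flat prior, using the maximum absolute residual as the discrepancy measure:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulated data: y = 1 + 2x + Normal(0, 0.5) noise
n = 100
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

# with a flat prior and known sigma, the coefficient posterior is
# Normal(beta_hat, sigma^2 (X'X)^{-1}), so we can draw from it directly
sigma = 0.5
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
draws = rng.multivariate_normal(beta_hat, sigma**2 * XtX_inv, size=1000)

# PPMC: for each posterior draw, compare the discrepancy computed on
# the observed data with the same discrepancy on a replicated data set
exceed = []
for beta in draws:
    mu = X @ beta
    obs = np.max(np.abs(y - mu))                      # observed data
    rep = np.max(np.abs(rng.normal(mu, sigma) - mu))  # replicated data
    exceed.append(rep >= obs)
ppp = float(np.mean(exceed))  # posterior predictive p-value
```

A posterior predictive p-value near 0 or 1 flags systematic misfit in the chosen discrepancy; here the model is correctly specified, so a moderate value is expected.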

16.
This study examined, from both unidimensional and multidimensional perspectives, differential item functioning (DIF) in the Chinese-English translated items of the IEA children's cognitive development test. The data analyzed consisted of test records from 871 Chinese children and 557 American children. The results show that more than half of the items exhibited substantial DIF, meaning the test is not functionally equivalent for Chinese and American children; users should therefore be cautious in using the results of this cross-language translated test to compare the cognitive ability levels of Chinese and American examinees. Fortunately, about half of the DIF items favored China and half favored the United States, so a scale built from total test scores should not be too severely biased. In addition, item-fit statistics were not sufficient to detect the DIF items; dedicated DIF analyses should still be conducted. We discuss three possible causes of the DIF; more subject-matter expertise and experiments are needed to truly explain its origins.

17.
We propose a structural equation model, which reduces to a multidimensional latent class item response theory model, for the analysis of binary item responses with nonignorable missingness. The missingness mechanism is driven by 2 sets of latent variables: one describing the propensity to respond and the other referring to the abilities measured by the test items. These latent variables are assumed to have a discrete distribution, so as to reduce the number of parametric assumptions regarding the latent structure of the model. Individual covariates can also be included through a multinomial logistic parameterization for the distribution of the latent variables. Given the discrete nature of this distribution, the proposed model is efficiently estimated by the expectation–maximization algorithm. A simulation study is performed to evaluate the finite-sample properties of the parameter estimates. Moreover, an application is illustrated with data coming from a student entry test for the admission to some university courses.

18.
As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations to the model’s assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided robust item parameter estimates to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this biasing occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that the EM-IRT model provides superior item and equal mean ability parameter estimates in the presence of model violations under realistic conditions when compared with the 2PL model.
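The preprocessing step the EM-IRT approach relies on amounts to flagging rapid responses and treating them as missing before estimation. A minimal sketch with made-up data and an assumed common time threshold (in practice thresholds are usually set per item):

```python
import numpy as np

rng = np.random.default_rng(7)

# made-up scored responses (0/1) and response times in seconds
# for 6 examinees on 5 items
responses = rng.integers(0, 2, size=(6, 5)).astype(float)
rts = rng.uniform(0.5, 20.0, size=(6, 5))

# classify responses faster than the threshold as noneffortful and
# set them to missing, as the effort-moderated model effectively does
THRESHOLD = 3.0  # assumed common threshold, for illustration only
noneffortful = rts < THRESHOLD
responses[noneffortful] = np.nan
```

An IRT calibration run on the masked matrix then simply skips the NaN cells, which is the sense in which noneffortful responses are "treated as missing."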

19.
Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response time-based procedures for classifying response engagement and IRT models for response engagement are based on common ideas, and we propose the distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models assume that response times either reflect or predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widespread Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated on the basis of publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and of individuals’ proficiency estimates relative to a conventional IRT model.

20.
In this digital ITEMS module, Dr. Brian Leventhal and Dr. Allison Ames provide an overview of Monte Carlo simulation studies (MCSS) in item response theory (IRT). MCSS are utilized for a variety of reasons, one of the most compelling being that they can be used when analytic solutions are impractical or nonexistent because they allow researchers to specify and manipulate an array of parameter values and experimental conditions (e.g., sample size, test length, and test characteristics). Dr. Leventhal and Dr. Ames review the conceptual foundation of MCSS in IRT and walk through the processes of simulating total scores as well as item responses using the two-parameter logistic, graded response, and bifactor models. They provide guidance for how to implement MCSS using other item response models and best practices for efficient syntax and executing an MCSS. The digital module contains sample SAS code, diagnostic quiz questions, activities, curated resources, and a glossary.
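The first step such a simulation study walks through, generating item responses under the two-parameter logistic model, can be sketched in a few lines. The module itself uses SAS; this Python translation and all condition values (examinees, items, parameter ranges) are illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_2pl(theta, a, b, rng):
    """Draw 0/1 responses under the 2PL: P(X=1) = logistic(a (theta - b))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # persons x items
    return (rng.random(p.shape) < p).astype(int)

# one arbitrary simulation condition: 500 examinees, 20 items
theta = rng.normal(0.0, 1.0, 500)      # abilities
a = rng.uniform(0.8, 2.0, 20)          # discriminations
b = rng.normal(0.0, 1.0, 20)           # difficulties
responses = simulate_2pl(theta, a, b, rng)
total_scores = responses.sum(axis=1)
```

A full MCSS would wrap this generation step in a loop over replications and conditions, fit the model to each generated data set, and summarize parameter recovery.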


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号