1.

Nana Kim, Daniel M. Bolt 《Educational and psychological measurement》2021,81(1):131

This paper presents a mixture item response tree (IRTree) model for extreme response style. Unlike traditional applications of single IRTree models, a mixture approach provides a way of representing the mixture of respondents following different underlying response processes (between individuals), as well as the uncertainty present at the individual level (within an individual). Simulation analyses reveal the potential of the mixture approach in identifying subgroups of respondents exhibiting response behavior reflective of different underlying response processes. Application to real data from the Students Like Learning Mathematics (SLM) scale of Trends in International Mathematics and Science Study (TIMSS) 2015 demonstrates the superior comparative fit of the mixture representation, as well as the consequences of applying the mixture on the estimation of content and response style traits. We argue that methodology applied to investigate response styles should attend to the inherent uncertainty of response style influence due to the likely influence of both response styles and the content trait on the selection of extreme response categories.
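To make the IRTree idea concrete, the sketch below recodes 4-category responses into binary pseudo-items, one per tree node. The tree used here (a direction node followed by an extremity node) and the category coding are assumptions chosen for illustration; the abstract does not specify the tree, and the mixture component of the model is not represented.

```python
# Illustrative IRTree recoding for a 4-category item.
# ASSUMPTION: categories 1-2 lie on one side of the scale and 3-4 on the other,
# and categories 1 and 4 are the extreme ones. This tree is a common textbook
# layout, not necessarily the one used in the paper.
PSEUDO_ITEMS = {
    1: (0, 1),  # (direction, extremity)
    2: (0, 0),
    3: (1, 0),
    4: (1, 1),
}


def to_pseudo_items(responses):
    """Split observed categories into direction and extremity pseudo-items.

    Missing responses (None) stay missing in both pseudo-items.
    """
    direction, extremity = [], []
    for r in responses:
        if r is None:
            direction.append(None)
            extremity.append(None)
        else:
            d, e = PSEUDO_ITEMS[r]
            direction.append(d)
            extremity.append(e)
    return direction, extremity


if __name__ == "__main__":
    print(to_pseudo_items([1, 2, 3, 4, None]))
```

Each pseudo-item could then be fitted with a binary IRT model, with the extremity node loading on a response style trait and the direction node on the content trait.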

2.

Damazo T. Kadengye, Eva Ceulemans, Wim Van Den Noortgate 《Journal of Experimental Education》2015,83(2):175-202

In educational environments, monitoring persons' progress over time may help teachers to evaluate the effectiveness of their teaching procedures. Electronic learning environments are increasingly being used as part of formal education, and the resulting datasets can be used to understand and to improve the environment. This study presents longitudinal models based on item response theory (IRT) for measuring persons' ability within and between study sessions in data from web-based learning environments. Two empirical examples are used to illustrate the presented models. Results show that by incorporating time spent within and between study sessions into an IRT model, one is able to track changes in the ability of a population of persons, or of groups of persons, at any time in the learning process.

3.

Janke M. Faber, Cees A. W. Glas, Adrie J. Visscher 《School Effectiveness & School Improvement》2018,29(1):43-63

In this study, the relationship between differentiated instruction, as an element of data-based decision making, and student achievement was examined. Classroom observations (n = 144) were used to measure teachers’ differentiated instruction practices and to predict the mathematical achievement of 2nd- and 5th-grade students (n = 953). The analysis of classroom observation data was based on a combination of generalizability theory and item response theory, and student achievement effects were determined by means of multilevel analysis. No significant positive effects were found for differentiated instruction practices. Furthermore, findings showed that students in low-ability groups profited less from differentiated instruction than students in average or high-ability groups. Nevertheless, the findings, data collection, and data-analysis procedures of this study contribute to the study of classroom observation and the measurement of differentiated instruction.

4.

Sun‐Joo Cho, Youngsuk Suh, Woo‐yeol Lee 《Educational Measurement》2016,35(1):48-61

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data.

5.

Chun Wang, Nidhi Kohli, Lisa Henn 《Structural equation modeling》2016,23(3):455-465

Measuring academic growth, or change in aptitude, relies on longitudinal data collected across multiple measurements. The National Educational Longitudinal Study (NELS:88) is among the earliest large-scale educational surveys tracking students’ performance on cognitive batteries over 3 years. Notable features of the NELS:88 data set, and of almost all repeated measures educational assessments, are (a) the outcome variables are binary or at least categorical in nature; and (b) a set of different items is given at each measurement occasion with a few anchor items to fix the measurement scale. This study focuses on the challenges related to specifying and fitting a second-order longitudinal model for binary outcomes, within both the item response theory and structural equation modeling frameworks. The distinctions and commonalities between these two frameworks are discussed. A real data analysis using the NELS:88 data set is presented for illustration purposes.

6.

Allison J. Ames, Aaron J. Myers 《Educational and psychological measurement》2021,81(4):756

Contamination of responses due to extreme and midpoint response style can confound the interpretation of scores, threatening the validity of inferences made from survey responses. This study incorporated person-level covariates in the multidimensional item response tree model to explain heterogeneity in response style. We include an empirical example and two simulation studies to support the use and interpretation of the model: parameter recovery using Markov chain Monte Carlo (MCMC) estimation and performance of the model under conditions with and without response styles present. Item intercepts mean bias and root mean square error were small at all sample sizes. Item discrimination mean bias and root mean square error were also small but tended to be smaller when covariates were unrelated to, or had a weak relationship with, the latent traits. Item and regression parameters are estimated with sufficient accuracy when sample sizes are greater than approximately 1,000 and MCMC estimation with the Gibbs sampler is used. The empirical example uses the National Longitudinal Study of Adolescent to Adult Health’s sexual knowledge scale. Meaningful predictors associated with high levels of extreme response latent trait included being non-White, being male, and having high levels of parental support and relationships. Meaningful predictors associated with high levels of the midpoint response latent trait included having low levels of parental support and relationships. Item-level covariates indicate the response style pseudo-items were less easy to endorse for self-oriented items, whereas the trait of interest pseudo-items were easier to endorse for self-oriented items.

7.

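A Comparison of Theoretical Models of Psychological Testing (total citations: 1; self-citations: 0; citations by others: 1)

He Yunpeng (赫云鹏), Wang Junxiu (王俊秀) 《内蒙古师范大学学报(哲学社会科学版)》 [Journal of Inner Mongolia Normal University (Philosophy and Social Sciences Edition)] 1997,(4)

True score theory and item response theory are the two major theoretical models of psychological testing. True score theory mainly estimates the relationship between true scores and observed scores; item response theory links the probability of an examinee's particular response to a single test item with certain characteristics of that item. Item response theory can be regarded as a development built on true score theory, but it is by no means true score theory: the two rest on different basic assumptions, and each has its own strengths and weaknesses. With the two theories coexisting, promoting and complementing each other, psychological testing today is developing on this basis in a more rational and more complete direction.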

8.

Eun Sook Kim, Chunhua Cao, Yan Wang, Diep T. Nguyen 《Structural equation modeling》2017,24(4):524-544

With the increasing use of international survey data, especially in cross-cultural and multinational studies, establishing measurement invariance (MI) across a large number of groups in a study is essential. Testing MI over many groups is methodologically challenging, however. We identified 5 methods for MI testing across many groups (multiple group confirmatory factor analysis, multilevel confirmatory factor analysis, multilevel factor mixture modeling, Bayesian approximate MI testing, and alignment optimization) and explicated the similarities and differences of these approaches in terms of their conceptual models and statistical procedures. A Monte Carlo study was conducted to investigate the efficacy of the 5 methods in detecting measurement noninvariance across many groups using various fit criteria. Generally, the 5 methods showed reasonable performance in identifying the level of invariance if an appropriate fit criterion was used (e.g., Bayesian information criterion with multilevel factor mixture modeling). Finally, general guidelines in selecting an appropriate method are provided.

9.

In-Hee Choi, Insu Paek, Sun-Joo Cho 《Journal of Experimental Education》2017,85(3):411-424

The purpose of the current study is to examine the performance of four information criteria (Akaike's information criterion [AIC], corrected AIC [AICC], Bayesian information criterion [BIC], sample-size adjusted BIC [SABIC]) for detecting the correct number of latent classes in the mixture Rasch model through simulations. The simulation study manipulated various class-distinction features (percentages of class-variant items, magnitudes, and patterns of item difficulty differences) and mixing proportions, assuming that a mixture Rasch model with two latent classes was the true model. Unlike previous studies that showed BIC's superiority to other indices, our findings from this study suggested that the four information criteria had differential performance depending on the percentage of class-variant items and the magnitude and pattern of item difficulty differences under a two-class structure. Furthermore, the present study revealed that AICC and SABIC generally performed as well as or better than their counterparts, AIC and BIC, respectively, for the two-class structure with a sample of 3,000.
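The four criteria compared in this abstract have standard closed forms, so a short sketch may help show how they penalize model complexity differently. The function names below are hypothetical, and the formulas are the usual textbook definitions rather than anything taken from the paper itself.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return AIC, AICC, BIC and SABIC for one fitted model.

    log_lik  : maximized log-likelihood
    n_params : number of freely estimated parameters (k)
    n_obs    : sample size (n); AICC requires n > k + 1
    """
    deviance = -2.0 * log_lik
    aic = deviance + 2 * n_params
    # Small-sample correction added on top of AIC.
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = deviance + n_params * math.log(n_obs)
    # Sample-size adjusted BIC replaces n with (n + 2) / 24 in the penalty.
    sabic = deviance + n_params * math.log((n_obs + 2) / 24)
    return {"AIC": aic, "AICC": aicc, "BIC": bic, "SABIC": sabic}


def preferred_class_count(fits, n_obs):
    """For each criterion, pick the number of classes with the smallest value.

    fits maps a class count to a (log_lik, n_params) pair.
    """
    table = {g: information_criteria(ll, k, n_obs) for g, (ll, k) in fits.items()}
    names = ["AIC", "AICC", "BIC", "SABIC"]
    return {name: min(table, key=lambda g: table[g][name]) for name in names}


if __name__ == "__main__":
    # Made-up log-likelihoods and parameter counts, for illustration only.
    example_fits = {1: (-1000.0, 10), 2: (-950.0, 21), 3: (-948.0, 32)}
    print(preferred_class_count(example_fits, n_obs=3000))
```

Because the penalties differ, the criteria can disagree on the number of classes for the same set of fitted models, which is the situation the simulation study investigates.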

10.

Minimizing the Influence of Item Parameter Estimation Errors in Test Development: A Comparison of Three Selection Procedures

Mark J. Gierl, Dianne Henderson, Michael Jodoin, Don Klinger 《Journal of Experimental Education》2013,81(3):261-279

In test development, item response theory (IRT) is a method to determine the amount of information that each item (i.e., item information function) and combination of items (i.e., test information function) provide in the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods (maximum no target, maximum target, and theta maximum) using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated the test information function. Implications for test development are discussed.

11.

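Research on Test Construction Methods Based on Item Response Theory (total citations: 3; self-citations: 0; citations by others: 3)

Dai Haiqi (戴海琦) 《考试研究》 [Examinations Research] 2006,(4)

After a brief introduction to item response theory, this paper examines in depth, from a psychometric perspective, the general steps for constructing various kinds of tests with item response theory. It also discusses methods for building an item bank based on item response theory and for constructing tests from that bank, as well as methods for setting the passing score of criterion-referenced tests.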

12.

Silvia Bacci, Francesco Bartolucci 《Structural equation modeling》2013,20(3):352-365

We propose a structural equation model, which reduces to a multidimensional latent class item response theory model, for the analysis of binary item responses with nonignorable missingness. The missingness mechanism is driven by 2 sets of latent variables: one describing the propensity to respond and the other referring to the abilities measured by the test items. These latent variables are assumed to have a discrete distribution, so as to reduce the number of parametric assumptions regarding the latent structure of the model. Individual covariates can also be included through a multinomial logistic parameterization for the distribution of the latent variables. Given the discrete nature of this distribution, the proposed model is efficiently estimated by the expectation–maximization algorithm. A simulation study is performed to evaluate the finite-sample properties of the parameter estimates. Moreover, an application is illustrated with data from a student entry test for admission to some university courses.

13.

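A Comparison of the Measurement Principles of CTT and IRT

Mu Shoukuan (沐守宽) 《上海师范大学学报(哲学社会科学版)》 [Journal of Shanghai Normal University (Philosophy and Social Sciences Edition)] 2006,35(4):6-9

A comparison of classical test theory and item response theory in four main areas (basic assumptions, the quantification of test precision, the standard error of the test, and the selection of test items) shows that item response theory has several advantages: ability estimates that do not depend on which items are selected, item difficulty and ability parameters expressed on a common scale, item parameter estimates that do not depend on the sample, and precise estimation of measurement error. However, some models still have problems to be solved: the unidimensionality assumption is hard to satisfy, the testing conditions required are strict, and the mathematical models lack parsimony.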

14.

Kaiwen Man, Jeffrey R. Harring 《Educational and psychological measurement》2021,81(3):441

Many approaches have been proposed to jointly analyze item responses and response times to understand behavioral differences between normally and aberrantly behaved test-takers. Biometric information, such as data from eye trackers, can be used to better identify these deviant testing behaviors in addition to more conventional data types. Given this context, this study demonstrates the application of a new method for multiple-group analysis that concurrently models item responses, response times, and visual fixation counts collected from an eye-tracker. It is hypothesized that differences in behavioral patterns between normally behaved test-takers and those who have different levels of preknowledge about the test items will manifest in latent characteristics of the different data types. A Bayesian estimation scheme is used to fit the proposed model to experimental data and the results are discussed.

15.

The Examination of the Classification of Students into Performance Categories by Two Different Equating Methods

Lisa A. Keller, Robert R. Keller, Pauline A. Parker 《Journal of Experimental Education》2013,81(1):30-52

This study investigates the comparability of two item response theory based equating methods: true score equating (TSE) and estimated true equating (ETE). Additionally, six scaling methods were implemented within each equating method: mean-sigma, mean-mean, two versions of fixed common item parameter, Stocking and Lord, and Haebara. Empirical test data were examined to investigate the consistency of scores resulting from the two equating methods, as well as the consistency of the scaling methods both within equating methods and across equating methods. Results indicate that although the degree of correlation among the equated scores was quite high, regardless of equating method/scaling method combination, non-trivial differences in equated scores existed in several cases. These differences would likely accumulate across examinees making group-level differences greater. Systematic differences in the classification of examinees into performance categories were observed across the various conditions: ETE tended to place lower ability examinees into higher performance categories than TSE, while the opposite was observed for high ability examinees. Because the study was based on one set of operational data, the generalizability of the findings is limited and further study is warranted.
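Two of the scaling methods named above, mean-sigma and mean-mean, reduce to simple moment calculations on the common (anchor) items. The sketch below follows the standard textbook definitions for a linear transformation theta_target = A * theta_source + B; the function names and the anchor-item values are hypothetical, and the study's own implementation is not described in the abstract.

```python
from statistics import fmean, pstdev


def mean_sigma(b_source, b_target):
    """Mean-sigma constants from anchor-item difficulties on both scales."""
    A = pstdev(b_target) / pstdev(b_source)
    B = fmean(b_target) - A * fmean(b_source)
    return A, B


def mean_mean(a_source, a_target, b_source, b_target):
    """Mean-mean constants: the slope comes from mean discriminations instead."""
    A = fmean(a_source) / fmean(a_target)
    B = fmean(b_target) - A * fmean(b_source)
    return A, B


def rescale_item(a, b, A, B):
    """Place a source-scale item (a, b) onto the target scale."""
    return a / A, A * b + B


if __name__ == "__main__":
    # Made-up anchor-item estimates, for illustration only.
    b_src, b_tgt = [-1.0, 0.0, 1.0], [-1.4, 0.6, 2.4]
    a_src, a_tgt = [1.0, 2.0, 1.5], [0.55, 0.95, 0.80]
    print(mean_sigma(b_src, b_tgt))
    print(mean_mean(a_src, a_tgt, b_src, b_tgt))
```

With error-free parameters the two methods return the same constants; with estimated parameters they generally differ slightly, which is one way different scaling methods can lead to different equated scores.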

16.

Stefanie A. Wind 《Educational Measurement》2017,36(2):50-66

Mokken scale analysis (MSA) is a probabilistic‐nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic‐nonparametric framework in which to explore measurement quality, with an emphasis on its application in the context of educational assessment. The module describes both dichotomous and polytomous formulations of the MSA model. Examples of the application of MSA to educational assessment are provided using data from a multiple‐choice physical science assessment and a rater‐mediated writing assessment.

17.

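Research on Paperless Examinations for Computer Information Technology Courses (total citations: 1; self-citations: 0; citations by others: 1)

Zhu Xiaoming (朱小明), Li Xiangrong (李向荣), Lin Jie (林捷), Zhao Jinhong (赵锦红) 《中国教育技术装备》 [China Educational Technology & Equipment] 2007,(1):11-14

This paper describes the development of testing theory from classical measurement to item response theory and points out the inevitability and advantages of computerized testing. It then explains in some detail how computer-based examinations can be implemented in a multimedia network laboratory.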

18.

Shiyang Su, Chun Wang, David J. Weiss 《Educational and psychological measurement》2021,81(3):491

S-χ² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-χ² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of S-χ² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using S-χ² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. S-χ² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of S-χ² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that performance of S-χ² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.

19.

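Web-Based Adaptive Testing Based on Item Response Theory

Su Jie (苏婕) 《天津职业院校联合学报》 [Journal of Tianjin Vocational Institutes] 2007,9(5):106-109

With the spread of computers, the growth of networks, and advances in theories of teaching and assessment, computerized adaptive testing based on item response theory has become increasingly common. Because its items adjust automatically to students of different ability levels, it has been adopted by more and more examinations. Taking item response theory as its basis, this paper discusses issues such as how adaptive testing is implemented.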

20.

M. Lee Van Horn, Yuling Feng, Minjung Kim, Andrea Lamont, Daniel Feaster, Thomas Jaki 《Structural equation modeling》2016,23(2):259-269

This article proposes a novel exploratory approach for assessing how the effects of Level-2 predictors differ across Level-1 units. Multilevel regression mixture models are used to identify latent classes at Level 1 that differ in the effect of 1 or more Level-2 predictors. Monte Carlo simulations are used to demonstrate the approach with different sample sizes and to demonstrate the consequences of constraining 1 of the random effects to 0. An application of the method to evaluate heterogeneity in the effects of classroom practices on students is used to show the types of research questions that can be answered with this method and the issues faced when estimating multilevel regression mixtures.