Similar Literature
20 similar documents found (search time: 15 ms)
1.
In test development, item response theory (IRT) is a method to determine the amount of information that each item (i.e., item information function) and combination of items (i.e., test information function) provide in the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods—maximum no target, maximum target, and theta maximum—using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated the test information function. Implications for test development are discussed.
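For reference, the information functions that drive these selection methods have standard closed forms. Under the 3-parameter logistic model (the 2-parameter case follows by setting c_i = 0):

```latex
% 3PL response function and the resulting item/test information
% (standard IRT results; the 2PL case sets c_i = 0)
P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}, \qquad
Q_i(\theta) = 1 - P_i(\theta)

I_i(\theta) = a_i^2 \, \frac{Q_i(\theta)}{P_i(\theta)}
  \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2, \qquad
I(\theta) = \sum_{i=1}^{n} I_i(\theta)
```

Because the estimated discrimination $\hat{a}_i$ enters $I_i(\theta)$ quadratically, selecting the items with the largest estimated information capitalizes on positive estimation error in $a_i$, which is the mechanism behind the overestimation described above.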

2.
The purpose of the current study is to examine the performance of four information criteria (Akaike's information criterion [AIC], corrected AIC [AICC], Bayesian information criterion [BIC], and sample-size adjusted BIC [SABIC]) for detecting the correct number of latent classes in the mixture Rasch model through simulations. The simulation study manipulated various class-distinction features (percentages of class-variant items, magnitudes, and patterns of item difficulty differences) and mixing proportions, assuming that a mixture Rasch model with two latent classes was the true model. Unlike previous studies that showed BIC's superiority to other indices, our findings suggested that the four information criteria had differential performance depending on the percentage of class-variant items and the magnitude and pattern of item difficulty differences under a two-class structure. Furthermore, the present study revealed that AICC and SABIC generally performed as well as or better than their counterparts, AIC and BIC, respectively, for the two-class structure with a sample of 3,000.
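The four indices are standard penalized-likelihood criteria. Writing $\ln L$ for the maximized log-likelihood, $k$ for the number of free parameters, and $n$ for the sample size:

```latex
\begin{aligned}
\mathrm{AIC}   &= -2\ln L + 2k \\
\mathrm{AICC}  &= -2\ln L + 2k + \frac{2k(k+1)}{n - k - 1} \\
\mathrm{BIC}   &= -2\ln L + k \ln n \\
\mathrm{SABIC} &= -2\ln L + k \ln\frac{n + 2}{24}
\end{aligned}
```

AICC adds a small-sample correction to AIC's penalty, while SABIC replaces the $n$ in BIC's penalty with $(n+2)/24$, a milder penalty; this helps explain why the adjusted variants can favor additional latent classes that their parent criteria would reject.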

3.
Research on Paperless Examinations for Computer Information Technology Courses
This paper reviews the development of testing theory from classical measurement to item response theory and argues that computer-based testing is both inevitable and advantageous. It describes in detail how computer-based examinations can be implemented in a multimedia networked laboratory.

4.
Research on Test Construction Methods Based on Item Response Theory
Building on a brief introduction to item response theory, this paper examines, from a psychometric perspective, the general steps for constructing various kinds of tests with IRT; it discusses methods for building IRT item banks and for assembling tests from such banks, as well as methods for setting passing scores on criterion-referenced tests.

5.
Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response-time-based procedures for classifying response engagement and IRT models for response engagement are based on common ideas, and we propose the distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models assume that response times either reflect or predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widespread Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated on the basis of publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and individuals' proficiency estimates relative to a conventional IRT model.
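To make the response-time-based side concrete, here is a minimal sketch of one widely used classification rule: flag a response as disengaged when its response time falls below a fixed fraction of the item's median response time. The 10% fraction and the function name are illustrative assumptions, not the specific procedure or software used in the article.

```python
import numpy as np

def flag_disengaged(rt, threshold_frac=0.10):
    """Flag responses as disengaged when response time falls below a
    fraction of the item's median response time (one common normative
    threshold rule; the 10% fraction is an assumption, not the
    article's procedure).

    rt : (n_persons, n_items) array of response times in seconds.
    Returns a boolean array of the same shape (True = disengaged).
    """
    item_medians = np.nanmedian(rt, axis=0)     # per-item median RT
    thresholds = threshold_frac * item_medians  # item-level cutoffs
    return rt < thresholds                      # broadcasts over persons

# Example: 4 persons x 3 items; very fast responses get flagged
rt = np.array([[12.0, 30.5, 8.2],
               [ 0.9, 28.0, 9.9],
               [14.3,  1.5, 7.5],
               [11.8, 33.1, 0.6]])
print(flag_disengaged(rt))
```

In the latent class IRT models the article summarizes, this hard classification is replaced by an item-level latent class variable, with response times either reflecting or predicting class membership.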

6.
Comparing classical test theory and item response theory in four main areas (basic assumptions, measurement of test precision, the standard error of measurement, and the screening of test items) shows that item response theory has several advantages: ability estimates that are independent of the particular items selected, a common scale for item difficulty and ability parameters, item parameter estimates that are independent of the calibration sample, and precise estimation of measurement error. However, some of its models present problems that remain to be solved, such as a unidimensionality assumption that is difficult to satisfy, strict requirements on testing conditions, and poor parsimony of the mathematical models.
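The contrast in how the two theories quantify error can be made concrete: classical test theory attaches a single standard error of measurement to every examinee, whereas the IRT standard error varies with ability through the test information function.

```latex
% CTT: one SEM for all examinees
% (\sigma_X = observed-score SD, \rho_{XX'} = reliability)
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}

% IRT: a conditional standard error that depends on \theta
\mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}
```

The conditional form underlies the precision advantage claimed above: error is estimated at the examinee's actual location on the ability scale rather than averaged over the whole score distribution.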

7.
With the spread of computers, the development of networks, and advances in theories of teaching and assessment, computerized adaptive testing based on item response theory has become increasingly widespread. Because the items administered adapt automatically to students of different ability levels, it has been adopted by a growing number of examinations. This paper discusses item response theory and the issues involved in implementing adaptive testing.

8.
This paper presents a mixture item response tree (IRTree) model for extreme response style. Unlike traditional applications of single IRTree models, a mixture approach provides a way of representing the mixture of respondents following different underlying response processes (between individuals), as well as the uncertainty present at the individual level (within an individual). Simulation analyses reveal the potential of the mixture approach in identifying subgroups of respondents exhibiting response behavior reflective of different underlying response processes. Application to real data from the Students Like Learning Mathematics (SLM) scale of Trends in International Mathematics and Science Study (TIMSS) 2015 demonstrates the superior comparative fit of the mixture representation, as well as the consequences of applying the mixture on the estimation of content and response style traits. We argue that methodology applied to investigate response styles should attend to the inherent uncertainty of response style influence due to the likely influence of both response styles and the content trait on the selection of extreme response categories.

9.
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data.
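As a compact illustration of what "latent DIF" means in model form, consider a mixture Rasch model in which item difficulties are class-specific (a standard formulation, shown here as background; the module itself may use more general mixture item response models):

```latex
% Mixture Rasch model: g indexes latent classes with proportions \pi_g
P(X_{ij} = 1) = \sum_{g=1}^{G} \pi_g \,
  \frac{\exp(\theta_{jg} - b_{ig})}{1 + \exp(\theta_{jg} - b_{ig})}
```

An item shows latent DIF when its difficulty $b_{ig}$ differs across classes $g$; the classes are estimated from the response data rather than defined by manifest groups such as gender.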

10.
Most growth models implicitly assume that test scores have been vertically scaled. What may not be widely appreciated are the different choices that must be made when creating a vertical score scale. In this paper empirical patterns of growth in student achievement are compared as a function of different approaches to creating a vertical scale. Longitudinal item‐level data from a standardized reading test are analyzed for two cohorts of students between Grades 3 and 6 and Grades 4 and 7 for the entire state of Colorado from 2003 to 2006. Eight different vertical scales were established on the basis of choices made for three key variables: Item Response Theory modeling approach, linking approach, and ability estimation approach. It is shown that interpretations of empirical growth patterns appear to depend upon the extent to which a vertical scale has been effectively "stretched" or "compressed" by the psychometric decisions made to establish it. While all of the vertical scales considered show patterns of decelerating growth across grade levels, there is little evidence of scale shrinkage.

11.
Building on a summary of the shortcomings of current examination evaluation systems, this paper proposes a scientific and effective method for evaluating examination quality and, guided by that method, uses classical test theory and item response theory to design an online examination-quality evaluation system. The system provides functions for creating new tables, compiling score statistics, analyzing examination quality, and performing reliability and validity analysis under classical test theory, and it can meet teachers' needs for examination quality analysis.

12.
This article examines whether the way that PISA models item outcomes in mathematics affects the validity of its country rankings. As an alternative to PISA methodology, a two-parameter model is applied to PISA mathematics item data from Canada and Finland for the year 2012. In the estimation procedure, item difficulty and dispersion parameters are allowed to differ across the two countries, and samples are restricted to respondents who actually answered items in a mathematics cluster. Different normalizations for identifying the distribution parameters are also considered. The choice of normalization is shown to be crucial in guaranteeing certain invariance properties required by item response models. The ability scores obtained from the methods employed here are significantly higher for Finland, in sharp contrast to PISA results, which gave both countries very similar ranks in mathematics.
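The identification issue at stake can be stated compactly. Under a two-parameter model written with difficulty and dispersion parameters (a common parameterization, assumed here since the article's exact notation is not shown), the latent scale is identified only up to a linear transformation:

```latex
% Two-parameter model with difficulty b_i and dispersion s_i
P(X_{ij} = 1 \mid \theta_j)
  = \frac{1}{1 + \exp\!\left(-(\theta_j - b_i)/s_i\right)}

% The likelihood is unchanged under \theta \to A\theta + B together
% with b_i \to A b_i + B and s_i \to A s_i, so a normalization
% (e.g., fixing the mean and variance of \theta in one reference
% population, or constraining the item parameters) must be imposed.
```

When item parameters are allowed to differ across countries, different normalizations fix the scale in different places, and the resulting country comparisons need not coincide, which is why the choice of normalization matters for the invariance properties discussed above.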

13.
The trend in mathematics achievement from preschool to kindergarten is studied with a longitudinal growth item response theory model. The three measurement occasions included the spring of preschool and the spring and fall of kindergarten. The growth trend was nonlinear, with a steep drop between spring of preschool and fall of kindergarten. The modeling results provide validation for the argument that a classroom assessment in mathematics can be used to assess developmental skill levels that are consistent with a theory of early mathematics acquisition. The statistical model employed enables an effective illustration of overall gains and individual variability. Implications of the summer loss are discussed as well as model limitations.

14.
15.
Many educational and psychological tests are inherently multidimensional, meaning these tests measure two or more dimensions or constructs. The purpose of this module is to illustrate how test practitioners and researchers can apply multidimensional item response theory (MIRT) to understand better what their tests are measuring, how accurately the different composites of ability are being assessed, and how this information can be cycled back into the test development process. Procedures for conducting MIRT analyses (from obtaining evidence that the test is multidimensional, to modeling the test as multidimensional, to illustrating the properties of multidimensional items graphically) are described from both a theoretical and a substantive basis. This module also illustrates these procedures using data from a ninth-grade mathematics achievement test. It concludes with a discussion of future directions in MIRT research.
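For orientation, the compensatory multidimensional extension of the logistic model replaces the scalar ability with a vector and the discrimination with a vector of loadings. This is the standard compensatory M3PL form, shown here as background rather than as the module's specific model:

```latex
% Compensatory multidimensional 3PL:
% \mathbf{a}_i = vector of item discriminations,
% \boldsymbol{\theta} = ability vector, d_i = intercept, c_i = guessing
P_i(\boldsymbol{\theta}) = c_i + (1 - c_i)\,
  \frac{1}{1 + \exp\!\left(-(\mathbf{a}_i^{\top}\boldsymbol{\theta} + d_i)\right)}
```

In two dimensions, the direction of $\mathbf{a}_i$ in the $(\theta_1, \theta_2)$ plane shows which composite of abilities an item measures best, which is the kind of property that graphical displays of multidimensional items are designed to reveal.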

16.
Applying item response theory models to repeated observations has demonstrated great promise in developmental research. By allowing the researcher to take account of the characteristics of both item response and measurement error in longitudinal trajectory analysis, it improves the reliability and validity of latent growth curve analysis. It also makes it possible to weight individual items differentially and to examine developmental stability and change over time. This study proposes a comprehensive modeling framework that combines a measurement model with a structural model. Although a large number of components require attention, the study focuses on model formulation, evaluates the performance of the estimators of the model parameters, incorporates prior knowledge through Bayesian analysis, and applies the model to an illustrative example. It is hoped that this foundational study can demonstrate the breadth of this unified latent growth curve model.

17.
Contamination of responses due to extreme and midpoint response styles can confound the interpretation of scores, threatening the validity of inferences made from survey responses. This study incorporated person-level covariates in the multidimensional item response tree model to explain heterogeneity in response style. We include an empirical example and two simulation studies to support the use and interpretation of the model: parameter recovery using Markov chain Monte Carlo (MCMC) estimation, and performance of the model under conditions with and without response styles present. Mean bias and root mean square error for item intercepts were small at all sample sizes. Mean bias and root mean square error for item discriminations were also small but tended to be smaller when covariates were unrelated to, or had a weak relationship with, the latent traits. Item and regression parameters are estimated with sufficient accuracy when sample sizes are greater than approximately 1,000 and MCMC estimation with the Gibbs sampler is used. The empirical example uses the National Longitudinal Study of Adolescent to Adult Health's sexual knowledge scale. Meaningful predictors associated with high levels of the extreme response latent trait included being non-White, being male, and having high levels of parental support and relationships. Meaningful predictors associated with high levels of the midpoint response latent trait included having low levels of parental support and relationships. Item-level covariates indicated that the response-style pseudo-items were less easy to endorse for self-oriented items, whereas the trait-of-interest pseudo-items were easier to endorse for self-oriented items.

18.
Item analysis is an integral part of operational test development and is typically conducted within two popular statistical frameworks: classical test theory (CTT) and item response theory (IRT). In this digital ITEMS module, Hanwook Yoo and Ronald K. Hambleton provide an accessible overview of operational item analysis approaches within these frameworks. They review the different stages of test development and associated item analyses to identify poorly performing items and effective item selection. Moreover, they walk through the computational and interpretational steps for CTT‐ and IRT‐based evaluation statistics using simulated data examples and review various graphical displays such as distractor response curves, item characteristic curves, and item information curves. The digital module contains sample data, Excel sheets with various templates and examples, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
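To make the CTT side of such an item analysis concrete, here is a minimal sketch computing two workhorse statistics from a 0/1 scored response matrix: classical item difficulty (proportion correct) and the corrected item-total point-biserial correlation. This is an illustrative sketch, not the module's own templates or software.

```python
import numpy as np

def classical_item_stats(responses):
    """Classical item difficulty and discrimination for a 0/1 scored
    response matrix (an illustrative sketch; not the module's own
    templates or software).

    responses : (n_persons, n_items) array of 0/1 scores.
    Returns (difficulty, discrimination): per-item proportion correct
    and corrected item-total point-biserial correlation.
    """
    X = np.asarray(responses, dtype=float)
    difficulty = X.mean(axis=0)                # classical p-values
    total = X.sum(axis=1)                      # person total scores
    discrimination = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        rest = total - X[:, i]                 # drop item i from the total
        discrimination[i] = np.corrcoef(X[:, i], rest)[0, 1]
    return difficulty, discrimination

# Example: 6 persons x 4 items
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 1],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
p, r = classical_item_stats(X)
print("difficulty:", p)
print("discrimination:", r)
```

Items with very low or negative corrected item-total correlations are the kind of poorly performing items such an analysis is designed to flag.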

19.
Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high-stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues with on-demand testing is that some of the methods used for maintaining the comparability of standards over time in conventional testing are no longer available and the development of new methods is required.

Purpose: This paper proposes an item response theory (IRT) framework for implementing on-demand testing and maintaining the comparability of standards over time for general qualifications, including GCSEs and GCE A levels, in the UK and discusses procedures for its practical implementation.

Sources of evidence: Literature from the fields of on-demand testing, the design of computer-based assessment, the development of IRT, and the application of IRT in educational measurement.

Main argument: On-demand testing presents many advantages over conventional testing. In view of the nature of general qualifications, including the use of multiple components and multiple question types, the advances made in item response modelling over the past 30 years, and the availability of complex IRT analysis software systems, coupled with increasing IRT expertise in awarding organisations, IRT models could be used to implement on-demand testing in high-stakes examinations in the UK. The proposed framework represents a coherent and complete approach to maintaining standards in on-demand testing. The procedures for implementing the framework discussed in the paper could be adapted by practitioners to suit their own needs and circumstances.

Conclusions: The use of IRT to implement on-demand testing could prove to be one of the viable approaches to maintaining standards over time or between test sessions for UK general qualifications.

20.
This study examined differential item functioning (DIF), from both unidimensional and multidimensional perspectives, in the Chinese-to-English translated items of the International Association for the Evaluation of Educational Achievement (IEA) test of children's cognitive development. The data analyzed consisted of test records from 871 Chinese children and 557 American children. The results showed that more than half of the items exhibited substantial DIF, meaning that the test is not functionally equivalent for Chinese and American children. Users should therefore be cautious in using this cross-language comparative test to compare the cognitive ability levels of Chinese and American examinees. Fortunately, about half of the DIF items favored China and half favored the United States, so a scale built from total test scores should not be severely biased. In addition, item fit statistics were not sufficient to detect DIF items, so dedicated DIF analyses should still be conducted. We discuss three possible causes of DIF; more domain expertise and experimentation are needed to fully explain how DIF arises.
