Similar Articles
20 similar articles found (search time: 20 ms)
1.
The Remote Associates Test (RAT) developed by Mednick and Mednick (1967) is known as a valid measure of creative convergent thinking. We developed a 30-item version of the RAT in Dutch with high internal consistency (Cronbach's alpha = 0.85) and applied both Classical Test Theory and Item Response Theory (IRT) to provide measures of item difficulty and discriminability, construct validity, and reliability. IRT was further used to construct a shorter version of the RAT, which comprises 22 items but still shows good reliability and validity—as revealed by its relation to Raven's Advanced Progressive Matrices test, another insight-problem test, and Guilford's Alternative Uses Test.
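
As a point of reference for the internal-consistency figure quoted above, the sketch below shows how Cronbach's alpha is computed from a persons-by-items score matrix. The simulated 30-item data are purely illustrative and are not the Dutch RAT responses.

```python
import numpy as np

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

rng = np.random.default_rng(6)
theta = rng.normal(size=400)
difficulties = np.linspace(-1.0, 1.0, 30)          # 30 hypothetical dichotomous items
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
data = (rng.random((400, 30)) < p).astype(float)   # simulated 0/1 responses
print(round(float(cronbach_alpha(data)), 2))
```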

2.
《教育实用测度》2013,26(2):125-141
Item parameter instability can threaten the validity of inferences about changes in student achievement when using Item Response Theory (IRT)-based test scores obtained on different occasions. This article illustrates a model-testing approach for evaluating the stability of IRT item parameter estimates in a pretest-posttest design. Stability of item parameter estimates was assessed for a random sample of pretest and posttest responses to a 19-item math test. Using MULTILOG (Thissen, 1986), IRT models were estimated in which item parameter estimates were constrained to be equal across samples (reflecting stability) and item parameter estimates were free to vary across samples (reflecting instability). These competing models were then compared statistically in order to test the invariance assumption. The results indicated a moderately high degree of stability in the item parameter estimates for a group of children assessed on two different occasions.
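
The invariance comparison described above reduces to a likelihood-ratio chi-square between a constrained and a free calibration. A minimal sketch, assuming the two log-likelihoods have already been obtained from separate IRT fits (the study used MULTILOG); the numeric values and parameter counts below are hypothetical.

```python
from scipy.stats import chi2

def invariance_lr_test(loglik_constrained, loglik_free, n_free_params, n_constrained_params):
    """Compare a model with item parameters constrained equal across occasions
    against one in which they vary freely."""
    lr_stat = -2.0 * (loglik_constrained - loglik_free)   # G^2 statistic
    df = n_free_params - n_constrained_params             # extra parameters in the free model
    p_value = chi2.sf(lr_stat, df)
    return lr_stat, df, p_value

# Made-up values for a 19-item 2PL test (2 parameters per item): the free model
# estimates 38 parameters per occasion, the constrained model shares one set.
stat, df, p = invariance_lr_test(-10450.3, -10432.8, n_free_params=76, n_constrained_params=38)
print(f"G^2 = {stat:.1f}, df = {df}, p = {p:.3f}")  # a large p suggests stable parameters
```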

3.
This study compares the psychometric utility of Classical Test Theory (CTT) and Item Response Theory (IRT) for scale construction with data from higher education student surveys. Using 2008 Your First College Year (YFCY) survey data from the Cooperative Institutional Research Program at the Higher Education Research Institute at UCLA, two scales are built and tested—one measuring social involvement and one measuring academic involvement. Findings indicate that although both CTT and IRT can be used to obtain the same information about the extent to which scale items tap into the latent trait being measured, the two measurement theories provide very different pictures of scale precision. On the whole, IRT provides much richer information about measurement precision as well as a clearer roadmap for scale improvement. The findings support the use of IRT for scale construction and survey development in higher education.

4.
《Educational Assessment》2013,18(4):329-347
It is generally accepted that variability in performance will increase throughout Grades 1 to 12. Those with minimal knowledge of a domain should vary little; as learning rates differ, variability should increase as a function of growth. In this article, the series of reading tests from a widely used test battery for Grades 1 through 12 was singled out for study because the scale scores for the series have the opposite characteristic: variability is greatest at Grade 1 and decreases as growth proceeds. Item response theory (IRT) scaling was used; in previous editions, the publisher had used Thurstonian scaling and the variance increased with growth. Using data with known characteristics (i.e., weight distributions for ages 6 through 17), a comparison was made between the effectiveness of IRT and Thurstonian scaling procedures. The Thurstonian scaling more accurately reproduced the characteristics of the known distributions. As IRT scaling was shown to improve when perfect scores were included in the analyses and when items were selected whose difficulties reflected the entire range of ability, these steps were recommended. However, even when these steps were implemented with IRT, the Thurstonian scaling was still found to be more accurate.

5.
School climate surveys are widely applied in school districts across the nation to collect information about teacher efficacy, principal leadership, school safety, students' activities, and so forth. They enable school administrators to understand and address many issues on campus when used in conjunction with other student and staff data. However, these days each district develops the questionnaire according to its own needs and rarely provides supporting evidence for the reliability of items in the scale, that is, whether an individual item contributes significant information to the questionnaire. Item Response Theory (IRT) is a useful tool that helps examine how much information each item and the whole scale can provide. Our study applied IRT to examine individual items in a school climate survey and assessed the efficiency of the survey after the removal of items that contributed little to the scale. The purpose of this study is to show how IRT can be applied to empirically validate school climate surveys.
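
The "information" an item contributes can be made concrete with the 2PL item information function, I(theta) = a^2 * P(theta) * (1 - P(theta)). A minimal sketch with illustrative parameters (not the survey's items), contrasting a discriminating item with one that would be a candidate for removal:

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

theta_grid = np.linspace(-3, 3, 121)
info_strong = item_information_2pl(theta_grid, a=1.8, b=0.0)  # discriminating item
info_weak = item_information_2pl(theta_grid, a=0.4, b=0.0)    # weak item, a removal candidate
print(round(float(info_strong.max()), 3), round(float(info_weak.max()), 3))
```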

6.
As a global measure of precision, item response theory (IRT)-estimated reliability is derived for four coefficients (Cronbach's α, Feldt‐Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test‐part similarity are discussed. A detailed computational example is presented for the targeted coefficients. A comparison of the IRT model‐derived coefficients is made and the impact of varying ability distributions is evaluated. The advantages of IRT‐derived reliability coefficients for problems such as automated test form assembly and vertical scaling are discussed.
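
One way a marginal reliability can be obtained from an IRT calibration is to integrate the error variance over an assumed ability distribution. The sketch below is only a simplified illustration of that idea under a 2PL model with a N(0, 1) population and hypothetical item parameters; it does not reproduce the four coefficient derivations in the article.

```python
import numpy as np
from scipy.stats import norm

def test_information(theta, a, b):
    """2PL test information: sum over items of a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
    return np.sum(a[:, None] ** 2 * p * (1.0 - p), axis=0)

a = np.array([1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0, 1.4])    # hypothetical discriminations
b = np.array([-1.5, -1.0, -0.5, 0.0, 0.3, 0.8, 1.2, 1.6])  # hypothetical difficulties

theta = np.linspace(-4.0, 4.0, 201)
w = norm.pdf(theta)
w /= w.sum()                                # quadrature weights for an assumed N(0, 1) population

error_var = 1.0 / (test_information(theta, a, b) + 1.0)  # approximate EAP posterior variance
marginal_reliability = 1.0 - np.sum(w * error_var)       # latent variance fixed at 1
print(round(float(marginal_reliability), 3))
```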

7.
Item response models are finding increasing use in achievement and aptitude test development. Item response theory (IRT) test development involves the selection of test items based on a consideration of their item information functions. But a problem arises because item information functions are determined by their item parameter estimates, which contain error. When the "best" items are selected on the basis of their statistical characteristics, there is a tendency to capitalize on chance due to errors in the item parameter estimates. The resulting test, therefore, falls short of the test that was desired or expected. The purposes of this article are (a) to highlight the problem of item parameter estimation errors in the test development process, (b) to demonstrate the seriousness of the problem with several simulated data sets, and (c) to offer a conservative solution for addressing the problem in IRT-based test development.
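
The capitalization-on-chance problem described above is easy to demonstrate by simulation: items chosen on their estimated information deliver less true information than the estimates promise. The pool size, the error standard deviations, and the target ability below are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pool, n_pick, theta0 = 300, 30, 0.0

a_true = rng.lognormal(mean=0.0, sigma=0.3, size=n_pool)   # true 2PL discriminations
b_true = rng.normal(0.0, 1.0, size=n_pool)                 # true difficulties
a_est = a_true + rng.normal(0.0, 0.15, size=n_pool)        # estimates with error
b_est = b_true + rng.normal(0.0, 0.20, size=n_pool)

def info(a, b, theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

picked = np.argsort(info(a_est, b_est, theta0))[-n_pick:]      # "best" items by estimated info
promised = info(a_est[picked], b_est[picked], theta0).sum()    # what the estimates suggest
realized = info(a_true[picked], b_true[picked], theta0).sum()  # what the test really delivers
print(f"promised {promised:.1f} vs realized {realized:.1f}")   # realized is typically smaller
```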

8.
The U.S. government has become increasingly focused on school climate, as recently evidenced by its inclusion as an accountability indicator in the Every Student Succeeds Act. Yet, there remains considerable variability in both conceptualizing and measuring school climate. To better inform the research and practice related to school climate and its measurement, we leveraged item response theory (IRT), a commonly used psychometric approach for the design of achievement assessments, to create a parsimonious measure of school climate that operates across varying individual characteristics. Students (n = 69,513) in 111 secondary schools completed a school climate assessment focused on three domains of climate (i.e., safety, engagement, and environment), as defined by the U.S. Department of Education. Item and test characteristics were estimated with the mirt package in R using unidimensional IRT. Analyses revealed measurement difficulties that resulted in a greater ability to assess less favorable perspectives on school climate. Differential item functioning analyses indicated measurement differences based on student academic success. These findings support the development of a broad measure of school climate but also highlight the importance of work to ensure precision in measuring school climate, particularly when considering use as an accountability measure.

9.
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA and CC nonparametrically by replacing the role of the parametric IRT model in Lee's classification indices with a modified version of Ramsay's kernel‐smoothed item response functions. The performance of the nonparametric CA and CC indices are tested in simulation studies in various conditions with different generating IRT models, test lengths, and ability distributions. The nonparametric approach to CA often outperforms Lee's method and Livingston and Lewis's method, showing robustness to nonnormality in the simulated ability. The nonparametric CC index performs similarly to Lee's method and outperforms Livingston and Lewis's method when the ability distributions are nonnormal.
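
The nonparametric ingredient referenced above can be sketched as a Nadaraya-Watson kernel smoother of item responses against a rank-based ability proxy, in the spirit of Ramsay's approach. The simulated data, the single-item setup, and the bandwidth are illustrative simplifications, not the article's modified version.

```python
import numpy as np
from scipy.stats import norm

def kernel_irf(item_responses, ability_proxy, eval_points, bandwidth=0.3):
    """Nadaraya-Watson estimate of P(correct | ability) at eval_points."""
    w = norm.pdf((eval_points[:, None] - ability_proxy[None, :]) / bandwidth)
    return (w * item_responses[None, :]).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(1)
theta = rng.normal(size=2000)
item = (rng.random(2000) < 1 / (1 + np.exp(-1.3 * (theta - 0.2)))).astype(float)  # simulated item

# Rank-based proxy mapped through the normal quantile function; in practice the
# ranks would come from examinees' total scores rather than from theta itself.
proxy = norm.ppf((np.argsort(np.argsort(theta)) + 0.5) / theta.size)
grid = np.linspace(-2.5, 2.5, 11)
print(np.round(kernel_irf(item, proxy, grid), 2))
```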

10.
潘浩 《考试研究》2014,(2):59-63
Early unidimensional IRT models ignored the possibility that a test may be multidimensional, while multidimensional IRT models do not partition the dimensions clearly enough and cannot adequately capture what the ability on each dimension means. Higher-order IRT models acknowledge the multidimensionality of a test, define dimensions by subtest, and at the same time unify the abilities on the separate dimensions into a single higher-order ability. They can provide an overall ability estimate for each examinee while also revealing the examinee's ability on each dimension, better reflect reality, and suit the demands of large-scale testing.

11.
Test reliability under item response theory evaluates the dependability and stability of latent trait estimates. Because it is a global, test-level index, the role of IRT reliability cannot be replaced by the test information function, making it an important indicator for IRT-based tests. Drawing on the domestic and international literature, this paper first reviews scholars' views on the role of IRT reliability, then introduces and evaluates several methods for estimating IRT reliability, briefly discusses the factors that influence it, and finally outlines directions in which future research on IRT reliability could be pursued.

12.
Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal conditions using four latent trait estimation procedures and also evaluated whether the test composition, in terms of item difficulty level, reduces estimation error. Most importantly, both true and estimated item parameters were examined to disentangle the effects of latent trait estimation error from item parameter estimation error. Results revealed that non-normal latent trait distributions produced a considerably larger degree of latent trait estimation error than normal data. Estimated item parameters tended to have comparable precision to true item parameters, thus suggesting that increased latent trait estimation error results from latent trait estimation rather than item parameter estimation.
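
One of the latent trait estimators typically compared in studies of this kind is the expected a posteriori (EAP) estimate. A minimal sketch for a single 2PL response pattern under a N(0, 1) prior; the item parameters and the response pattern are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def eap_estimate(responses, a, b, n_quad=61):
    """Expected a posteriori ability estimate under a 2PL model and N(0, 1) prior."""
    theta = np.linspace(-4.0, 4.0, n_quad)
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    like = np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)
    post = like * norm.pdf(theta)               # unnormalized posterior on the grid
    return np.sum(theta * post) / np.sum(post)

a = np.array([1.0, 1.4, 0.8, 1.2, 1.1])        # hypothetical discriminations
b = np.array([-1.2, -0.4, 0.0, 0.6, 1.3])      # hypothetical difficulties
x = np.array([1, 1, 1, 0, 0])                  # one examinee's response pattern
print(round(float(eap_estimate(x, a, b)), 3))
```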

13.
Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same time, a number of data imputation methods have been developed outside of the IRT framework and been shown to be effective tools for dealing with missing data. The current study takes several of these methods that have been found to be useful in other contexts and investigates their performance with IRT data that contain missing values. Through a simulation study, it is shown that these methods exhibit varying degrees of effectiveness in terms of imputing data that in turn produce accurate sample estimates of item difficulty and discrimination parameters.
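
As a baseline example of the kind of imputation such comparisons include, the sketch below fills each missing response with a Bernoulli draw based on the item's observed proportion correct before the data go to IRT calibration. The response matrix and missingness rate are simulated, and this generic method is shown for illustration rather than as one of the article's specific procedures.

```python
import numpy as np

rng = np.random.default_rng(2)
resp = rng.integers(0, 2, size=(500, 10)).astype(float)  # simulated 0/1 responses
resp[rng.random(resp.shape) < 0.1] = np.nan              # roughly 10% of responses go missing

def impute_item_proportion(data, rng):
    data = data.copy()
    for j in range(data.shape[1]):
        col = data[:, j]
        miss = np.isnan(col)
        p_correct = np.nanmean(col)                       # observed proportion correct
        col[miss] = (rng.random(miss.sum()) < p_correct).astype(float)
    return data

completed = impute_item_proportion(resp, rng)
print(int(np.isnan(completed).sum()))                     # 0: matrix ready for IRT calibration
```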

14.
《教育实用测度》2013,26(4):291-312
This study compares three procedures for the detection of differential item functioning (DIF) under item response theory (IRT): (a) Lord's chi-square, (b) Raju's area measures, and (c) the likelihood ratio test. Relations among the three procedures and some practical considerations, such as linking metrics and scale purification, are discussed. Data from two forms of a university mathematics placement test were analyzed to examine the congruence among the three procedures. Results indicated that there was close agreement among the three DIF detection procedures.
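
Raju's area measure mentioned above has a closed form for the 2PL. The sketch below implements the signed area between the reference-group and focal-group item characteristic curves; the parameter estimates are illustrative.

```python
import numpy as np

def raju_signed_area(a1, b1, a2, b2, D=1.7):
    """Signed area between two 2PL item characteristic curves (lower asymptote c = 0)."""
    if np.isclose(a1, a2):
        return b2 - b1
    k = D * a1 * a2 * (b2 - b1) / (a2 - a1)
    return (2.0 * (a2 - a1)) / (D * a1 * a2) * np.log1p(np.exp(k)) - (b2 - b1)

# Reference-group vs. focal-group estimates for one item (hypothetical values):
print(round(raju_signed_area(a1=1.1, b1=0.20, a2=0.9, b2=0.55), 3))
```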

15.
Two new methods for item exposure control were proposed. In the Progressive method, as the test progresses, the influence of a random component on item selection is reduced and the importance of item information is increasingly more prominent. In the Restricted Maximum Information method, no item is allowed to be exposed in more than a predetermined proportion of tests. Both methods were compared with six other item-selection methods (Maximum Information, One Parameter, McBride and Martin, Randomesque, Sympson and Hetter, and Random Item Selection) with regard to test precision and item exposure variables. Results showed that the Restricted method was useful to reduce maximum exposure rates and that the Progressive method reduced the number of unused items. Both did well regarding precision. Thus, a combined Progressive-Restricted method may be useful to control item exposure without a serious decrease in test precision.
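
A rough sketch of the Progressive selection idea described above: a random component and the item information are mixed, with the weight on information growing as the test proceeds. The pool, the weighting scheme, and the fixed interim ability estimate are illustrative simplifications, not the authors' exact specification.

```python
import numpy as np

rng = np.random.default_rng(3)

def info_2pl(a, b, theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def progressive_pick(a, b, theta_hat, administered, item_no, test_length, rng):
    info = info_2pl(a, b, theta_hat)
    random_part = rng.uniform(0.0, info.max(), size=a.size)
    w = (item_no - 1) / (test_length - 1)        # 0 at the first item, 1 at the last
    score = (1.0 - w) * random_part + w * info
    score[list(administered)] = -np.inf          # never reuse an administered item
    return int(np.argmax(score))

a_pool = rng.lognormal(0.0, 0.3, 200)            # hypothetical 200-item pool
b_pool = rng.normal(0.0, 1.0, 200)
used = set()
for k in range(1, 21):                           # a 20-item test at a fixed theta_hat of 0
    used.add(progressive_pick(a_pool, b_pool, 0.0, used, k, 20, rng))
print(len(used), sorted(used)[:5])
```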

16.
The Progressive Matrices items require varying degrees of analytical reasoning. Individuals high on the underlying trait measured by the Raven should score high on the test. Latent trait models applied to data of the Raven form provide a useful methodology for examining the tenability of the above hypothesis. In this study the Rasch latent trait model was applied to investigate the fit of observed performance on Raven items to what was expected by the model for individuals at six different levels of the underlying scale. For the most part the model showed a good fit to the test data. The findings were similar to previous empirical work that has investigated the behavior of Rasch test scores. In three instances, however, the item fit statistic was relatively large. A closer study of the “misfitting” items revealed that two items were of extreme difficulty, which likely contributed to the misfit. The study raises issues about the use of the Rasch model in instances of small samples. Other issues related to the application of the Rasch model to Raven-type data are discussed.

17.
A Comparison of CAT Item Selection Strategies under the 2PLM
Under the two-parameter logistic model (2PLM), this paper proposes a new item selection strategy, the average test difficulty matching method (Avt-b), and compares the trends of EAP ability estimation under four item selection strategies. Simulation results show that the Avt-b method can quickly narrow down the ability range and produce relatively accurate ability estimates in the early stage of a CAT. The paper also determines the range of ability estimation error during the CAT testing stage, which offers some guidance for developing CAT item selection strategies for polytomously scored models.

18.
An item-preequating design and a random groups design were used to equate forms of the American College Testing (ACT) Assessment Mathematics Test. Equipercentile and 3-parameter logistic model item-response theory (IRT) procedures were used for both designs. Both pretest methods produced inadequate equating results, and the IRT item preequating method resulted in more equating error than had no equating been conducted. Although neither of the item preequating methods performed well, the results from the equipercentile preequating method were more consistent with those from the random groups method than were the results from the IRT item pretest method. Item context and position effects were likely responsible, at least in part, for the inadequate results for item preequating. Such effects need to be either controlled or modeled, and the design further researched before the item preequating design can be recommended for operational use.
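
For the equipercentile side of the comparison, the core idea is that a form X score maps to the form Y score holding the same percentile rank. The sketch below shows that mapping on simulated number-correct scores, without the smoothing or IRT preequating steps used operationally.

```python
import numpy as np

rng = np.random.default_rng(4)
form_x = rng.binomial(40, 0.60, size=5000)       # simulated number-correct scores on form X
form_y = rng.binomial(40, 0.55, size=5000)       # simulated scores on a slightly harder form Y

def equipercentile_equate(x_scores, y_scores, x_point):
    pr = (x_scores <= x_point).mean()            # percentile rank of the form X score
    return np.quantile(y_scores, pr)             # form Y score with the same percentile rank

for x in (15, 20, 24, 30):
    print(x, "->", round(float(equipercentile_equate(form_x, form_y, x)), 1))
```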

19.
In test development, item response theory (IRT) is a method to determine the amount of information that each item (i.e., item information function) and combination of items (i.e., test information function) provide in the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods—maximum no target, maximum target, and theta maximum—using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated the test information function. Implications for test development are discussed.

20.
Item analysis is an integral part of operational test development and is typically conducted within two popular statistical frameworks: classical test theory (CTT) and item response theory (IRT). In this digital ITEMS module, Hanwook Yoo and Ronald K. Hambleton provide an accessible overview of operational item analysis approaches within these frameworks. They review the different stages of test development and associated item analyses to identify poorly performing items and effective item selection. Moreover, they walk through the computational and interpretational steps for CTT‐ and IRT‐based evaluation statistics using simulated data examples and review various graphical displays such as distractor response curves, item characteristic curves, and item information curves. The digital module contains sample data, Excel sheets with various templates and examples, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
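
Two of the classical item-analysis statistics such a module walks through are item difficulty (proportion correct) and the corrected item-total (point-biserial) discrimination. A minimal sketch on a simulated 0/1 response matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
theta = rng.normal(size=1000)
b = np.linspace(-1.5, 1.5, 12)                      # 12 hypothetical item difficulties
resp = (rng.random((1000, 12)) < 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))).astype(float)

difficulty = resp.mean(axis=0)                      # classical p-values
total = resp.sum(axis=1)
discrimination = np.array([
    np.corrcoef(resp[:, j], total - resp[:, j])[0, 1]   # corrected item-total correlation
    for j in range(resp.shape[1])
])
print(np.round(difficulty, 2))
print(np.round(discrimination, 2))
```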
