Similar literature
1.
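Students' mathematical literacy has a multidimensional structure, so literacy-oriented assessments of mathematics achievement need to report examinees' performance on each dimension rather than only a single total score. Taking the PISA mathematical literacy framework as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, this study used the MIRT package for R to process and analyze item data from a regional Grade 8 mathematical literacy assessment, and examined methods for the multidimensional measurement of mathematical literacy. The results show that MIRT combines the advantages of unidimensional item response theory and factor analysis: it can be used to analyze a test's construct validity and item quality, and to provide a multidimensional cognitive diagnosis of examinees' abilities.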

2.
Several recent papers have argued for the usefulness of item response theory (IRT) methods of assessing item discrimination power for criterion-referenced tests (CRTs). Conventional methods continue to be used more widely, however, for reasons that include some practical constraints associated with the use of IRT methods. To provide users with information that may help them decide which conventional indices to employ in evaluating CRT items, Spearman rank-order correlations were computed between IRT-derived item information functions (IIFs) and four conventional discrimination indices: the phi-coefficient, the B-index, phi/phi max, and the agreement statistic. The rank-order correlations between the phi-coefficient and the IIFs were very high, with a median of .96. The remaining conventional indices, with the exception of phi/phi max, also correlated well with the IIFs. Theoretical explanations for these relationships are offered.
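Two of the conventional indices named above can be sketched in a few lines. This is a minimal illustration, assuming the usual textbook definitions (phi from the 2x2 table of item score by mastery status, and the B-index as the difference in proportion correct between masters and non-masters). The function names and the toy data are hypothetical and do not come from the abstract.

```python
import math

def phi_coefficient(item, master):
    """Phi between a 0/1 item score and a 0/1 mastery classification."""
    pairs = list(zip(item, master))
    a = sum(1 for x, m in pairs if x == 1 and m == 1)  # correct, master
    b = sum(1 for x, m in pairs if x == 1 and m == 0)  # correct, non-master
    c = sum(1 for x, m in pairs if x == 0 and m == 1)  # incorrect, master
    d = sum(1 for x, m in pairs if x == 0 and m == 0)  # incorrect, non-master
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # Phi is undefined when a margin is empty; return 0.0 in that case.
    return (a * d - b * c) / denom if denom else 0.0

def b_index(item, master):
    """Proportion correct among masters minus proportion among non-masters."""
    masters = [x for x, m in zip(item, master) if m == 1]
    non_masters = [x for x, m in zip(item, master) if m == 0]
    if not masters or not non_masters:
        raise ValueError("both mastery groups must be non-empty")
    return sum(masters) / len(masters) - sum(non_masters) / len(non_masters)

# Toy data (illustrative only): 8 examinees, the first 4 classified as masters.
item_scores = [1, 1, 1, 0, 1, 0, 0, 0]
mastery = [1, 1, 1, 1, 0, 0, 0, 0]
phi_value = phi_coefficient(item_scores, mastery)
b_value = b_index(item_scores, mastery)
```

With equal-sized mastery groups and matching item margins, as in the toy data, the two indices coincide; in general they differ.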

3.
4.
Numerous assessments contain a mixture of multiple choice (MC) and constructed response (CR) item types and many have been found to measure more than one trait. Thus, there is a need for multidimensional dichotomous and polytomous item response theory (IRT) modeling solutions, including multidimensional linking software. For example, multidimensional item response theory (MIRT) may have a promising future in subscale score proficiency estimation, leading toward a more diagnostic orientation, which requires the linking of these subscale scores across different forms and populations. Several multidimensional linking studies can be found in the literature; however, none have used a combination of MC and CR item types. Thus, this research explores multidimensional linking accuracy for tests composed of both MC and CR items using a matching test characteristic/response function approach. The two-dimensional simulation study presented here used real data-derived parameters from a large-scale statewide assessment with two subscale scores for diagnostic profiling purposes, under varying conditions of anchor set lengths (6, 8, 16, 32, 60), across 10 population distributions, with a mixture of simple versus complex structured items, using a sample size of 3,000. It was found that for a well-chosen anchor set, the parameters recovered well after equating across all populations, even for anchor sets composed of as few as six items.

5.
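Nonparametric item response theory models include the monotone homogeneity model and the double monotonicity model. Applying the monotone homogeneity model to the results of an English listening test showed that, using the sequential item selection procedure, 11 of the 16 listening items met the requirements and formed a unidimensional scale. Ranking examinees by their total score on these 11 items is equivalent to ranking them by the latent trait. A study of differential item functioning that applied the double monotonicity model to the 11-item unidimensional scale found that 5 items were ordered differently in the female subgroup than in the male subgroup and in the whole group, indicating that the probability of a correct response was markedly higher in the female subgroup than in the male subgroup. This difference is at least partly due to differences in listening ability between the two subgroups.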

6.
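This paper analyzes and discusses several important issues that require attention when applying IRT models, including model assumptions, model fit, and sample size requirements. Although the discussion focuses mainly on unidimensional parametric IRT models, the same issues arise in applications of nonparametric and multidimensional IRT models. These models have certain particular advantages and are appropriate in some situations, but not all of them are suitable for the field of health assessment.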

7.
Multilevel bifactor item response theory (IRT) models are commonly used to account for features of the data that are related to the sampling and measurement processes used to gather those data. These models conventionally make assumptions about the portions of the data structure that represent these features. Unfortunately, when data violate these models' assumptions but these models are used anyway, incorrect conclusions about the cluster effects could be made and potentially relevant dimensions could go undetected. To address the limitations of these conventional models, a more flexible multilevel bifactor IRT model that does not make these assumptions is presented, and this model is based on the generalized partial credit model. Details of a simulation study demonstrating this model outperforming competing models and showing the consequences of using conventional multilevel bifactor IRT models to analyze data that violate these models' assumptions are reported. Additionally, the model's usefulness is illustrated through the analysis of the Program for International Student Assessment data related to interest in science.

8.
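Using random cluster sampling, 505 primary and secondary school teachers were selected as participants, including 189 male and 271 female teachers, all aged 25 to 55. A teaching efficacy questionnaire was administered and the results were analyzed on the basis of item response theory to obtain each item's discrimination, difficulty, and peak item information. The teaching efficacy scale was revised with reference to item discrimination, difficulty, and the peak of the item information function, and the quality of the revised scale was then checked using structural equation modeling, facet theory techniques, and smallest space analysis. The results show that the revised scale has clearer construct validity and higher reliability, and measures more precisely. Data were managed with SPSS 15.0 and analyzed with Hudap 6.0 and MULTILOG 7.03. The study reached five conclusions: (1) the teaching efficacy scale is unidimensional and can be analyzed with item response theory; (2) the discrimination and difficulty of the revised scale's items are more reasonable; (3) the peak test information of the revised scale is slightly lower than that of the original scale; (4) corresponding facet elements of the scale before and after revision are highly correlated; (5) the three-part content structure of the scale was confirmed, namely moral and behavioral education of students, classroom organization and management, and knowledge transmission.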

9.
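Ye Meng (叶萌). 《考试研究》 (Examinations Research), 2010, (2): 96-107
This paper reviews the main research findings on the problem of local independence in item response theory (IRT). On this basis, it explains the definition of the local independence assumption. It also discusses the relationship between local independence and test dimensionality; the detection and computation, causes, and control procedures of local dependence; and the effects of local dependence on measurement practice. Finally, it explores strategies for handling local item dependence within testlets.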

10.
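Zhang Jun (张军). 《考试研究》 (Examinations Research), 2014, (1): 56-61
The monotone homogeneity model (MHM) is the most widely used model in nonparametric item response theory. It rests on three basic assumptions and is suited to the analysis of small-scale tests. This study used the MHM to analyze a test administered at the 汉语进修学院 (the Chinese language training college) of Beijing Language and Culture University. The results show that the test satisfies the weak unidimensionality and weak local independence assumptions. Of the 67 items, 9 had item scalability coefficients below 0.3 and need to be revised or deleted; once they are deleted, the test forms a Mokken scale of medium strength. In addition, 2 items violated the monotonicity assumption and do not meet the requirements of a Mokken scale.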

11.
Adjusting the Cumulative GPA Using Item Response Theory   Total citations: 1, self-citations: 0, citations by others: 1
In college admissions, the predictive validity of preadmissions measures such as standardized test scores and high school grades is of wide interest. These measures are most often validated against the criterion of the first-year grade point average (GPA). However, neither the first-year GPA nor the four-year cumulative GPA is an adequate indicator of academic performance through four years of college. In this study, Item Response Theory (IRT) is used to develop a more reliable measure of performance, called an IRT-based GPA, which is used to estimate the validity of traditional preadmissions information. The data are preadmissions information and course grades for the Class of 1986 at Stanford University (N = 1564). Principal factor analysis is used as a precursor to determine the dimensionality of the course data and to partition courses into approximately unidimensional subsets, each of which is scaled independently. Results show a substantial increase in predictability when the IRT-based GPA is used instead of the usual GPA.

12.
How did early work of Binet and Thurstone foreshadow item response theory? What connections to work in other areas are not widely recognized? What are current application trends?

13.
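An item bank system for standardized tests was developed using item response theory, with the aim of exploring applications of new theories of educational measurement and providing a supporting tool for subject-level educational evaluation. The item bank covers Grade 4 mathematics; items were written, administered, scored, and analyzed following standardized procedures. The three-parameter logistic model was chosen, the unidimensionality of the test was checked with Bejar's method, item parameters were estimated with the ANOTE software, and a cut score was set using an empirical judgment method based on item classification. The item bank was then assembled with reference to item information. Finally, key techniques in item bank construction are discussed, including ability estimation methods, item selection strategies, and termination control.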
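The three-parameter logistic (3PL) model and the item information used to assemble such a bank can be sketched as follows. This is a minimal illustration, assuming the standard 3PL form with discrimination a, difficulty b, and pseudo-guessing c, and with the scaling constant D taken as 1. The function names and parameter values are hypothetical and are not taken from the study or from the ANOTE software.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (D = 1 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Item information for the 3PL model at ability theta."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

# Illustrative item: a = 1.0, b = 0.0, c = 0.2, evaluated at theta = 0.
prob_correct = p_3pl(0.0, 1.0, 0.0, 0.2)
item_information = info_3pl(0.0, 1.0, 0.0, 0.2)
```

Summing `info_3pl` over candidate items at a target ability level gives the test information, which is one way an item bank can rank items for selection.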

14.
The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article introduces the effort-moderated IRT model, which incorporates item response time into proficiency estimation and item parameter estimation. In two studies of the effort-moderated model when rapid guessing (i.e., reflecting low examinee effort) was present, one based on real data and the other on simulated data, the effort-moderated model performed better than the standard 3PL model. Specifically, it was found that the effort-moderated model (a) showed better model fit, (b) yielded more accurate item parameter estimates, (c) more accurately estimated test information, and (d) yielded proficiency estimates with higher convergent validity.
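The abstract does not give the model's equation, so the following is only a sketch of one way response time could moderate a 3PL model. It assumes that a response faster than an item-specific time threshold is treated as a rapid guess with chance-level success (1 divided by the number of options), and that all other responses follow the 3PL model. The threshold rule, function names, and values are assumptions made for illustration.

```python
import math

def p_3pl(theta, a, b, c):
    """Standard 3PL probability of a correct response (D = 1 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_effort_moderated(theta, a, b, c, response_time, threshold, n_options):
    """Assumed form: 3PL under solution behavior, chance level under rapid guessing."""
    solution_behavior = 1 if response_time >= threshold else 0
    chance = 1.0 / n_options
    return solution_behavior * p_3pl(theta, a, b, c) + (1 - solution_behavior) * chance

# Illustrative 4-option item with a hypothetical 3-second threshold.
p_effortful = p_effort_moderated(0.0, 1.0, 0.0, 0.2, response_time=10.0,
                                 threshold=3.0, n_options=4)
p_rapid = p_effort_moderated(0.0, 1.0, 0.0, 0.2, response_time=1.0,
                             threshold=3.0, n_options=4)
```

Under this assumed form, a rapid guess carries no information about proficiency, because its probability does not depend on theta.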

15.
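Item response theory has many advantages over classical test theory for evaluating item quality in large-scale selection examinations, and it is being applied increasingly in China and abroad. Following the procedures and methods for analyzing norm-referenced tests, item response theory was used to analyze the item-writing quality of the 2011 Guiyang senior-three (final-year high school) English mock examination. The analysis again confirmed that IRT-based analysis of test quality has advantages such as invariance of item parameters across samples and estimates of examinees' trait levels that do not depend on the particular test items, and that it has important value and good prospects for test development in basic education.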

16.
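On Item Response Theory   Total citations: 2, self-citations: 0, citations by others: 2
This paper discusses the historical background of item response theory, its development, its characteristics, and its applications in educational and psychological measurement, and raises theoretical issues concerning reliability along with several of its models.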

17.
Testlet effects can be taken into account by incorporating specific dimensions in addition to the general dimension into the item response theory model. Three such multidimensional models are described: the bi-factor model, the testlet model, and a second-order model. It is shown how the second-order model is formally equivalent to the testlet model. In turn, both models are constrained bi-factor models. Therefore, the efficient full maximum likelihood estimation method that has been established for the bi-factor model can be modified to estimate the parameters of the two other models. An application on a testlet-based international English assessment indicated that the bi-factor model was the preferred model for this particular data set.

18.
Methods are presented for comparing grades obtained in a situation where students can choose between different subjects. It must be expected that the comparison between the grades is complicated by the interaction between the students' pattern and level of proficiency on one hand, and the choice of the subjects on the other hand. Three methods based on item response theory (IRT) for the estimation of proficiency measures that are comparable over students and subjects are discussed: a method based on a model with a unidimensional representation of proficiency, a method based on a model with a multidimensional representation of proficiency, and a method based on a multidimensional representation of proficiency where the stochastic nature of the choice of examination subjects is explicitly modeled. The methods are compared using the data from the Central Examinations in Secondary Education in the Netherlands. The results show that the unidimensional IRT model produces unrealistic results, which do not appear when using the two multidimensional IRT models. Further, it is shown that both the multidimensional models produce acceptable model fit. However, the model that explicitly takes the choice process into account produces the best model fit.

19.
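Based on item response theory, this study explores the optimal item types for open recruitment examinations in which the items change between administrations. Using the scores of 1,000 candidates for professional and technical positions on the 《北京市新进人员通用能力考试》 (Beijing general ability test for new recruits), and after exploratory factor analysis confirmed that only one dimension was present, the graded response model was used to analyze the performance of 10 item types. Scores on the different items within each item type were first summed, and the frequencies of the resulting scores were converted to ordered categories; discrimination, difficulty, category response curves, and information functions were then computed. The optimal item types were identified in two ways: by selecting item types whose share of information exceeded the mean, and by excluding item types whose parameters failed to meet commonly used standards. The two approaches gave very similar results: logical reasoning, chart interpretation, short-passage processing, and reading comprehension were the four best item types.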
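The category response curves mentioned above can be sketched for a single graded item. This is a minimal illustration, assuming the logistic form of the graded response model, in which each category probability is the difference between adjacent cumulative probabilities. The function name and parameter values are hypothetical and are not the study's estimates.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    thresholds must be in increasing order; K thresholds give K + 1 categories.
    """
    # Cumulative probabilities of scoring in category k or higher,
    # padded with 1.0 (lowest category or higher) and 0.0 (above the highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Illustrative three-category item type: a = 1.0, thresholds at -1 and 1.
category_probs = grm_category_probs(0.0, 1.0, [-1.0, 1.0])
```

Evaluating `grm_category_probs` over a grid of theta values traces out one curve per category, which is how category response curves are usually plotted.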

20.
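This paper adopts the two-parameter normal ogive model from item response theory and, using MCMC methods, presents a Matlab program that estimates item parameters by Gibbs sampling. The program was run on final examination score data for undergraduates at a university to obtain item parameters, which were then analyzed.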
