Similar Documents
20 similar documents found.
1.
Since the 1970s, Item Response Theory (IRT) has been one of the main topics of interest to measurement specialists. The one-parameter logistic model in IRT, commonly called the Rasch model, was derived by the Danish mathematician Georg Rasch along a line of reasoning very different from that of other item response models. This paper introduces some practical applications of the Rasch model and a method for estimating its parameters. The method can be carried out by hand, so that ordinary secondary-school teachers can also do some IRT item analysis. I. The model and its applications. IRT holds that a latent-ability measurement model must include at least the observed behavioral responses of the examinees and a measure of their latent ability; the former is observable, the latter must be estimated. The Rasch model can be expressed as …
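The abstract breaks off at "can be expressed as". For reference, the standard dichotomous Rasch model that this introduction refers to gives the probability that examinee v with ability \theta_v answers item i with difficulty b_i correctly:

\[
P(X_{vi}=1 \mid \theta_v, b_i) = \frac{e^{\theta_v - b_i}}{1 + e^{\theta_v - b_i}}.
\]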

2.
Using the many-facet Rasch model with 913 senior high school students as participants, this study validated an assessment of critical-thinking ability in foreign-language writing across four facets: examinees, raters, tasks, and rating criteria. The results show that: (1) an assessment framework comprising posing questions, stating viewpoints, providing evidence, reasoning and argumentation, drawing conclusions, and interpreting and evaluating meets the measurement requirements of the many-facet Rasch model and can capture and reasonably differentiate examinees' critical-thinking ability in foreign-language writing; (2) reasoning and argumentation and providing evidence …

3.
Beginning in 2007, the Chinese Language and English Language subjects of the Hong Kong Certificate of Education Examination adopted standards-referenced reporting to grade candidates' results. In score processing, a Rasch model with structural parameters was used. This paper introduces the model and some of its main properties, derives the solution equations for Joint Maximum Likelihood Estimation, and reports the main results of applying the model to standards-referenced grading in the Hong Kong Certificate of Education Examination.
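The abstract does not reproduce those equations. For the plain dichotomous Rasch model (setting aside the structural parameters the paper adds), the standard JMLE estimating equations equate each observed score with its model expectation,

\[
r_v = \sum_i \frac{e^{\theta_v - b_i}}{1 + e^{\theta_v - b_i}}, \qquad
s_i = \sum_v \frac{e^{\theta_v - b_i}}{1 + e^{\theta_v - b_i}},
\]

where r_v is person v's raw score and s_i is item i's total correct count; the system is solved iteratively (e.g., by Newton-Raphson) under a normalization constraint such as \sum_i b_i = 0.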

4.
From the perspective of Item Response Theory (IRT), this study compared the results of an eighth-grade mathematics final examination at a secondary school using both the S-P (student-problem) chart and WINSTEPS, a Rasch-model analysis program. The two approaches show both agreement and differences. At the overall level, both indicate that the items are generally of good quality and provide accurate score analysis for most students, but the Rasch results are more accurate than the S-P chart in estimating student ability and in handling extreme response data. A few items show measurement bias between groups and need revision. To optimize instruction, teachers should use the two analysis tools in combination.

5.
The Rasch model has the property that person parameters and item parameters are mutually independent, i.e., examinee ability estimates do not depend on item difficulty and vice versa. Using observed score data from the mathematics section of one year's university entrance examination, this study verified the invariance of Rasch item parameters under random sampling, sampling by gender, and sampling by ability group. The findings: the preconditions for verifying item-parameter invariance are rather strict, and many interfering factors must be ruled out; the verification carries some error, so the theoretically exact "invariance" cannot be attained in practice; and there is no unified criterion for item-parameter invariance, which must be set according to the practical problem at hand.
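A minimal sketch of such an invariance check, assuming dichotomous responses in a NumPy persons-by-items array; it uses the centered log-odds approximation of Rasch difficulty rather than full maximum-likelihood estimation, and the function names are illustrative:

```python
import numpy as np

def rasch_difficulty(responses):
    """Approximate Rasch item difficulties (in logits) from a 0/1
    persons-x-items response matrix via centered log-odds."""
    p = responses.mean(axis=0).clip(0.01, 0.99)  # guard against 0% / 100%
    d = np.log((1 - p) / p)                      # logit of failure odds
    return d - d.mean()                          # center to mean zero

def invariance_check(responses, group_mask):
    """Compare difficulty estimates between two subsamples (e.g., by gender).
    Returns per-item differences and the overall correlation."""
    d_a = rasch_difficulty(responses[group_mask])
    d_b = rasch_difficulty(responses[~group_mask])
    return d_a - d_b, np.corrcoef(d_a, d_b)[0, 1]
```

Under parameter invariance the two difficulty vectors should agree up to sampling error, so large per-item differences flag items worth inspecting.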

6.
张洁 《考试研究》2008,(4):65-78
As a relatively authentic and direct form of assessment, speaking tests are used ever more widely in language testing practice. However, factors introduced in the testing process, such as subjective judgment and the design and use of rating criteria and scales, expose scores to influences beyond examinee ability. Based on data from the PETS Level 3 speaking test at one test center in 2007, this study used the Many-facet Rasch Model (MFRM) for post hoc quality control of the ratings. The MFRM integrates the various facets of a performance test into a single mathematical model; it not only places all facets on a common scale, but also supports detailed analysis of each facet, and even of each individual element, making it possible to pinpoint potential "problem raters" and examinees who may have been misjudged. It is thus an effective quality-control tool for subjective scoring.

7.
Based on the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM), an extension of the Rasch model, this study ran confirmatory factor analyses of three candidate ability-dimension models for a college entrance examination mathematics paper, identified the best-fitting dimensional model, and carried out multidimensional item analysis within that framework.

8.
Compared with traditional measurement models, the Rasch model has distinctive advantages in test quality analysis owing to its objectivity and equal-interval measurement. Taking the quality analysis of a sixth-grade technology-and-engineering-literacy science test in Nanjing primary schools as an example, this paper illustrates the application of the Rasch model to test quality analysis: overall test quality, unidimensionality, the match between item difficulty and student ability, item-level quality, item fit, and measurement error. The test shows high reliability and validity and reasonable item discrimination, and the great majority of items meet measurement expectations. In practice, analysts should choose a suitable Rasch analysis program and the corresponding analysis functions for their situation; once the Rasch model flags problem items, these should be interpreted and handled in light of the actual circumstances.

9.
Applying the Rasch model to test quality analysis offers the following tools: the Wright map, which gives the reader a general picture of the whole test; multidimensionality investigations, which check whether the test measures a single latent trait in the examinees (here, reading ability); item fit and error statistics (ITEM: fit order); bubble diagrams; and so on. Taking the quality analysis of a reading-literacy pretest for fifth- and sixth-grade students in the Guangxi Zhuang Autonomous Region as an example, this article presents the Rasch measurement process. The analysis shows the test to be of high overall quality: the items cover examinees at every ability level, difficulty is sensibly calibrated, and the great majority of items achieve the intended measurement effect. However, because measurement goals differ, the choice of Rasch functions and indices and the interpretation of results vary considerably; researchers must choose according to their measurement goals and handle results flexibly in light of the actual situation.
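The item fit statistics mentioned here are, in their standard Rasch form, built from standardized residuals z_{vi} = (x_{vi} - E_{vi}) / \sqrt{W_{vi}}, where E_{vi} is the model-expected response and W_{vi} = P_{vi}(1 - P_{vi}) its variance:

\[
\text{Outfit MNSQ}_i = \frac{1}{N}\sum_v z_{vi}^2, \qquad
\text{Infit MNSQ}_i = \frac{\sum_v W_{vi}\, z_{vi}^2}{\sum_v W_{vi}},
\]

with values near 1 indicating good fit; outfit is sensitive to outlying responses far from an item's difficulty, infit to unexpected patterns near it.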

10.
This study used the many-facet Rasch model (MFRM) to assess undergraduates' ability in a "Multivariate Statistical Analysis" course and to analyze item difficulty and rater severity. The results show that many-facet Rasch analysis handles the assessment of subject ability in open-ended examinations well, and its results are consistent with student feedback.

11.
A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch-based CATs based on looking at the differences between the selected item locations (Rasch item difficulty parameters) of the administered items and target item locations determined from provisional ability estimates at the start of each item. Several new indices based on this framework are introduced and compared to previously suggested measures of adaptation using simulated and real test data. Results from the simulation indicate that some previously suggested indices are not as sensitive to changes in item pool size and the use of constraints as the new indices and may not work as well under different item selection rules. The simulation study and real data example also illustrate the utility of using the new indices to measure adaptation at both a group and individual level. Discussion is provided on how one may use several of the indices to measure adaptation of Rasch-based CATs in practice.
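The abstract does not spell out the new indices; a minimal sketch of one plausible index under the framework it describes, in Python (the function name is illustrative):

```python
import numpy as np

def adaptation_index(administered_b, target_theta):
    """Mean absolute difference between the Rasch difficulties of the
    administered items and the provisional ability estimates that served
    as target locations at the start of each item. Smaller = more adaptive."""
    administered_b = np.asarray(administered_b, dtype=float)
    target_theta = np.asarray(target_theta, dtype=float)
    return np.abs(administered_b - target_theta).mean()
```

Computed per examinee, such values can be inspected individually or averaged for a group-level summary, mirroring the two levels of analysis the study reports.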

12.
A computerized adaptive testing (CAT) algorithm that has the potential to increase the homogeneity of CAT's item-exposure rates without significantly sacrificing the precision of ability estimates was proposed and assessed in the shadow-test (van der Linden & Reese, 1998) CAT context. This CAT algorithm was formed by a combination of maximizing or minimizing varied target functions while assembling shadow tests. There were four target functions to be separately used in the first, second, third, and fourth quarter test of CAT. The elements to be used in the four functions were associated with (a) a random number assigned to each item, (b) the absolute difference between an examinee's current ability estimate and an item difficulty, (c) the absolute difference between an examinee's current ability estimate and an optimum item difficulty, and (d) item information. The results indicated that this combined CAT fully utilized all the items in the pool, reduced the maximum exposure rates, and achieved more homogeneous exposure rates. Moreover, its precision in recovering ability estimates was similar to that of the maximum item-information method. The combined CAT method resulted in the best overall results compared with the other individual CAT item-selection methods. The findings from the combined CAT are encouraging. Future uses are discussed.

13.
The development of reliable tools for assessing digital competences is of great importance, which is why we re-evaluated the measurement quality of the D21-Digital-Index assessment instrument. The D21-Digital-Index is an influential study conducted biannually in Germany. The instrument used in the D21 surveys is based on the DigComp theoretical framework. In our analyses we used data from 1,142 participants from vocational training and higher education institutions to estimate item parameters and the quality of the instrument using Item Response Theory. Because choosing an appropriate IRT model is crucial for instrument evaluation, we calculated and compared two types of models, the Rasch and the Birnbaum model, of which the latter achieved the better fit. In a unidimensional analysis, the five scales of the instrument, with 24 items in total, yield acceptable measures. Multidimensional analysis shows a dimensional separation and hence confirms the construct validity of the instrument.
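For reference, the two models being compared differ only in an item discrimination parameter a_i; in their standard form,

\[
P_{\text{Rasch}}(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}}, \qquad
P_{\text{Birnbaum (2PL)}}(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},
\]

so the better fit of the Birnbaum model indicates that the items discriminate unequally.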

14.
Applying Item Response Theory, this study analyzed how the writers of multiple-choice items affect the validity of reading comprehension tests, in terms of examinees' reading ability estimates and item difficulty estimates. In the experimental design, two groups of examinees first took the same "reading proficiency test"; the ability estimates of the two groups derived from the results showed no significant difference. The two groups were then given items written by two different item writers. Although the items were based on the same reading materials and their difficulty estimates did not differ significantly, the two groups performed significantly differently. According to the Rasch model, examinee performance is jointly determined by examinee ability and item difficulty. It can therefore be inferred that the items written by the different item writers affected examinee performance and, in turn, the validity of multiple-choice reading comprehension testing.

15.
It is known that the Rasch model is a special two-level hierarchical generalized linear model (HGLM). This article demonstrates that the many-faceted Rasch model (MFRM) is also a special case of the two-level HGLM, with a random intercept representing examinee ability on a test, and fixed effects for the test items, judges, and possibly other facets. This perspective suggests useful modeling extensions of the MFRM. For example, in the HGLM framework it is possible to model random effects for items and judges in order to assess their stability across examinees. The MFRM can also be extended so that item difficulty and judge severity are modeled as functions of examinee characteristics (covariates), for the purposes of detecting differential item functioning and differential rater functioning. Practical illustrations of the HGLM are presented through the analysis of simulated and real judge-mediated data sets involving ordinal responses.
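In its usual logit form, for examinee n, item i, and judge j rated dichotomously, the MFRM discussed here is

\[
\log\frac{P(x_{nij}=1)}{P(x_{nij}=0)} = \theta_n - \delta_i - \lambda_j,
\]

which maps directly onto the HGLM reading: \theta_n is the random intercept (examinee ability), while item difficulties \delta_i and judge severities \lambda_j enter as fixed effects.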

16.
Local equating (LE) is based on Lord's criterion of equity. It defines a family of true transformations that aim at the ideal of equitable equating. van der Linden (this issue) offers a detailed discussion of common issues in observed-score equating relative to this local approach. By assuming an underlying item response theory model, one of the main features of LE is that it adjusts the equated raw scores using conditional distributions of raw scores given an estimate of the ability of interest. In this article, we argue that this feature disappears when using a Rasch model for the estimation of the true transformation, while the one-parameter logistic model and the two-parameter logistic model do provide a local adjustment of the equated score.
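In this local approach, the family of true transformations conditions the equipercentile map on ability: a score x on the new form X is equated to

\[
\varphi(x;\theta) = F_{Y\mid\theta}^{-1}\bigl(F_{X\mid\theta}(x)\bigr),
\]

with the conditional score distributions F_{X|\theta} and F_{Y|\theta} derived from the assumed IRT model and \theta replaced in practice by an estimate.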

17.
The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced. This study proposed a four-level IRT model to simultaneously account for dual local dependence due to item clustering and person clustering. Model parameter estimation was explored using the Markov Chain Monte Carlo method. Model parameter recovery was evaluated in a simulation study in comparison with three other related models: the Rasch model, the Rasch testlet model, and the three-level Rasch model for person clustering. In general, the proposed model recovered the item difficulty and person ability parameters with the least total error. The bias in both item and person parameter estimation was not affected but the standard error (SE) was affected. In some simulation conditions, the difference in classification accuracy between models could go up to 11%. The illustration using the real data generally supported model performance observed in the simulation study.
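The abstract does not give the model equations; one plausible way to write such a dual-dependence Rasch extension (the paper's exact parameterization may differ) is

\[
\operatorname{logit} P(y_{pi}=1) = \theta_p - b_i + \gamma_{p\,d(i)}, \qquad \theta_p = \mu_{c(p)} + \varepsilon_p,
\]

where \gamma_{p\,d(i)} is a random testlet effect for person p on the testlet d(i) containing item i (local item dependence), and \mu_{c(p)} is a random effect for the cluster c(p) to which person p belongs (local person dependence).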

18.
Although it has been claimed that the Rasch model leads to a higher degree of objectivity in measurement than has been previously possible, this model has had little impact on test development. Population-invariant item and ability calibrations, together with the statistical equivalency of any two item subsets, are supposedly possible if the item pool has been calibrated by the Rasch model. Initial research has been encouraging, but the implications of underlying assumptions and operational computations in the Rasch model for trait theory have not been clear from previous work. The current paper presents an analysis of the conditions under which the claims of objectivity will be substantiated, with special emphasis on the nature of equivalent forms. It is concluded that the real advantages of the Rasch model will not be apparent until the technology of trait measurement becomes more sophisticated.

19.
Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT.
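A minimal sketch of the general Bayesian-CAT idea being simulated (not the authors' exact procedure): EAP ability estimation on a grid with a standard normal prior, maximum-information item selection for Rasch items, and termination when the posterior SD falls below the SEE threshold. Function and variable names are illustrative:

```python
import numpy as np

def bayesian_cat(responses, difficulties, see_stop=0.10, max_items=50):
    """Minimal Rasch-based CAT with EAP ability estimation on a grid.
    `responses[i]` is the examinee's 0/1 answer to item i; the test stops
    when the posterior SD (the SEE) drops below `see_stop`."""
    grid = np.linspace(-4.0, 4.0, 161)
    posterior = np.exp(-0.5 * grid**2)          # standard normal prior
    posterior /= posterior.sum()
    unused = list(range(len(difficulties)))
    theta, see = 0.0, float("inf")
    for _ in range(max_items):
        # For Rasch items, information is maximal where difficulty == theta.
        item = min(unused, key=lambda i: abs(difficulties[i] - theta))
        unused.remove(item)
        p = 1.0 / (1.0 + np.exp(-(grid - difficulties[item])))
        posterior *= p if responses[item] == 1 else 1.0 - p
        posterior /= posterior.sum()
        theta = float((grid * posterior).sum())               # EAP estimate
        see = float(np.sqrt(((grid - theta) ** 2 * posterior).sum()))
        if see < see_stop or not unused:
            break
    return theta, see
```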

20.
Item difficulty values computed with the Rasch model are independent of the examinee sample, which makes them one of the most important quantitative indices of an item. Rasch item difficulty can be computed quite conveniently in EXCEL; this paper presents the detailed computation steps and discusses how to use the item difficulty values to estimate examinees' ability levels.
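The paper's spreadsheet steps are not reproduced in the abstract. The second half of its topic, estimating an ability from a raw score once item difficulties are known, amounts to solving r = \sum_i P_i(\theta); a minimal Python sketch (the function name is illustrative):

```python
import math

def estimate_ability(raw_score, difficulties, tol=1e-6):
    """Estimate a Rasch ability (in logits) from a raw score and known
    item difficulties by Newton-Raphson on r = sum_i P_i(theta)."""
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("raw score must be strictly between 0 and n")
    theta = math.log(raw_score / (n - raw_score))      # starting value
    for _ in range(100):
        p = [1 / (1 + math.exp(-(theta - b))) for b in difficulties]
        f = sum(p) - raw_score                  # residual: E(score) - r
        fprime = sum(q * (1 - q) for q in p)    # derivative = test information
        step = f / fprime
        theta -= step
        if abs(step) < tol:
            break
    return theta
```

The same recursion can also be laid out column by column in a spreadsheet.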
