Similar Literature
1.
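This study examines the population invariance of three equating methods under the single-group design: mean equating, linear equating, and equipercentile equating. The data come from China's Hanyu Shuiping Kaoshi (HSK, the Chinese Proficiency Test), with examinees divided into subgroups by gender. The results show that for mean equating and linear equating, the equating conversions in the subgroups and in the total examinee population are very close, indicating good population invariance; for equipercentile equating, the results differ considerably between the subgroups and the total population, indicating poor population invariance.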
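As a minimal sketch of two of the methods named in this abstract, the block below implements the textbook forms of mean equating and linear equating for a single-group design. The function names and the score lists are hypothetical illustrations, not the study's data or code, and the use of the population standard deviation is an assumption.

```python
import statistics


def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift a Form X score by the difference in form means."""
    return x - statistics.mean(scores_x) + statistics.mean(scores_y)


def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the mean and the standard deviation."""
    mu_x, mu_y = statistics.mean(scores_x), statistics.mean(scores_y)
    # Population SD is an assumption; a sample SD gives the same ratio here
    # because both forms are taken by the same group.
    sd_x, sd_y = statistics.pstdev(scores_x), statistics.pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical single-group data: the same five examinees take both forms.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]

print(mean_equate(16, form_x, form_y))
print(linear_equate(16, form_x, form_y))
```

Population invariance could then be probed by computing the same conversion within each subgroup and comparing it with the conversion for the total group.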

2.
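Equating matters greatly for testing, yet most tests in China are not equated. Among the few that are, equating is mostly limited to dichotomously scored items, and studies on equating polytomously scored items are rare. Targeting a large-scale domestic language test that contains polytomously scored items, this study examines six equating methods under the graded response model: concurrent calibration, fixed common item parameter calibration, and, under linked separate calibration, the mean/sigma method, the mean/mean method, the Haebara method, and the Stocking-Lord method, in order to select the method best suited to this test.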
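The two moment-based linking methods named in this abstract can be sketched as follows. This is a simplified illustration that assumes one difficulty value per anchor item; under the graded response model the category thresholds would be pooled in the same way. The function names and parameter values are hypothetical, and the scale transformation is assumed to be theta_to = A * theta_from + B.

```python
import statistics


def mean_sigma(b_from, b_to):
    """Mean/sigma method: slope from the SDs of the anchor difficulties."""
    A = statistics.pstdev(b_to) / statistics.pstdev(b_from)
    B = statistics.mean(b_to) - A * statistics.mean(b_from)
    return A, B


def mean_mean(a_from, a_to, b_from, b_to):
    """Mean/mean method: slope from the means of the anchor discriminations."""
    A = statistics.mean(a_from) / statistics.mean(a_to)
    B = statistics.mean(b_to) - A * statistics.mean(b_from)
    return A, B


# Hypothetical anchor-item estimates from two separate calibrations.
# They were constructed so that b_to = 2 * b_from + 0.5 and a_to = a_from / 2.
a_from = [1.0, 2.0, 1.5]
b_from = [-1.0, 0.0, 1.0]
a_to = [0.5, 1.0, 0.75]
b_to = [-1.5, 0.5, 2.5]

print(mean_sigma(b_from, b_to))
print(mean_mean(a_from, a_to, b_from, b_to))
```

With error-free parameters both methods recover the same coefficients; with real estimates they generally differ, which is one reason studies compare them against the characteristic curve methods.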

3.
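Based on the two-parameter logistic model in item response theory, this study investigates how examinee ability distribution and sample size affect equating under the common-item nonequivalent groups design. The equating methods are the item characteristic curve method, the Stocking-Lord method, and the Haebara method, all under separate calibration. Equating results are evaluated with three criteria: the standard error of equated scores, the standard error of the equating coefficients, and the stability of the common-item parameters. The results show that the closer the ability distributions and the larger the sample, the smaller the equating error; moreover, the Stocking-Lord method yields more stable equating results than the Haebara method.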

4.
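By studying test equating methods and equating designs, this paper derives a linear equating formula for tests of unequal reliability and gives a confidence interval estimate for the predicted values of the new equating conversion formula.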

5.
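Using the common-item nonequivalent groups design, this paper compares five IRT-based methods for equating item parameters. The data are standardized senior high school entrance examination (zhongkao) mathematics results for examinees from curriculum-reform experimental districts and non-reform districts in a pilot region of Hubei. A large-sample criterion and other criteria serve as the benchmarks for comparing the methods, the RMSD index serves as the operational criterion, and the equating analysis is run with the STUIRT program. The results show that, in the equating situation set up in this study, the MS method is the least robust; for equating item difficulty parameters, concurrent calibration performs best, followed by the SL characteristic curve method; and for equating item discrimination parameters, the MM method is the most accurate.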

6.
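Based on the three most commonly used logistic models in IRT, this study examines the cross-sample consistency of equating. The subject is a Chinese-language test, and the equating method is concurrent calibration. The results show that concurrent calibration under the two-parameter model has the best and most stable cross-sample consistency.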

7.
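As the new round of gaokao (college entrance examination) reform deepens, examinees will have two test opportunities in some subjects. Conversion between the scores of the two administrations can be handled through test equating. In practice, however, test equating involves many steps, each of which has an important effect on the final result. This paper discusses the issues that test equating in the gaokao reform should attend to, covering the choice of equating design, the judgment of whether equating is necessary, the choice of equating method, the choice of evaluation criteria, and quality control of the equating process, with the aim of markedly improving equating quality.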

8.
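As a translation standard and ideal, equivalence is more precise and concrete than traditional translation standards; it helps quantify information and better handle issues of structure, meaning, culture, cognition, and pragmatics. This paper reviews research by Chinese translation theorists on translation equivalence since the 1980s and sorts through the controversial issues, such as the definition of equivalence and its feasibility and relativity. On that basis it analyzes the implications for translation studies in China and the future direction of research on translation equivalence.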

9.
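On the Levels of Translation Equivalence for Culture   Total citations: 2, self-citations: 0, citations by others: 2
A study of the translatability of culture, that is, of translatable, partially translatable, and untranslatable culture, shows that because countries differ in language habits and culture, translation equivalence can only be approximate. As for the levels of translation equivalence for culture, equivalence should first be achieved at the level of cultural semantics, and only then at the level of cultural linguistic form.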

10.
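To reduce students' academic burden and avoid score errors caused by chance factors, the new round of gaokao reform requires that examinees be given two opportunities to take the foreign language and academic proficiency examinations. Against this background, how to compare the scores from the two administrations becomes the key question. Test equating, an important part of psychometrics, can effectively solve the problem of comparing test scores. By discussing the concept of equating, equating designs, equating methods, and the evaluation of equating, the paper analyzes the issues that gaokao equating should attend to and the equating methods that might be adopted, providing technical support for making gaokao score comparison more scientific.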

11.
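曹文娟 (Cao Wenjuan), 白俊梅 (Bai Junmei). 《考试研究》 (Examinations Research), 2013, (3): 79-85, 33
This paper uses R-2.15.2 to run a simulation study of how the variance of anchor test difficulty parameters affects test equating error, comparing different types of anchor test difficulty variance under three equating methods (chained equipercentile equating, Levine equating, and Tucker equating). The results show that when the anchor test's difficulty variance is smaller than that of the total test, its random and systematic equating errors are essentially the same as when the two variances match (that is, when the anchor test is a parallel shortened version of the total test, a minitest). Therefore, requiring the anchor test to have the same statistical specifications as the total test may be too strict.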

12.
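A New Exploration of Test Equating Design: The ETP Design   Total citations: 1, self-citations: 1, citations by others: 0
Equating to the pool (the ETP design) is a new equating design for large-scale, item-bank-based tests within the item response theory framework. Compared with traditional equating designs, it avoids some drawbacks of the traditional common-group and common-item designs and can equate tests while maintaining equating precision. Since many large-scale examinations now have item banks, the ETP design has considerable room for development.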

13.
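New Thoughts on the Equating Design of the Hanyu Shuiping Kaoshi (HSK)   Total citations: 2, self-citations: 0, citations by others: 2
The Hanyu Shuiping Kaoshi (HSK) has consistently applied equating throughout its many years of operation. In actual equating practice, the HSK has run into new situations that exposed limitations in the old equating design, which has become hard to adapt. This paper proposes new, targeted designs such as pretest equating (预测等值) and cross-country equating (跨国等值) in order to deal with the new problems.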

14.
In setting the cut-scores on National Curriculum tests it is important to maintain standards. In the process of test development, both within and across years, changes are made to the style of the questions in order to increase their ‘accessibility’. This raises the question of whether a more accessible test should have higher cut-scores. Purely statistical definitions of equating are blind to differences between ‘accessibility’ and ‘easiness’ and cut-scores derived from statistical equating methods will be higher for a more accessible test. Arguments about the increased validity of the more accessible test are sometimes used to justify not raising the cut-scores as much as would be indicated by statistical methods. These arguments are shown to be equivalent to postulating that changing the accessibility is changing the construct measured by the test. Using a statistical measurement model can provide a rational basis for understanding accessibility and identifying types of question where accessibility issues are causing a measurement problem.

15.
Because parameter estimates from different calibration runs under the IRT model are linearly related, a linear equation can convert IRT parameter estimates onto another scale metric without changing the probability of a correct response (Kolen & Brennan, 1995, 2004). This study was designed to explore a new approach to finding a linear equation by fixing C-parameters for anchor items in IRT equating. A rationale for fixing C-parameters for anchor items in IRT equating can be established from the fact that the C-parameters are not affected by any linear transformation. This new approach can avoid the difficulty in getting accurate C-parameters for anchor items embedded in the application of the IRT model. Based upon our findings in this study, we would recommend using the new approach to fix C-parameters for anchor items in IRT equating. This work was supported by a Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research
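The invariance property this abstract relies on can be checked numerically. The sketch below assumes the three-parameter logistic model without the scaling constant D, and a scale change of the form theta* = A·theta + B; the function names and parameter values are hypothetical.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (no D constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def rescale(theta, a, b, c, A, B):
    """Move ability and item parameters onto the scale theta* = A*theta + B.

    Discrimination is divided by A, difficulty is transformed like theta,
    and the c-parameter is left unchanged.
    """
    return A * theta + B, a / A, A * b + B, c


# Hypothetical examinee, item, and linking coefficients.
theta, a, b, c = 0.3, 1.2, -0.5, 0.2
A, B = 1.5, -0.4

p_before = p_3pl(theta, a, b, c)
p_after = p_3pl(*rescale(theta, a, b, c, A, B))
print(p_before, p_after)
```

Because a*(theta* - b*) reduces algebraically to a(theta - b), the two probabilities agree up to floating-point rounding, and c never enters the transformation.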

16.
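For the new Hanyu Shuiping Kaoshi (HSK), equating by "anchoring items" (铆题) is not feasible, and equating by "anchoring persons" (铆人) under the single-group design is also impractical. Facing the practical need for equating, the new HSK chose the "mean score equating method" (平均分等值法). This paper presents the implementation plan for mean score equating designed for HSK Level 6; its procedure applies equally to the other levels of the new HSK.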

17.
18.
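Recent research on the DINA model shows that sample size, prior distribution, the choice of empirical Bayes or fully Bayesian estimation, sample representativeness, differential item functioning, and Q-matrix misspecification may all bias DINA item parameter estimates. Using Monte Carlo simulation, this study examines the types of combined change and the amount of bias in the DINA item parameters (the guessing and slip parameters), estimating knowledge states by conditional maximum likelihood. It finds that when item parameter estimates deviate little from the true values, the precision of knowledge-state estimation is barely affected; but when the deviation is large, particularly for three combination types, attribute mastery is clearly overestimated or underestimated. The results have implications for equating diagnostic tests: if the item parameters of the anchor items on two tests show a large deviation (0.1), the need for equating should be considered.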
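For readers unfamiliar with the two item parameters this abstract refers to, the block below is a minimal sketch of the standard DINA item response function: an examinee who has mastered every attribute the item requires answers correctly with probability 1 − slip, and anyone else with probability equal to the guessing parameter. The function name and the example values are hypothetical.

```python
def dina_prob(alpha, q, guess, slip):
    """Probability of a correct response under the DINA model.

    alpha: binary attribute-mastery vector of the examinee (knowledge state)
    q:     binary row of the Q-matrix for the item (required attributes)
    """
    # eta is True only if every required attribute (q_k = 1) is mastered.
    eta = all(a_k >= q_k for a_k, q_k in zip(alpha, q))
    return 1.0 - slip if eta else guess


# Hypothetical item requiring attributes 1 and 2, with g = 0.2 and s = 0.1.
q_row = [1, 1, 0]
print(dina_prob([1, 1, 0], q_row, guess=0.2, slip=0.1))  # masters both
print(dina_prob([1, 0, 0], q_row, guess=0.2, slip=0.1))  # lacks attribute 2
```

A shift of 0.1 in either parameter changes these two response probabilities directly, which gives a sense of why the abstract treats a deviation of that size on anchor items as large.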

19.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(4): 383-407
The performance of the item response theory (IRT) true-score equating method is examined under conditions of test multidimensionality. It is argued that a primary concern in applying unidimensional equating methods when multidimensionality is present is the potential decrease in equity (Lord, 1980) attributable to the fact that examinees of different ability are expected to obtain the same test scores. In contrast to equating studies based on real test data, the use of simulation in equating research not only permits assessment of these effects but also enables investigation of hypothetical equating conditions in which multidimensionality can be suspected to be especially problematic for test equating. In this article, I investigate whether the IRT true-score equating method, which explicitly assumes the item response matrix is unidimensional, is more adversely affected by the presence of multidimensionality than 2 conventional equating methods (linear and equipercentile equating) using several recently proposed equity-based criteria (Thomasson, 1993). Results from 2 simulation studies suggest that the IRT method performs at least as well as the conventional methods when the correlation between dimensions is high (≥ 0.7) and may be only slightly inferior to the equipercentile method when the correlation is moderate to low (≤ 0.5).
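The unidimensional IRT true-score equating method discussed in this abstract can be sketched in a few lines. The version below assumes the two-parameter logistic model, item parameters already on a common scale, and a simple bisection search; the function names and item parameters are hypothetical and this is not the article's simulation code.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: the expected number-correct (true) score."""
    return sum(p_2pl(theta, a, b) for a, b in items)


def true_score_equate(x, items_x, items_y, lo=-8.0, hi=8.0, tol=1e-10):
    """Map a Form X true score to the Form Y true score at the same theta."""
    if not tcc(lo, items_x) < x < tcc(hi, items_x):
        raise ValueError("score is outside the range reachable on [lo, hi]")
    # The TCC increases monotonically in theta, so bisection finds the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items_x) < x:
            lo = mid
        else:
            hi = mid
    return tcc((lo + hi) / 2.0, items_y)


# Hypothetical (a, b) pairs; Form Y is Form X with every item 0.5 harder.
form_x_items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
form_y_items = [(a, b + 0.5) for a, b in form_x_items]

print(true_score_equate(1.7, form_x_items, form_y_items))
```

Since Form Y is uniformly harder in this example, a Form X true score maps to a lower Form Y true score. The sketch uses a single theta throughout, which is exactly the unidimensionality assumption whose violation the article investigates.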

20.
Accurate equating results are essential when comparing examinee scores across exam forms. Previous research indicates that equating results may not be accurate when group differences are large. This study compared the equating results of frequency estimation, chained equipercentile, item response theory (IRT) true‐score, and IRT observed‐score equating methods. Using mixed‐format test data, equating results were evaluated for group differences ranging from 0 to .75 standard deviations. As group differences increased, equating results became increasingly biased and dissimilar across equating methods. Results suggest that the size of group differences, the likelihood that equating assumptions are violated, and the equating error associated with an equating method should be taken into consideration when choosing an equating method.
