Similar literature
19 similar documents found.
1.
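A Theoretical Model of Estimated Difficulty and an Exploration of Its Application   Total citations: 7 (self-citations: 1; citations by others: 6)
Difficulty determines the distribution of test scores and directly affects a test's evaluation and selection functions, so it receives considerable attention. Difficulty is usually obtained by statistical analysis of test data after the examination, by which time the passing score has already been set and can no longer be adjusted. Estimated difficulty is the item difficulty that item-writing experts assign at the item-writing stage by constructing a standard norm and making a reasoned judgment based on item content. Because test results are currently reported as raw scores, estimated difficulty is especially important for keeping examinations stable and fair. Unlike measured difficulty, estimated difficulty can be controlled and adjusted. Item writing is therefore not only a process of drafting items but also a process of designing estimated difficulty.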

2.
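Item difficulty here refers to the content difficulty of an item. Content difficulty is an inherent attribute of the item, determined by its internal factors, including the breadth and depth of the knowledge covered, the cognitive level assessed, the item type, the item's complexity, and the novelty of its content. It does not depend on the score rate of an examinee sample and gives relatively little weight to examinees' responses; it is a difficulty judged mainly from item content. Based on the idea of content difficulty, a theoretical model and an estimation method for predicted difficulty have been developed. Predicted difficulty is the item difficulty that item-writing teachers arrive at during item writing by evaluating the item content and weighing the various factors that affect difficulty. The research proceeds along two lines. The first surveys item-writing teachers' views of the factors that affect item difficulty, derives the main factors and the weight of each, and then computes each item's estimated difficulty with the estimated-difficulty model proposed by Liu Bo, $P = k\sum_{i=1} M_i N_i + c$. The second is a statistical analysis of measured and estimated item difficulty data.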
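The weighted-sum form of the model quoted in this abstract can be sketched in a few lines of Python. This is only an illustration: the abstract does not define the symbols, so reading `M_i` as a factor weight and `N_i` as an expert rating on that factor is an assumption, and the function name, the constants `k` and `c`, and the sample numbers are all hypothetical.

```python
def estimated_difficulty(weights, ratings, k=1.0, c=0.0):
    """Compute P = k * sum(M_i * N_i) + c.

    weights -- assumed meaning of M_i: weight of each difficulty factor
    ratings -- assumed meaning of N_i: expert rating of the item on each factor
    k, c    -- scaling constants of the model (illustrative values only)
    """
    if len(weights) != len(ratings):
        raise ValueError("weights and ratings must have the same length")
    return k * sum(m * n for m, n in zip(weights, ratings)) + c


# Hypothetical item rated on three factors.
weights = [0.5, 0.25, 0.25]
ratings = [2, 4, 2]
p = estimated_difficulty(weights, ratings, k=0.1, c=0.2)
print(round(p, 3))
```

In practice `k` and `c` would have to be calibrated against measured difficulty, which is what the second research line in the abstract appears to address.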

3.
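The senior high school information technology academic proficiency qualifying examination is a criterion-referenced test. Item writing must control difficulty according to the examination's objectives and requirements: by accurately estimating item difficulty, the difficulty of the whole paper is controlled so that examination results align with examination objectives. Difficulty control in item writing comprises estimating item difficulty and controlling paper difficulty. The article explores methods of estimating item difficulty through three steps (identifying the main objective factors that affect difficulty, designing a simple and practical method of calculating item difficulty, and building a reference model for item difficulty estimation) and then uses examples to explore techniques for controlling paper difficulty.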

4.
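Estimated Difficulty: A Method for Determining Item Difficulty in the Self-Study Examination   Total citations: 2 (self-citations: 0; citations by others: 2)
Liu Bo, 《中国考试》 (China Examinations), 2007, (7): 29-34
Because the self-study examination is a criterion-referenced test, its item difficulty is closely tied to its passing standard. Under present conditions, how to assess item difficulty reasonably during item writing, so as to control the passing standard appropriately, is one of the core concerns of examination practitioners. The concept of "estimated difficulty" proposed in this article, together with its assessment method, integrates item content with a standard norm so that "assessing item difficulty" and "controlling the passing standard" are accomplished in a single effort, offering one approach to determining item difficulty at the item-writing stage.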

5.
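The difficulty of China's large-scale educational examinations draws wide public attention and is highly sensitive; poor control can easily trigger social problems. Many complex factors influence item difficulty in such examinations, and keeping item difficulty within an appropriate range during item writing is also a serious scientific problem. Building on an analysis of existing methods for estimating item difficulty and on current item-writing practice in large-scale educational examinations, the article proposes, at the theoretical level, applying the principles and methods of fuzzy mathematics to build a fuzzy comprehensive evaluation model of item difficulty, estimating difficulty by combining qualitative and quantitative approaches. It first builds the model, then explains how to use it to carry out a fuzzy comprehensive evaluation and how to determine the weight coefficients of the factors affecting item difficulty, and finally, based on the estimated item difficulty, explains how to evaluate paper difficulty. The fuzzy comprehensive evaluation method is highly practical, can handle time-varying and nonlinear behavior, is easy to use, and has broad prospects for application in large-scale educational examinations.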
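A generic fuzzy comprehensive evaluation can be sketched as follows. The abstract does not give the article's factor set, grade set, or composition operator, so everything here is an assumption made for illustration: the weighted-average operator, the three difficulty grades, their representative values, and all names and numbers are hypothetical.

```python
def fuzzy_evaluate(weights, membership):
    """Combine factor weights W with a membership matrix R into B = W . R.

    weights    -- one weight per difficulty factor (assumed to sum to 1)
    membership -- membership[i][j] is the degree to which factor i places
                  the item in difficulty grade j
    Uses the weighted-average operator; this choice is an assumption.
    """
    n_grades = len(membership[0])
    return [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_grades)
    ]


def defuzzify(grade_vector, grade_values):
    """Collapse the grade vector into one number by a weighted mean."""
    total = sum(grade_vector)
    return sum(b * v for b, v in zip(grade_vector, grade_values)) / total


# Hypothetical item: two factors, three grades (easy, medium, hard).
weights = [0.5, 0.5]
membership = [
    [1.0, 0.0, 0.0],  # factor 1 rates the item as entirely "easy"
    [0.0, 1.0, 0.0],  # factor 2 rates the item as entirely "medium"
]
grades = fuzzy_evaluate(weights, membership)
score = defuzzify(grades, [0.25, 0.5, 0.75])  # illustrative grade values
print(grades, score)
```

Other composition operators (for example max-min) are common in fuzzy evaluation and would give different grade vectors; the weighted average is used here only because it keeps the contribution of every factor.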

6.
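Starting from the nature and purpose of the senior high school graduation examination (会考), this article analyzes the difficulty requirements that the examination places on physics items and the factors that affect item difficulty, and proposes a model for its physics items and a method of estimating item difficulty using classical test theory (CTT).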

7.
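This article describes Item 28 of the biology section of the 2014 National College Entrance Examination (Fujian) comprehensive science paper, whose measured difficulty was far higher than its intended difficulty. It analyzes the item's difficulty mainly in terms of the novelty, complexity, and degree of distraction and obstruction of the item's context and questions, and, together with other possible factors, analyzes why the estimated difficulty deviated.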

8.
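Item difficulty is generally obtained by actually testing examinees, but this pretesting approach has certain limitations in practice. Subjective estimation of difficulty does not depend on examinees; subject experts predict item difficulty mainly from experience, so it is widely used in examination practice such as the senior high school and college entrance examinations. In research and practice, researchers have continually refined subjective estimation and proposed different estimation methods. This article introduces the traditional subjective judgment method and the paired-comparison method of difficulty estimation, with the aim of understanding subjective estimation methods more systematically and promoting their use in examination practice.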

9.
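Cheng Li, Liu Bo, 《教育科学》 (Education Science), 2012, 28(3): 60-62
Based on measured data, multiple linear regression is used to analyze the factors that influence item difficulty in the self-study examination. The results show that the multiple linear regression model of item difficulty can largely explain the relationship between the dependent and independent variables, offers practical guidance for assigning estimated difficulty values in the self-study examination, and provides an effective way to regulate item difficulty in item bank construction.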

10.
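In classical test theory, item difficulty P is an index that rates how hard an item is by examinees' score rate. The self-study examination, the senior high school graduation examination, and similar tests are criterion-referenced by nature: examination standards are set according to the teaching (examination) objectives of each subject, and the test measures whether examinees meet them. For such tests, the notion of item difficulty should mainly reflect the requirements of the teaching objectives: items with high requirements are hard, and items with low requirements are easy. Differences in the distribution of item difficulty affect several aspects, including examinees' score rates, the key and difficult points of the items, and the requirements of the teaching objectives. The distribution of item difficulty in criterion-referenced tests therefore deserves focused study and is significant for item bank construction. This article proposes a difficulty probability model for drawing items from an item bank…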
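The classical-test-theory definition at the start of this abstract (difficulty P as the examinees' score rate) amounts to the mean score on an item divided by its maximum score. A minimal sketch, with a hypothetical function name and made-up scores:

```python
def item_difficulty(scores, max_score):
    """CTT difficulty P: mean item score as a fraction of the maximum score.

    A larger P means examinees scored higher, i.e. the item was easier.
    """
    if not scores:
        raise ValueError("at least one examinee score is required")
    return sum(scores) / (len(scores) * max_score)


# Hypothetical dichotomous item answered by four examinees.
p_dichotomous = item_difficulty([1, 0, 1, 1], max_score=1)

# Hypothetical 10-point constructed-response item.
p_polytomous = item_difficulty([4, 6, 10, 0], max_score=10)

print(p_dichotomous, p_polytomous)
```

For a dichotomous item this reduces to the proportion of examinees who answered correctly.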

11.
Validating performance standards is challenging and complex. Because of the difficulties associated with collecting evidence related to external criteria, validity arguments rely heavily on evidence related to internal criteria, especially evidence that expert judgments are internally consistent. Given its importance, it is somewhat surprising that evidence of this kind has rarely been published in the context of the widely used bookmark standard-setting procedure. In this article we examined the effect of ordered item booklet difficulty on content experts' bookmark judgments. If panelists make internally consistent judgments, their resultant cut scores should be unaffected by the difficulty of their respective booklets. This internal consistency was not observed: the results suggest that substantial systematic differences in the resultant cut scores can arise when the difficulty of the ordered item booklets varies. These findings raise questions about the ability of content experts to make the judgments required by the bookmark procedure.

12.
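Difficulty is not an inherent attribute of an item but the result of interaction between examinee factors and item characteristics. Many item analysts tend to attribute high item difficulty solely to students' failure to master the relevant knowledge or skills, overlooking the characteristics of the item itself. This study analyzes 60 college entrance examination English items with difficulty below 0.6 to explore the sources of their difficulty. The results show that, besides examinee factors, the difficulty of hard or moderately hard items is also related to item-writing technique, for example problems with the uniqueness and acceptability of answers, content beyond the syllabus, and ill-chosen test points or scoring criteria. The study therefore proposes that examination agencies improve item writing and strengthen item quality monitoring to ensure that large-scale examinations select talent scientifically.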

13.
Certain testing authorities have implied that the proportion of examinees who answer an item correctly may be influenced by the difficulty of the immediately preceding item. If present, such a "sequence effect" would cause p (as an estimate of item difficulty level) to misrepresent an item's "true" level of difficulty. To investigate this hypothesis, a balanced Latin square design was used to rearrange examination items into various test forms. A unique analysis of variance procedure was used to analyze the resulting data. The alleged sequence effect was not found. Certain limitations preclude the generalization of this finding to all students or to all testing situations. However, the evidence provided by this investigation does suggest that comments relating to sequence effects should be qualified as compared with presently appearing statements.
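For readers unfamiliar with the design named in this abstract, a balanced Latin square for an even number of conditions can be built with the standard interleaving construction shown below. The abstract does not say how the study's test forms were actually generated, so this is a generic sketch; the function name and the choice of four conditions are hypothetical.

```python
def balanced_latin_square(n):
    """Return an n x n balanced Latin square for even n.

    Each row is one ordering of conditions 0..n-1. Every condition appears
    once in each position, and every condition is immediately preceded by
    every other condition exactly once across the rows.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction needs an even n >= 2")
    # First row interleaves low and high values: 0, 1, n-1, 2, n-2, ...
    base = [0]
    low, high = 1, n - 1
    while len(base) < n:
        base.append(low)
        low += 1
        if len(base) < n:
            base.append(high)
            high -= 1
    # Remaining rows shift every entry by the row index, modulo n.
    return [[(x + r) % n for x in base] for r in range(n)]


# Four hypothetical item blocks arranged into four test forms.
forms = balanced_latin_square(4)
for form in forms:
    print(form)
```

Balancing immediate predecessors is what lets a design of this kind separate an item's own difficulty from any influence of the item placed just before it.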

14.
The effect of item parameters (discrimination, difficulty, and level of guessing) on the item-fit statistic was investigated using simulated dichotomous data. Nine tests were simulated using 1,000 persons, 50 items, three levels of item discrimination, three levels of item difficulty, and three levels of guessing. The item fit was estimated using two fit statistics: the likelihood ratio statistic ($X^2_B$), and the standardized residuals (SRs). All the item parameters were simulated to be normally distributed. Results showed that the levels of item discrimination and guessing affected the item-fit values. As the level of item discrimination or guessing increased, item-fit values increased and more items misfit the model. The level of item difficulty did not affect the item-fit statistic.
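Simulating dichotomous responses from discrimination, difficulty, and guessing parameters can be sketched as below. The abstract does not name the response model, so the three-parameter logistic (3PL) form is an assumption, as are the function names, the parameter values, and the seeds; only the 1,000 persons and 50 items come from the abstract.

```python
import math
import random


def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model).

    theta -- person ability
    a     -- item discrimination
    b     -- item difficulty
    c     -- guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(thetas, items, seed=0):
    """Draw a persons-by-items matrix of 0/1 responses."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_correct(t, a, b, c) else 0 for (a, b, c) in items]
        for t in thetas
    ]


# 1,000 simulated persons with standard-normal abilities.
ability_rng = random.Random(1)
thetas = [ability_rng.gauss(0.0, 1.0) for _ in range(1000)]

# 50 items sharing one illustrative (a, b, c) triple.
items = [(1.0, 0.0, 0.2)] * 50

data = simulate_responses(thetas, items)
print(len(data), len(data[0]))
```

Item-fit statistics such as those in the abstract would then be computed by fitting a model to `data` and comparing observed with expected proportions correct; that step is not shown here.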

15.
This paper presents findings from research exploring gender by item difficulty interaction on mathematics test scores in Cyprus. Data stemmed from 2 longitudinal studies with 4 different age groups of primary school students. The hypothesis that boys tended to outperform girls on the hardest items and girls tended to outperform boys on the easiest items was generally supported for each year group. The effect of social class was also examined. For each social class, there was a correlation between the item difficulty differences estimated on girls and boys separately and the difficulty of the item estimated on the whole sample. It is claimed that in understanding gender differences in mathematics, item difficulty should be treated as an independent variable. Suggestions for further studies are provided, and implications for the development of assessment policy in mathematics are drawn.

16.
When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.

17.
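Using open-ended surveys, expert interviews, and theoretical analysis, this article constructs the structure of university physical education teachers' teaching competence, and confirmatory factor analysis confirms that the structure comprises seven factors: ability to use modern technology, classroom organization and management, use of teaching methods and knowledge transmission, teaching implementation and regulation, assessment and handling of unexpected incidents, explanation and demonstration, and instructional design. Factor analysis shows that the seven-factor model fits well and is reasonably stable. Each item correlates highly with its factor and with the total scale score, indicating good construct validity and item reliability.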

18.
Sixty-eight graduate students made general and specific ratings of the quality of 12 classroom test items, which varied in difficulty and discrimination. Four treatment combinations defined two additional factors: group discussion/no group discussion of test items and exposure/no exposure to an instructional module on test item construction. The students rated the items differentially, depending not only on item difficulty level but also on item discriminative power. The group discussion and exposure to module factors had significant effects on the general item ratings only. Implications of the research were discussed.

19.
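Exploring a Method for Estimating the Difficulty of Physics Items in the Academic Proficiency Examination   Total citations: 1 (self-citations: 1; citations by others: 0)
Shanghai's general senior high school academic proficiency examination currently has no pre-examination field testing, so item difficulty is estimated mainly from item writers' experience, and no quantitative method exists. Drawing on research in China and abroad, this study starts from three aspects (the physics concepts in an item, item design, and mathematical operations) and, combined with an analysis of measured item difficulty data from the 2011 Shanghai general senior high school physics academic proficiency examination, constructs a quantitative method for estimating item difficulty. It then checks the method's accuracy against measured item difficulty data from the 2012 examination, with the aim of providing a research basis for future estimation of physics item difficulty.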
