
1.

Luo Zhiqing (罗志卿), Examination Research (考试研究), 2021, (2): 66-72

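The senior high school information technology academic proficiency test (qualifying level) is a criterion-referenced test, so item writing must keep difficulty under control in line with the test's objectives and requirements. By accurately estimating item difficulty, item writers control the difficulty of the whole paper and keep test results consistent with the test objectives. Difficulty-control techniques in item writing cover two tasks: estimating item difficulty and controlling paper difficulty. The study explores item difficulty estimation in three steps: identifying the main objective factors that affect difficulty, designing a simple and practical method for calculating item difficulty, and building a reference model for difficulty estimation. It then uses worked examples to examine techniques for controlling paper difficulty.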
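The abstract does not state its calculation formulas. This sketch assumes a widely used convention, which may not be the author's method. An item's difficulty is its mean score divided by its full score, so higher values mean easier items. The paper's difficulty is the score-weighted mean of its item difficulties. The item values below are hypothetical.

```python
# Minimal sketch under an assumed convention (not necessarily the paper's):
# item difficulty P = mean score / full score (higher P = easier item);
# paper difficulty = full-score-weighted mean of item difficulties.

def item_difficulty(mean_score: float, full_score: float) -> float:
    """Proportion-of-full-score difficulty index for a single item."""
    return mean_score / full_score

def paper_difficulty(items: list[tuple[float, float]]) -> float:
    """items: (full_score, estimated_difficulty) pairs; returns the weighted mean."""
    total = sum(score for score, _ in items)
    return sum(score * p for score, p in items) / total

# Hypothetical paper: (full score, estimated difficulty) for three items.
paper = [(2.0, 0.90), (4.0, 0.75), (4.0, 0.60)]
print(round(paper_difficulty(paper), 3))  # (1.8 + 3.0 + 2.4) / 10 = 0.72
```

Because the weights are full scores, a high-value item moves the paper's difficulty more than a low-value one. Adjusting or swapping such items is therefore the most direct lever when a paper drifts from its target.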

2.

A Fuzzy Comprehensive Evaluation Model for Item Difficulty in Large-Scale Educational Examinations

Wang Xiaohua (王晓华), Educational Science Research (教育科学研究), 2014, (8)

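The difficulty of large-scale educational examinations in China attracts wide public attention and is highly sensitive; if it is poorly controlled, it can cause social problems. Item difficulty in these examinations depends on many complex factors, and keeping it within an appropriate range during item writing is also a serious scientific problem. The paper first analyzes existing methods for estimating item difficulty and current item-writing practice in large-scale examinations. It then proposes, on theoretical grounds, applying the principles and methods of fuzzy mathematics to build a fuzzy comprehensive evaluation model of item difficulty. The model estimates difficulty by combining qualitative and quantitative approaches. The paper builds the model, explains how to use it for fuzzy comprehensive evaluation, and shows how to determine the weight coefficients of the factors that influence item difficulty. Finally, it describes how to evaluate the difficulty of a whole paper from the estimated item difficulties. The method is practical, can handle time-varying and nonlinear behavior, and is easy to use, so it has broad application prospects in large-scale educational examinations.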
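The abstract gives no formulas, so this sketch uses a standard form of fuzzy comprehensive evaluation, assumed here. A factor-weight vector W and a membership matrix R (rows are factors, columns are difficulty grades) combine into B = W·R under the weighted-average operator. B is then converted into one difficulty value using a score attached to each grade. The factors, weights, memberships, and grade scores are all hypothetical.

```python
# Sketch of fuzzy comprehensive evaluation with the weighted-average operator:
# b_j = sum_i w_i * r_ij. All numbers below are hypothetical.

def fuzzy_evaluate(weights, membership):
    """Return the evaluation vector B = W . R (one entry per difficulty grade)."""
    n_grades = len(membership[0])
    return [sum(w * row[j] for w, row in zip(weights, membership))
            for j in range(n_grades)]

def defuzzify(b, grade_values):
    """Collapse B to a single value: mean of grade values weighted by B."""
    return sum(bj * v for bj, v in zip(b, grade_values)) / sum(b)

# Hypothetical factors: knowledge depth, cognitive level, item complexity.
W = [0.5, 0.3, 0.2]
# Each row: share of experts rating that factor as easy / medium / hard.
R = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.4, 0.4, 0.2]]
# Hypothetical expected proportion correct attached to each grade.
GRADES = [0.85, 0.65, 0.40]

B = fuzzy_evaluate(W, R)
print([round(x, 2) for x in B])          # [0.44, 0.38, 0.18]
print(round(defuzzify(B, GRADES), 3))    # 0.693
```

The weighted-average operator is chosen here because it keeps every factor's contribution. Max-min operators, also used in fuzzy evaluation, would let a single dominant factor decide the result.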

3.

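Estimated Difficulty: A Method for Determining Item Difficulty in Self-Study Examinations (total citations: 2; self-citations: 0; citations by others: 2)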

Liu Bo (柳博), China Examinations (中国考试), 2007, (7): 29-34

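The self-study examination is a criterion-referenced test, so its item difficulty is closely tied to its passing standard. A core concern for examination professionals today is how to rate item difficulty sensibly during item writing so that the passing standard is controlled appropriately. This paper proposes the concept of "estimated difficulty" and a method for rating it. The method links item content organically to a standard norm, so that "rating item difficulty" and "controlling the passing standard" are accomplished in a single step. It offers one approach to determining item difficulty at the item-writing stage.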

4.

On Estimating Item Difficulty in Graduation Examinations (Huikao)

Tianjin Municipal Research Group (天津市课题研究组), History Teaching, College Edition (历史教学(高校版)), 1996, (6)

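Item difficulty is an important indicator of paper quality and an essential part of scientific item writing. Estimating item difficulty well is important for meeting paper-quality targets and, in turn, for achieving the goals of the examination.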

5.

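On Controlling Item Difficulty in the Higher Education Self-Study Examinations (total citations: 1; self-citations: 0; citations by others: 1)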

Wang Xiaohua (王晓华), China Examinations (中国考试), 2008, (1): 48-54

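Controlling item difficulty is a central task and one of the core problems of item writing for the Higher Education Self-Study Examinations. This paper discusses why difficulty control matters in these examinations, what it requires, and the basic measures it relies on. It also presents basic methods for adjusting item difficulty. It points out that the score proportions the self-study examination syllabi assign to each difficulty level are not entirely reasonable, and it proposes corresponding adjustments. It also analyzes how different quality-control measures in item writing affect difficulty control. Finally, it uses examples to discuss basic techniques for adjusting item difficulty.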

6.

Subjective Methods for Estimating Item Difficulty

China Examinations (中国考试), 2014, (2)

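Item difficulty is usually obtained by actually testing examinees, but such pretesting has limitations. Subjective estimation does not depend on examinees. Instead, subject-matter experts predict item difficulty from experience, and the approach is widely used in practice, for example in the senior high school and college entrance examinations. Researchers have continually refined subjective estimation and proposed a range of estimation methods. This paper introduces the traditional subjective judgment method and the paired-comparison method of difficulty estimation. It aims to give a more systematic understanding of subjective estimation and to promote its use in examination practice.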

7.

A Study of Operating Standards for Rating Estimated Difficulty in Self-Study Examinations

Cheng Li (程力), Liu Bo (柳博), Adult Education (成人教育), 2012, 32(12): 15-17

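The rating of estimated difficulty directly determines the difficulty of the assembled paper and, in turn, the passing standard of the examination. The study establishes operating standards for rating estimated difficulty in self-study examinations in six steps: forming an evaluation team, training the evaluation experts, analyzing difficulty factors, estimating item difficulty, adjusting difficulty values, and constructing a difficulty scale. Controlling rating error requires targeted measures: unifying the passing standard the evaluation experts apply, clarifying what statistical difficulty and estimated difficulty each mean, and identifying the factors that affect item difficulty.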

8.

A Study of Item Difficulty in the Higher Education Self-Study Examinations

Cui Zhiyong (崔志勇), Zhou Pengtao (周鹏涛), Heilongjiang Researches on Higher Education (黑龙江高教研究), 2012, 30(5): 51-54

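Item difficulty here means content difficulty: an intrinsic property of an item, determined by factors internal to the item. These factors include the breadth and depth of the knowledge tested, the cognitive level assessed, the item type, the item's complexity, and the novelty of its content. Content difficulty does not depend on the proportion correct in any examinee sample and takes relatively little account of examinees' responses; it is rated mainly from the item's content. The idea of content difficulty has produced a theoretical model and an estimation method for predicted difficulty. Predicted difficulty is the difficulty that item writers assign at the item-writing stage by weighing the item's content and the various factors that affect difficulty. The study proceeds along two lines. The first surveys item writers' views on which factors affect item difficulty and derives, statistically, the main factors and their weights. It then computes each item's estimated difficulty with the estimated-difficulty model proposed by Liu Bo, $P=\sum_{i=1}^{k} M_i N_i + c$. The second line statistically analyzes observed and estimated item difficulty data.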
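Liu Bo's estimated-difficulty model cited in the abstract computes P as a sum of products M_i N_i over the factors, plus a constant c. The abstract does not define the symbols, so this sketch assumes, hypothetically, that:

- M_i is the survey-derived weight of factor i;
- N_i is the rating an item writer gives the item on that factor;
- k is the number of factors;
- c is a constant term.

The factor list and all numbers are illustrative.

```python
# Sketch of the estimated-difficulty model P = sum_{i=1}^{k} M_i * N_i + c.
# Assumed meanings (not defined in the abstract): M_i = survey weight of
# factor i, N_i = item writer's rating of the item on factor i, c = constant.

def estimated_difficulty(weights, ratings, c=0.0):
    """Weighted sum of factor ratings plus a constant term."""
    if len(weights) != len(ratings):
        raise ValueError("need exactly one rating per factor")
    return sum(m * n for m, n in zip(weights, ratings)) + c

# Hypothetical k = 4 factors (breadth, depth, cognitive level, novelty),
# weights summing to 1 and ratings on a 0-1 scale.
M = [0.2, 0.3, 0.3, 0.2]
N = [0.5, 0.8, 0.6, 0.4]
print(round(estimated_difficulty(M, N, c=0.05), 3))  # 0.6 + 0.05 = 0.65
```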

9.

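Exploring Methods for Estimating the Difficulty of Physics Items in Academic Proficiency Tests (total citations: 1; self-citations: 1; citations by others: 0)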

Guo Changjiang (郭长江), Mou Yaping (牟亚萍), Examination Research (考试研究), 2013, (6): 44-53

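Shanghai's general senior high school academic proficiency tests currently have no pre-examination trial testing. Item difficulty is therefore estimated mainly from item writers' experience, and no quantitative method yet exists. Drawing on research in China and abroad, this study builds a quantitative method for estimating item difficulty. The method starts from three aspects of an item (physics concepts, item design, and mathematical computation) and is calibrated against observed difficulty data from the 2011 Shanghai general senior high school physics academic proficiency test. Its accuracy is then checked against observed difficulty data from the 2012 test. The study aims to provide a research basis for future estimation of physics item difficulty.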

10.

A Study of Difficulty Estimation for Reading Comprehension Items Based on Support Vector Machines

Wu Shenglei (吴生蕾), Ren Jie (任杰), Examination Research (考试研究), 2022, (5): 68-77

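Item difficulty reflects item quality, and ensuring item quality is key to examination reliability and social fairness. Reading comprehension items are a focus of language testing, so estimating their difficulty is important. Support vector machines apply to both linearly separable and nonlinearly separable data. This paper uses them to estimate item difficulty both as categories and as numeric values, with Part II of the HSK (elementary and intermediate) reading comprehension section as the sample. Category estimates are evaluated by classification accuracy and numeric estimates by mean squared error. The results show that support vector machines can be used to estimate the difficulty category of reading comprehension items.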

11.

Nonparametric Person-Fit Research: Some Theoretical Issues and an Empirical Example

Applied Measurement in Education (教育实用测度), 2013, 26(1): 77-89

In person-fit analysis, it is investigated whether an item score pattern is improbable given the item score patterns of the other persons in the group or given an expected score pattern on the basis of a test model. In this study, several existing group-based statistics are discussed to detect such improbable item score patterns, along with the cut scores that were proposed in the literature to classify an item score pattern as aberrant. By means of a simulation study and an empirical study, the detection rate of these statistics is compared, and the practical use of various cut scores is investigated. It is furthermore demonstrated that person-fit statistics can be used to detect persons with a deficiency of knowledge on an achievement test.
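One of the simplest group-based person-fit statistics is the number of Guttman errors. Items are ordered from easiest to hardest by their proportion correct in the group. The statistic counts every item pair in which the examinee fails the easier item but passes the harder one. The abstract does not say which statistics the study compared, so this sketch only illustrates the idea. The response data are hypothetical.

```python
from itertools import combinations

def proportions_correct(data):
    """Group proportion correct for each item (data[p][i] is 0 or 1)."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]

def guttman_errors(pattern, p_values):
    """Count (easier item wrong, harder item right) pairs in a 0/1 pattern."""
    # Order items from easiest (highest proportion correct) to hardest.
    order = sorted(range(len(p_values)), key=lambda j: -p_values[j])
    errors = 0
    for easy, hard in combinations(order, 2):  # easy precedes hard in order
        if pattern[easy] == 0 and pattern[hard] == 1:
            errors += 1
    return errors

# Hypothetical group of five examinees on four items; the last is aberrant.
DATA = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 1]]
P = proportions_correct(DATA)                 # [0.8, 0.6, 0.6, 0.4]
print([guttman_errors(row, P) for row in DATA])  # [0, 0, 0, 0, 4]
```

A cut score on the error count would then flag patterns as aberrant. Choosing that cut score is one of the practical questions the study examines, so none is assumed here.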

12.

The Russell Sage Social Relations Test

Dora E. Damrin, Journal of Experimental Education, 2013, 81(1): 85-99

Using a technique that controlled exposure of items, the investigator examined the effect on mean test score, item difficulty index, and reliability and validity coefficients of the reordering of items within a power test containing ten letter-series-completion items. The results suggest that effects on test statistics from item rearrangement are, generally, minimal. The implication of these findings for test designs involving an item sampling procedure is that performance on an item is minimally influenced by the context in which it occurs.

13.

Generating Dichotomous Item Scores with the Four-Parameter Beta Compound Binomial Model

Patrick O. Monahan, Won-Chan Lee, Robert D. Ankenmann, Journal of Educational Measurement, 2007, 44(3): 211-225

A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta compound-binomial (4PBCB) strong true score model (with two-term approximation to the compound binomial) is used to estimate and generate the true score distribution. The nonparametric item-true score step functions are estimated by classical item difficulties conditional on proportion-correct total score. The technique performed very well in replicating inter-item correlations, item statistics (point-biserial correlation coefficients and item proportion-correct difficulties), first four moments of total score distribution, and coefficient alpha of three real data sets consisting of educational achievement test scores. The technique replicated real data (including subsamples of differing proficiency) as well as the three-parameter logistic (3PL) IRT model (and much better than the 1PL model) and is therefore a promising alternative simulation technique. This 4PBCB technique may be particularly useful as a more neutral simulation procedure for comparing methods that use different IRT models.
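The generation step can be pictured with a simplified sketch, which is not the authors' procedure. It makes four assumptions:

- the four-parameter beta values and the step-function table are hypothetical, whereas the actual method estimates them from data (the 4PBCB fit and classical item difficulties conditional on total score);
- the sketch does not implement the two-term approximation;
- items are scored independently given the true score;
- the true score is a Beta(alpha, beta) draw rescaled to [lower, upper].

```python
import random
from bisect import bisect_right

def sample_true_score(alpha, beta, lower, upper, rng):
    """Four-parameter beta draw: Beta(alpha, beta) rescaled to [lower, upper]."""
    return lower + (upper - lower) * rng.betavariate(alpha, beta)

def generate_scores(n_persons, step_table, cut_points, beta_params, seed=0):
    """Simulate 0/1 item scores for n_persons examinees.

    step_table[j][k] is P(item j correct) for true scores in bin k, where the
    bins are delimited by the sorted cut_points (k = 0 .. len(cut_points)).
    Items are scored independently given the true score (local independence).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        tau = sample_true_score(*beta_params, rng)
        k = bisect_right(cut_points, tau)  # which true-score bin tau falls in
        data.append([1 if rng.random() < row[k] else 0 for row in step_table])
    return data

CUTS = [0.4, 0.7]                 # hypothetical bins: <0.4, 0.4-0.7, >=0.7
STEPS = [[0.60, 0.85, 0.95],      # hypothetical easy item
         [0.30, 0.60, 0.85],
         [0.10, 0.35, 0.70]]      # hypothetical hard item
scores = generate_scores(1000, STEPS, CUTS, beta_params=(2.0, 2.0, 0.2, 0.95))
p_values = [sum(col) / len(scores) for col in zip(*scores)]
print([round(p, 2) for p in p_values])
```

The step functions impose no logistic shape on the item characteristic curves. That is the property the abstract credits with making the technique a more neutral benchmark across IRT models.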

14.

IRT Estimation of Domain Scores

R. Darrell Bock, David Thissen, Michele F. Zimowski, Journal of Educational Measurement, 1997, 34(3): 197-211

In classical test theory, a test is regarded as a sample of items from a domain defined by generating rules or by content, process, and format specifications. If the items are a random sample of the domain, then the percent-correct score on the test estimates the domain score, that is, the expected percent correct for all items in the domain. When the domain is represented by a large set of calibrated items, as in item banking applications, item response theory (IRT) provides an alternative estimator of the domain score by transformation of the IRT scale score on the test. This estimator has the advantage of not requiring the test items to be a random sample of the domain, and of having a simple standard error. We present here resampling results in real data demonstrating for uni- and multidimensional models that the IRT estimator is also a more accurate predictor of the domain score than is the classical percent-correct score. These results have implications for reporting outcomes of educational qualification testing and assessment.
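The IRT domain-score estimator maps a scale score to the expected percent correct over every calibrated item in the domain. This minimal sketch assumes a 3PL model in the logistic metric, with no scaling constant D, and a hypothetical four-item bank. The standard error mentioned in the abstract is not computed here.

```python
import math

def p_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model (logistic metric)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def domain_score(theta, item_bank):
    """Expected percent correct over all calibrated items in the domain."""
    return 100.0 * sum(p_3pl(theta, *item) for item in item_bank) / len(item_bank)

# Hypothetical calibrated bank: (a, b, c) for each item.
BANK = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.25), (1.5, 1.0, 0.2)]
print(round(domain_score(0.0, BANK), 1))  # 57.0
```

The test itself only has to yield the scale score; the items that enter the average are the whole bank. This is why the estimator does not require the administered items to be a random sample of the domain.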

15.

The relationship of reading attitudes to academic aptitude, locus of control, and field independence

John Blaha, Larry Chomin, Psychology in the Schools, 1982, 19(1): 28-32

The relationship between eight dimensions of reading attitude and measures of academic aptitude, locus of control, and field independence was investigated for a sample of 322 inner-city Detroit fifth graders. Verbal academic aptitude correlated significantly with the Expressed Reading Difficulty, Reading Anxiety, Silent vs. Oral Reading, and Reading as Enjoyment reading attitude dimensions, while nonverbal academic aptitude correlated with Expressed Reading Difficulty and Reading Anxiety. The Expressed Reading Difficulty, Reading Anxiety, Reading Group, Reading as Direct Reinforcement, and Reading as Enjoyment dimensions were significantly related to the I+ score; reading attitudes were not related to the I- score. Only the Expressed Reading Difficulty dimension correlated with field independence. The meaning of these relationships is discussed.

16.

A General Approach to Measuring Test-Taking Effort on Computer-Based Tests

Steven L. Wise, Lingyun Gao, Applied Measurement in Education (教育实用测度), 2017, 30(4): 343-354

There has been an increased interest in the impact of unmotivated test taking on test performance and score validity. This has led to the development of new ways of measuring test-taking effort based on item response time. In particular, Response Time Effort (RTE) has been shown to provide an assessment of effort down to the level of individual item responses. A limitation of RTE, however, is that it is intended for use with selected response items that must be answered before a test taker can move on to the next item. The current study outlines a general process for measuring item-level effort that can be applied to an expanded set of item types and test-taking behaviors (such as omitted or constructed responses). This process, which is illustrated with data from a large-scale assessment program, should improve our ability to detect non-effortful test taking and perform individual score validation.
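The Response Time Effort (RTE) index that the study generalizes works as follows. Each response counts as solution behavior when its response time reaches an item-specific threshold, and as a rapid guess otherwise. RTE is the proportion of items showing solution behavior. The thresholds and response times below are hypothetical. The sketch does not implement the paper's extension to omitted or constructed responses.

```python
def solution_behavior(response_times, thresholds):
    """1 if the response time reaches the item's threshold, else 0 (rapid guess)."""
    return [1 if rt >= th else 0 for rt, th in zip(response_times, thresholds)]

def response_time_effort(response_times, thresholds):
    """RTE: proportion of items answered with solution behavior."""
    sb = solution_behavior(response_times, thresholds)
    return sum(sb) / len(sb)

THRESHOLDS = [3.0, 5.0, 5.0, 4.0, 6.0]   # hypothetical seconds per item
fast = [1.2, 2.0, 30.1, 1.5, 2.2]        # mostly rapid guesses
steady = [12.4, 20.3, 31.0, 15.8, 25.6]  # effortful throughout
print(response_time_effort(fast, THRESHOLDS))    # 0.2
print(response_time_effort(steady, THRESHOLDS))  # 1.0
```

Because the per-item solution-behavior flags are kept, effort can be judged at the level of individual responses, not only as a test-level summary.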

17.

Analysis of a Preventive Medicine Examination Paper for Nursing Students Using SPSS 17

Huang Songlin (黄松林), Yue Qing (岳青), Education and Teaching Research (教育与教学研究), 2012, (6): 83-85, 92

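This paper analyzes the preventive medicine course assessment papers of undergraduate nursing students at one institution, to provide a basis for reforming teaching evaluation and improving teaching quality. SPSS 17.0 was used to analyze the paper's difficulty, reliability, and validity and the distribution of scores. Scores were approximately normally distributed, with a mean of 71.7 ± 7.6; reliability was 0.626, validity 0.478, and difficulty 0.718. The assessment was reasonably reliable, the paper's overall difficulty was moderate, and the score distribution was reasonable. The results reflected students' actual level fairly well.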
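The abstract reports difficulty and reliability without saying how SPSS computed them. This sketch assumes a common pairing: difficulty as mean score divided by full score, and reliability as Cronbach's alpha. The score matrix is hypothetical.

```python
from statistics import mean, pvariance

def difficulty(scores, full_score):
    """Mean score as a proportion of the full score (higher = easier)."""
    return mean(scores) / full_score

def cronbach_alpha(matrix):
    """Cronbach's alpha for matrix[p][i] = score of person p on item i."""
    k = len(matrix[0])
    items = list(zip(*matrix))
    totals = [sum(row) for row in matrix]
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical scores of five students on four 5-point items.
SCORES = [[4, 3, 5, 2],
          [3, 3, 4, 2],
          [5, 4, 5, 3],
          [2, 1, 3, 1],
          [4, 4, 4, 3]]
print(round(difficulty([row[0] for row in SCORES], 5), 2))  # 0.72
print(round(cronbach_alpha(SCORES), 3))                     # 0.947
```

Population variances are used throughout. Alpha depends only on the ratio of item variance to total variance, so using sample variances consistently would give the same value.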

18.

Information Functions of Rank-2PL Models for Forced-Choice Questionnaires

Jianbin Fu, Xuan Tan, Patrick C. Kyllonen, Journal of Educational Measurement, 2024, 61(1): 125-149

This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's information and directional information are described, and the test information for Maximum Likelihood (ML), Maximum A Posteriori (MAP), and Expected A Posteriori (EAP) trait score estimates is distinguished. Expected item/test information indexes at various levels are proposed and plotted to provide diagnostic information on items and tests. The expected test information indexes for EAP scores may be difficult to compute due to a typical test's vast number of item response patterns. The relationships of item/test information with discrimination parameters of statements, standard error, and reliability estimates of trait score estimates are discussed and demonstrated using real data. Practical suggestions for checking the various expected item/test information indexes and plots are provided.
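Fisher and directional information for a pair item can be sketched under an assumed MUPP-2PL parameterization, which is not taken from the paper:

- statement i measures trait θ1 with discrimination a_i;
- statement k measures trait θ2 with discrimination a_k;
- the probability of preferring i is the logistic function of a_i·θ1 − a_k·θ2 + d.

For a logistic model of this form, the Fisher information matrix is P(1 − P) g gᵀ with g = (a_i, −a_k). Directional information along a unit vector u is uᵀIu. Parameter values are hypothetical.

```python
import math

def mupp_2pl_prob(theta1, theta2, a_i, a_k, d):
    """P(statement i preferred over k); assumed form logistic(a_i*t1 - a_k*t2 + d)."""
    return 1.0 / (1.0 + math.exp(-(a_i * theta1 - a_k * theta2 + d)))

def fisher_information(theta1, theta2, a_i, a_k, d):
    """2x2 information matrix P(1-P) * g g^T with g = (a_i, -a_k)."""
    p = mupp_2pl_prob(theta1, theta2, a_i, a_k, d)
    g = (a_i, -a_k)
    w = p * (1.0 - p)
    return [[w * g[r] * g[col] for col in range(2)] for r in range(2)]

def directional_information(info, angle_deg):
    """u^T I u for the unit vector u at angle_deg from the theta1 axis."""
    u = (math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg)))
    return sum(u[r] * info[r][col] * u[col] for r in range(2) for col in range(2))

info = fisher_information(0.0, 0.0, a_i=1.2, a_k=0.8, d=0.0)
for angle in (0, 45, 90):
    print(angle, round(directional_information(info, angle), 3))
# 0 0.36 / 45 0.02 / 90 0.16
```

The pair carries little information along the 45° direction, where both traits rise together, because under this model the response depends only on a weighted difference of the traits.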

19.

On the Use of IRT Models With Judgmental Standard Setting Procedures

Michael T. Kane, Journal of Educational Measurement, 1987, 24(4): 333-345

In judgmental standard setting procedures (e.g., the Angoff procedure), expert raters establish minimum pass levels (MPLs) for test items, and these MPLs are then combined to generate a passing score for the test. As suggested by Van der Linden (1982), item response theory (IRT) models may be useful in analyzing the results of judgmental standard setting studies. This paper examines three issues relevant to the use of IRT models in analyzing the results of such studies. First, a statistic for examining the fit of MPLs, based on judges' ratings, to an IRT model is suggested. Second, three methods for setting the passing score on a test based on item MPLs are analyzed; these analyses, based on theoretical models rather than empirical comparisons among the three methods, suggest that the traditional approach (i.e., setting the passing score on the test equal to the sum of the item MPLs) does not provide the best results. Third, a simple procedure, based on generalizability theory, for examining the sources of error in estimates of the passing score is discussed.
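The traditional approach that the paper evaluates, and argues is not the best, has two steps. First, average the judges' ratings for each item to get its MPL. Then sum the MPLs to obtain the passing score. The ratings below are hypothetical. The IRT-based alternatives and the generalizability analysis are not shown.

```python
from statistics import mean

def item_mpls(ratings):
    """ratings[j][i]: judge j's probability that a minimally competent
    examinee answers item i correctly; returns the mean rating per item."""
    return [mean(col) for col in zip(*ratings)]

def angoff_passing_score(ratings):
    """Traditional approach: passing score = sum of the item MPLs."""
    return sum(item_mpls(ratings))

# Hypothetical ratings from three judges on four 1-point items.
RATINGS = [[0.8, 0.6, 0.5, 0.7],
           [0.7, 0.5, 0.4, 0.8],
           [0.9, 0.7, 0.6, 0.6]]
print(round(angoff_passing_score(RATINGS), 2))  # 2.6 out of 4 points
```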

20.

The Development from Paper-and-Pencil Testing to Computerized Language Testing

Tian Wenyan (田文燕), Journal of Hubei Radio & Television University (湖北广播电视大学学报), 2007, 27(2): 151-152

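This paper reviews how language testing has changed in moving from paper-and-pencil to computerized formats, covering item types, item bank construction, test delivery, and score reporting. It also discusses the challenges facing computerized language testing and the information and convenience it offers English learners.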