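Language teaching and language testing are interdependent, and analyzing test results can improve the quality of teaching. From the standpoint of testing theory, this paper argues that the analysis of College English test papers should proceed along three lines: analysis and interpretation of the scores, item analysis, and analysis of the reliability and validity of the paper as a whole. It also introduces the indices used in test-paper analysis and how they are applied.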
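
The item-analysis indices mentioned above can be illustrated with a minimal Python sketch, assuming dichotomously scored (0/1) items. It computes the classical difficulty index p, the upper-lower group discrimination index D, and KR-20 as one reliability estimate. The 27% grouping fraction is a common convention assumed here. The function names are hypothetical, and this is not the paper's own procedure.

```python
from statistics import pvariance


def item_difficulty(responses):
    """Classical difficulty index p: proportion answering each item correctly."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def discrimination_index(responses, frac=0.27):
    """Upper-lower index D = p(upper group) - p(lower group), grouped by total score."""
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(len(ranked) * frac))  # size of each extreme group
    p_upper = item_difficulty(ranked[:g])
    p_lower = item_difficulty(ranked[-g:])
    return [u - l for u, l in zip(p_upper, p_lower)]


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 items (population variance of totals)."""
    k = len(responses[0])
    p = item_difficulty(responses)
    var_total = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(pj * (1 - pj) for pj in p) / var_total)


responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(item_difficulty(responses))       # [0.8, 0.6, 0.4, 0.2]
print(discrimination_index(responses))  # [1.0, 1.0, 1.0, 1.0]
print(round(kr20(responses), 3))        # 0.8
```

KR-20 is undefined when every examinee has the same total score, because the variance in the denominator is then zero.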

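Content Validity Analysis of the "Grammar and Vocabulary" Section of the Practical English Test for Colleges   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, the Practical English Test for Colleges (PRETCO) has received growing attention. This paper examines three aspects of test content: its relevance to the test objectives, its representativeness, and its appropriateness for the test-takers. On that basis, it analyzes the content validity of the 120 items in the "Grammatical Structure" section of eight PRETCO Level A papers from 2000 to 2003. It also offers suggestions for improving the quality of PRETCO papers.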

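Chen Juan, Geography Education (《地理教育》), 2015, (Z1): 94-95
I. Do test-paper analysis carefully. A scientific and accurate analysis of a test paper should calculate its validity, difficulty, and discrimination, and should examine both the paper and the students. First, evaluate how the students answered. Before the review lesson, compile the relevant statistics:
- the class mean, highest score, and lowest score;
- the pass rate and excellence rate;
- the number of students in each score band and the score rate for each item;
- the corresponding figures for other reference classes.
These establish how the class performed and where each student stands. Students who improved markedly, and those who fell clearly short, on this test should be tallied by category and identified individually. Only then can the review be well targeted and its quality improved. This first diagnostic mock…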
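
The class statistics listed above can be compiled with a short Python sketch. The cut-offs are illustrative assumptions: a pass mark of 60, an "excellent" mark of 85 on a 100-point paper, and 10-point score bands. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def class_summary(totals, pass_mark=60, excellent_mark=85, band_width=10):
    """Summary statistics for one class; the cut-offs are illustrative defaults."""
    n = len(totals)
    bands = Counter((t // band_width) * band_width for t in totals)  # band lower bounds
    return {
        "mean": mean(totals),
        "max": max(totals),
        "min": min(totals),
        "pass_rate": sum(t >= pass_mark for t in totals) / n,
        "excellent_rate": sum(t >= excellent_mark for t in totals) / n,
        "bands": dict(sorted(bands.items())),
    }


def item_score_rates(item_scores, full_marks):
    """Score rate per item: points earned / points available across the class."""
    n = len(item_scores)
    return [
        sum(student[j] for student in item_scores) / (n * full)
        for j, full in enumerate(full_marks)
    ]


summary = class_summary([92, 78, 55, 61, 85])
print(summary["pass_rate"], summary["excellent_rate"])  # 0.8 0.4
print(summary["bands"])  # {50: 1, 60: 1, 70: 1, 80: 1, 90: 1}
print(item_score_rates([[2, 3], [1, 5]], full_marks=[2, 5]))  # [0.75, 0.8]
```

Running the same functions on each reference class gives the comparison figures the text calls for.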

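Generally speaking, a student achievement test involves four steps: setting the test objectives, constructing the test paper, administering the test at a suitable time, and analyzing and evaluating the results. These four stages together ensure that the test is scientifically sound. The key stage is constructing the paper, because the test objectives must be made concrete through it, and both administering the test and gathering information about the quality of student learning rely on the paper as the instrument. Since a paper is made up of individual items, the soundness of a test depends, in a sense, on the soundness of its item writing. What, then, makes item writing sufficiently scientific? This can be analyzed from the following aspects.
I. Each item should have a clear measurement objective. A teacher always writes an item with some intention: students' answers to it should reveal how well they have attained a particular teaching objective. In a written test, the measurement objective has two components, the knowledge requirement and the ability requirement for answering the item. When writing items, many teachers find the knowledge requirement relatively easy to grasp but the ability requirement hard to…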

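Seven years have passed since the nationwide new curriculum reform began in 2005, and the alignment between the physics curriculum standards and the related test papers is an important question. This paper studies the alignment between two documents:
- the 2012 Jiangsu Province Academic Proficiency Test Specification, which was drawn up on the basis of the curriculum standards;
- the 2012 Jiangsu Province Academic Proficiency Test physics paper.
Four alignment indices were computed quantitatively. A common framework, derived from the two-dimensional framework of the revised Bloom's taxonomy of cognitive objectives, was used to code both the test specification and the paper. The Porter alignment index was then used to calculate the alignment indices between them. The results show that the test specification and the paper are aligned in a statistical sense. The degree of alignment between the test specification and the examination will affect classroom teaching.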
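
Porter's alignment index compares two matrices of the same shape after converting each to cell proportions: P = 1 - (1/2) * sum(|x_i - y_i|). It ranges from 0 (no overlap) to 1 (identical distributions). The minimal Python sketch below assumes the specification and the paper have already been coded into count matrices, for example content topics by the cognitive-process categories of the revised Bloom taxonomy. It computes only the single overall index and does not reproduce the study's statistical significance test. The matrices are hypothetical.

```python
def porter_alignment(spec_counts, test_counts):
    """Porter's alignment index between two equally shaped count matrices."""
    x = [v for row in spec_counts for v in row]
    y = [v for row in test_counts for v in row]
    if len(x) != len(y):
        raise ValueError("matrices must have the same number of cells")
    sx, sy = sum(x), sum(y)
    # Convert counts to proportions, then sum the absolute cell discrepancies.
    discrepancy = sum(abs(xi / sx - yi / sy) for xi, yi in zip(x, y))
    return 1 - discrepancy / 2


spec = [[1, 1], [1, 1]]   # hypothetical: 2 topics x 2 cognitive levels
paper = [[2, 0], [1, 1]]
print(porter_alignment(spec, paper))  # 0.75
```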

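Using the principles and methods of educational measurement, this paper analyzes the chemistry items and paper of the 1993 Shaanxi Province General Senior High School Graduation Examination. The analysis covers four questions:
- whether each type of item follows the principles of item writing, and whether the paper follows the principles of test construction;
- how well the measurement requirements of the items and paper match the teaching objectives and test objectives;
- the degree to which the objectives were attained, as shown by the results;
- the reliability of the examination.
From this analysis, the paper draws lessons and offers evidence-based recommendations for future graduation examinations in the province.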

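The 1993 national college entrance examination Chinese paper for the new-subject group (hereafter the "new gaokao Chinese paper") embodied the "two benefits" principle well. The paper followed the Secondary School Chinese Teaching Syllabus and the Examination Specification closely. Its items were steady and within the syllabus, neither obscure nor odd. They emphasized basic knowledge while highlighting the assessment of Chinese-language subject abilities, which gave the paper a strong selection function. Its guidance for secondary Chinese teaching, to lay a solid foundation and develop abilities, was also clear. The content, item types, and score allocation all met the requirements of the Examination Specification, which helps keep secondary teaching stable.
Judging from the results, the paper achieved its testing objectives well. After the examination, the Examination Center of the State Education Commission sampled 2,484 papers from the humanities track and 3,282 papers from the science and engineering track for statistical analysis. The score distribution of each item was reasonable, and the discrimination and difficulty coefficients were satisfactory. The results reflected the item writers' intentions well and largely achieved the…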

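Reviewing a test with students is one of the key stages in realizing the value of educational assessment, and high-quality test-paper analysis is the most important prerequisite for a good review lesson. This thesis takes high-school mathematics papers from Lhasa as an example. It applies K-means and Apriori analysis on the WEKA platform, combined with a comprehensive analysis of multi-dimensional data from individual teacher interviews, questionnaires, and post-test reflections. On this basis it precisely determines three things: what content to review, how strongly the knowledge points under review are associated with one another, and how to group students for the review. This in turn optimizes the design of the review lesson.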
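
As a rough illustration of the clustering step only, the following Python sketch implements plain Lloyd's k-means on hypothetical per-student score-rate vectors, with one entry per knowledge point. It is not WEKA's SimpleKMeans, and the Apriori association-rule step is not shown. Initial centroids are passed in explicitly so the result is deterministic; the function name and data are assumptions.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means with squared Euclidean distance; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid where it was
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical score rates on two knowledge points for four students.
students = [[0.9, 0.8], [0.85, 0.9], [0.2, 0.3], [0.1, 0.25]]
labels, centers = kmeans(students, centroids=[[1.0, 1.0], [0.0, 0.0]])
print(labels)  # [0, 0, 1, 1]
```

The resulting labels could serve as one input to a grouping scheme, though how the thesis combines clusters with interview and questionnaire data is not reproduced here.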

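Drawing on modern theories of test washback, this study statistically analyzes four English achievement tests. It compares them against theoretical standards to determine whether they meet those standards in three respects: test content, the correlations among the parts of each test, and item discrimination. The aim is to strengthen the positive washback of testing and make full use of its beneficial effect on teaching.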

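I. Analysis of Trends in 2009 College Entrance Examination Item Writing
Statistical analysis of the test results shows that this paper fulfilled the assessment objectives and tasks of the 2008 college entrance examination in English well. To help this work develop steadily and keep improving, we offer the following recommendations for schools' reference.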

To test the hypothesis that the basic "logic" individuals use in scientific hypothesis testing is the biconditional (if and only if), and that the biconditional is a precondition for the development of formal operations, eight reasoning items were administered to a sample of 387 students in grades eight, ten, and twelve and in college. Five of the items involved the formal operational schemata of probability, proportions, and correlations. Two of the items involved propositional logic. One item involved the biconditional. Percentages of correct responses on most of the items increased with age. A principal-component analysis revealed three factors: two were identified as involving operational thought, and one involved propositional logic. As predicted, the biconditional reasoning item loaded on one of the operational thought factors. A Guttman scale analysis of the items failed to reveal a unidimensional scale. However, the biconditional reasoning item ordered first, supporting the hypothesis that it is a precondition for formal operational reasoning. Implications for teaching science students how to test hypotheses are discussed.
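
The Guttman scale analysis mentioned above is often summarized by the coefficient of reproducibility, CR = 1 - errors / (persons x items). The Python sketch below assumes one common error-counting convention. Each response pattern is compared with the ideal pattern implied by the person's total score, with items ordered from easiest to hardest. Other conventions give somewhat different values, and ties in item difficulty are broken by item order here. A CR of about .90 is the conventional benchmark for scalability. The function name is hypothetical.

```python
def guttman_reproducibility(responses):
    """Coefficient of reproducibility for a 0/1 persons x items matrix."""
    n, k = len(responses), len(responses[0])
    # Order items from most to least often answered correctly (stable for ties).
    order = sorted(range(k), key=lambda j: -sum(row[j] for row in responses))
    errors = 0
    for row in responses:
        score = sum(row)
        ideal = [1 if pos < score else 0 for pos in range(k)]  # passes the easiest items
        observed = [row[j] for j in order]
        errors += sum(o != i for o, i in zip(observed, ideal))
    return 1 - errors / (n * k)


perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(guttman_reproducibility(perfect))  # 1.0
```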

A thorough search of the literature was conducted to locate empirical studies investigating the trait or construct equivalence of multiple-choice (MC) and constructed-response (CR) items. Of the 67 studies identified, 29 studies included 56 correlations between items in both formats. These 56 correlations were corrected for attenuation and synthesized to establish evidence for a common estimate of correlation (true-score correlations). The 56 disattenuated correlations were highly heterogeneous. A search for moderators to explain this variation uncovered the role of the design characteristics of test items used in the studies. When items are constructed in both formats using the same stem (stem equivalent), the mean correlation between the two formats approaches unity and is significantly higher than when using non-stem-equivalent items (particularly when using essay-type items). Construct equivalence, in part, appears to be a function of the item design method or the item writer's intent.
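
The correction for attenuation divides an observed correlation by the square root of the product of the two measures' reliabilities: r_true = r_xy / sqrt(r_xx * r_yy). A minimal Python sketch with made-up input values:

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimated true-score correlation."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical MC-CR correlation of .60 with reliabilities of .80 and .80.
print(round(disattenuate(0.60, 0.80, 0.80), 3))  # 0.75
```

Because the reliabilities are themselves estimates, corrected values can exceed 1 in practice.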

This study discusses a procedure for testing the equivalence among different item response formats used in personality and attitude measurement. The procedure assumes that latent response variables underlie the observed item responses (the underlying variables approach). It uses a nested series of confirmatory factor analysis models derived from Jöreskog's (1971) method for estimating the disattenuated correlation. The different stages of the procedure are illustrated using real data.

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing this assumption is important for understanding constructs in multilevel data. However, most applications of multilevel item response models assume cross-level invariance without testing it. In this study, methods for detecting differential item discrimination (DID) across levels, and the consequences of ignoring DID, are illustrated and discussed using multilevel item response models. Simulation results showed that the likelihood ratio test (LRT) performed well in detecting global DID at the test level when some portion of the items exhibited DID. At the item level, four methods showed satisfactory rejection rates (>.8) under two conditions: some portion of the items exhibited DID, and the items had lower intraclass correlations (or higher DID magnitudes). The four methods were the Akaike information criterion (AIC), the sample-size adjusted Bayesian information criterion (saBIC), the LRT, and the Wald test. When DID was ignored, the main problem was the accuracy of the item discrimination estimates and their standard errors. Implications of the findings and limitations are discussed.
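
The model-comparison statistics named above can be computed from fitted log-likelihoods. The Python sketch below assumes the invariant model, with within-level discriminations constrained equal to between-level ones, is nested in the free model. The LRT statistic 2(LL_free - LL_invariant) is therefore referred to a chi-square distribution with df equal to the number of constraints. saBIC uses the (n + 2)/24 sample-size adjustment, and which n to use in a multilevel model is itself a modeling choice. The log-likelihood values are hypothetical placeholders, and the chi-square tail function handles integer df only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (regularized gamma recursion)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) for even df or Q(1/2, y) for odd df, then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(ll_restricted, ll_full, df):
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)


def aic(ll, n_params):
    return -2.0 * ll + 2.0 * n_params


def sabic(ll, n_params, n_obs):
    """Sample-size adjusted BIC: -2LL + p * ln((n + 2) / 24)."""
    return -2.0 * ll + n_params * math.log((n_obs + 2) / 24.0)


# Hypothetical fits: invariant model vs. a model freeing 2 discriminations.
stat, p = likelihood_ratio_test(ll_restricted=-1000.0, ll_full=-998.0, df=2)
print(stat, round(p, 4))  # 4.0 0.1353
```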

This study compares the Rasch item fit approach for detecting multidimensionality in response data with principal component analysis without rotation using simulated data. The data in this study were simulated to represent varying degrees of multidimensionality and varying proportions of items representing each dimension. Because the requirement of unidimensionality is necessary to preserve the desirable measurement properties of Rasch models, useful ways of testing this requirement must be developed. The results of the analyses indicate that both the principal component approach and the Rasch item fit approach work in a variety of multidimensional data structures. However, each technique is unable to detect multidimensionality in certain combinations of the level of correlation between the two variables and the proportion of items loading on the two factors. In cases where the intention is to create a unidimensional structure, one would expect few items to load on the second factor and the correlation between the factors to be high. The Rasch item fit approach detects dimensionality more accurately in these situations.
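
The usual Rasch item fit statistics are the outfit (unweighted) and infit (information-weighted) mean squares of the residuals. Both have an expectation near 1 when the data fit the model. The minimal Python sketch below assumes person abilities and item difficulties have already been estimated; the estimation step is not shown, and the function names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, difficulties):
    """Return (outfit MNSQ, infit MNSQ) for each item."""
    fits = []
    for j, b in enumerate(difficulties):
        z_sq_sum = resid_sq_sum = info_sum = 0.0
        for row, theta in zip(responses, thetas):
            p = rasch_prob(theta, b)
            w = p * (1.0 - p)        # model variance of the response
            r = row[j] - p           # raw residual
            z_sq_sum += r * r / w    # squared standardized residual
            resid_sq_sum += r * r
            info_sum += w
        fits.append((z_sq_sum / len(thetas), resid_sq_sum / info_sum))
    return fits


print(item_fit([[1], [0]], thetas=[0.0, 0.0], difficulties=[0.0]))  # [(1.0, 1.0)]
```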

A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits examining the latent structure of tentative versions of multiple-component measuring instruments. The discussed procedure is straightforwardly utilized with the increasingly popular latent variable modeling software Mplus, and is illustrated on a numerical example.
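
The procedure described above works through latent variable modeling in Mplus. As a simpler classical point of comparison only, the Python sketch below computes corrected item-total (item-rest) Pearson correlations for observed 0/1 items. It gives point estimates only, not the latent-variable interval estimates the outlined procedure provides. The function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlations(responses):
    """Correlate each item with the total score on the remaining items."""
    k = len(responses[0])
    result = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, rest))
    return result


data = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
print([round(r, 3) for r in item_rest_correlations(data)])  # [0.707, 0.707, 0.0]
```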

Students rated the quality of the items on a classroom test that had been taken previously. On the same test, psychometric item indices were calculated. The results showed that the student ratings were related to the item difficulty, but not to the item-test correlation. In addition, the better-achieving students tended to rate the items as less ambiguous. Finally, the ambiguity ratings were more highly related to the item-test correlations for the better achieving students. These findings support opinions held by many instructors of students' judgments of item quality.

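This study used construct maps to design a 30-item test in six parts, with spelling, multiple-choice, and short-answer items. A total of 175 children aged 6 to 14 took the test. Rasch analysis found that local item dependence within testlets was not serious, and reliability was 0.85. Item difficulties matched examinee abilities well. Because the items were written from the construct maps, the test has a degree of content validity. However, the difficulties of 9 items deviated somewhat from what was expected. Five items did not fit the Rasch model's expectations well. No notable gender-related differential item functioning was found. Examinee ability was positively correlated with the length of time spent learning English. Finally, the study discusses technical issues of remote computerized adaptive testing based on information and communication technology.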

Using a bidimensional two-parameter logistic model, the authors generated data for two groups on a 40-item test. The item parameters were the same for the two groups, but the correlation between the two traits varied between groups. The difference in the trait correlation was directly related to the number of items judged not to be invariant using traditional unidimensional IRT-based unsigned item invariance indexes. The higher trait correlation leads to higher discrimination parameter estimates when a unidimensional IRT model is fit to the multidimensional data. In the most extreme case, when r(θ1, θ2) = 0 for one group and r(θ1, θ2) = 1.0 for the other group, 33 out of 40 items were identified as not invariant. When using signed indexes, the effect was much smaller. The authors, therefore, suggest a cautious use of IRT-based item invariance indexes when data are potentially multidimensional and groups may vary in the strength of the correlations among traits.
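
A data-generating step like the one described can be sketched in Python with a compensatory two-dimensional 2PL in slope-intercept form, P(x = 1) = 1 / (1 + exp(-(a1*θ1 + a2*θ2 + d))). The traits are drawn from a bivariate standard normal with correlation rho. The parameterization, parameter values, and function name are assumptions for illustration; the study's actual generating values are not reproduced here.

```python
import math
import random


def simulate_2d_2pl(n_persons, a1, a2, d, rho, seed=0):
    """Simulate 0/1 responses from a compensatory two-dimensional 2PL model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta1 = z1
        theta2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(theta1, theta2) = rho
        row = []
        for a1j, a2j, dj in zip(a1, a2, d):
            p = 1.0 / (1.0 + math.exp(-(a1j * theta1 + a2j * theta2 + dj)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Two hypothetical groups with identical item parameters but different trait correlations.
a1, a2, d = [1.0] * 40, [0.8] * 40, [0.0] * 40
group_a = simulate_2d_2pl(1000, a1, a2, d, rho=0.0, seed=1)
group_b = simulate_2d_2pl(1000, a1, a2, d, rho=1.0, seed=2)
```

Fitting a unidimensional model to each group separately would then show whether the discrimination estimates diverge, as the abstract reports.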

The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on‐demand testing. Building on existing residual analyses, the authors propose an item index that requires only moderate‐to‐small sample sizes to form data for time‐series analysis. Asymptotic results are presented to facilitate statistical significance tests. The authors show that the proposed index combined with time‐series techniques may be useful in detecting and predicting item drift. Most important, this index is related to a well‐known differential item functioning analysis so that a meaningful effect size can be proposed for item drift detection.
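
The authors' index is not reproduced here. As a loosely related, hypothetical sketch of residual-based monitoring, the Python code below computes a standardized residual for one item in each time window from model-expected probabilities. It then smooths the series with an exponentially weighted moving average (EWMA), a generic time-series technique swapped in for illustration. The window data, smoothing weight, and function names are assumptions.

```python
import math


def window_residual_z(observed, expected):
    """Standardized residual for one item in one time window.

    observed: 0/1 responses in the window; expected: model probabilities of a
    correct response for the same examinees (from the original calibration).
    """
    resid = sum(x - p for x, p in zip(observed, expected))
    var = sum(p * (1.0 - p) for p in expected)
    return resid / math.sqrt(var)


def ewma(values, lam=0.2):
    """Exponentially weighted moving average, starting from 0 (the no-drift value)."""
    smoothed, out = 0.0, []
    for v in values:
        smoothed = lam * v + (1.0 - lam) * smoothed
        out.append(smoothed)
    return out


# Hypothetical windows in which the item becomes easier than calibrated.
windows = [
    ([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]),
    ([1, 1, 1, 0], [0.5, 0.5, 0.5, 0.5]),
    ([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5]),
]
z = [window_residual_z(obs, probs) for obs, probs in windows]
print(z)                 # [0.0, 1.0, 2.0]
print(ewma(z, lam=0.5))  # [0.0, 0.5, 1.25]
```

A flagging rule, such as a threshold on the smoothed series, would be a further design choice not specified in the abstract.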

