
1.

Application of the Bi-Factor Multidimensional Item Response Theory Model to Testlet-Based Tests

Christine E. DeMars 《Journal of Educational Measurement》2006,43(2):145-168

Four item response theory (IRT) models were compared using data from tests where multiple items were grouped into testlets focused on a common stimulus. In the bi-factor model each item was treated as a function of a primary trait plus a nuisance trait due to the testlet; in the testlet-effects model the slopes in the direction of the testlet traits were constrained within each testlet to be proportional to the slope in the direction of the primary trait; in the polytomous model the item scores were summed into a single score for each testlet; and in the independent-items model the testlet structure was ignored. Using the simulated data, reliability was overestimated somewhat by the independent-items model when the items were not independent within testlets. Under these nonindependent conditions, the independent-items model also yielded greater root mean square error (RMSE) for item difficulty and underestimated the item slopes. When the items within testlets were instead generated to be independent, the bi-factor model yielded somewhat higher RMSE in difficulty and slope. Similar differences between the models were illustrated with real data.
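As a rough illustration of the RMSE criterion mentioned in this abstract, the sketch below computes the root mean square error between estimated and generating item parameters. The function name `rmse` and the sample values are hypothetical and are not taken from the study; the study's actual estimation procedure is not reproduced here.

```python
from math import sqrt


def rmse(estimates, true_values):
    """Root mean square error between estimated and generating parameter values."""
    if len(estimates) != len(true_values) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical item difficulties: generating values vs. estimates from some model.
true_difficulty = [-1.0, 0.0, 1.0]
estimated_difficulty = [-0.8, 0.1, 1.3]
print(rmse(estimated_difficulty, true_difficulty))
```

A lower value means the model recovered the generating parameters more closely, which is the sense in which the abstract compares the models.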

2.

Empirical Estimates of the Comparative Reliability of Matching Tests and Multiple-Choice Tests

《Journal of Experimental Education》2012,80(3):179-182

Equivalent forms of a ten-item completion test were constructed. The same test items then were rewritten in matching format and in multiple-choice format, resulting in two forms (A and B) of each of three types of test. All tests were administered to 73 examinees, and parallel-forms reliability coefficients (correlation between scores on A and B) were calculated. These empirically obtained values were compared to the values of the reliability coefficient predicted from theoretically derived equations which indicate the influence of chance success due to guessing on test reliability. In accordance with theory it was found that the completion test was more reliable than the matching test and that the matching test was more reliable than the multiple-choice test. The empirically obtained reliability coefficients were very close to those predicted from the mathematically derived formulas.
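The parallel-forms coefficient described above (the correlation between scores on forms A and B) can be sketched as a plain Pearson correlation. This is a minimal sketch assuming one paired score per examinee; the function name `parallel_forms_reliability` and the scores are hypothetical, and the guessing-corrected prediction equations from the article are not reproduced.

```python
from math import sqrt


def parallel_forms_reliability(scores_a, scores_b):
    """Pearson correlation between paired scores on form A and form B."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need paired scores for at least two examinees")
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    ss_a = sum((a - mean_a) ** 2 for a in scores_a)
    ss_b = sum((b - mean_b) ** 2 for b in scores_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("scores on each form must vary across examinees")
    return cov / sqrt(ss_a * ss_b)


# Hypothetical scores of five examinees on two ten-item forms.
form_a = [7, 5, 9, 4, 6]
form_b = [6, 5, 10, 3, 7]
print(parallel_forms_reliability(form_a, form_b))
```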

3.

Improving the Test Reliability of Subjective English Test Items

胡风明《唐山师范学院学报》2000,(6)

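Although objective test items have strong advantages in measuring students' language ability, their negative effects have become increasingly apparent in practice. Subjective test items measure students' expressive and communicative abilities more effectively. Subjective items are relatively easy to write but very difficult to score. The causes of the low reliability of subjective items should be analyzed soundly and solutions proposed.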

4.

Reliability of Scores From Teacher-Made Tests (total citations: 1; self-citations: 0; citations by others: 1)

David A. Frisbie 《Educational Measurement》1988,7(1):25-35

Reliability is the property of a set of test scores that indicates the amount of measurement error associated with the scores. Teachers need to know about reliability so that they can use test scores to make appropriate decisions about their students. The level of consistency of a set of scores can be estimated by using the methods of internal analysis to compute a reliability coefficient. This coefficient, which can range between 0.0 and +1.0, usually has values around 0.50 for teacher-made tests and around 0.90 for commercially prepared standardized tests. Its magnitude can be affected by such factors as test length, test-item difficulty and discrimination, time limits, and certain characteristics of the group: the extent of their testwiseness, level of student motivation, and homogeneity in the ability measured by the test.
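One common internal-consistency coefficient is Cronbach's alpha. The abstract does not say which coefficient the article uses, so treating alpha as the "internal analysis" method is an assumption made here for illustration only; the function names and the data below are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a score matrix: one row per examinee, one column per item."""
    n_examinees = len(item_scores)
    n_items = len(item_scores[0])
    if n_examinees < 2 or n_items < 2:
        raise ValueError("need at least two examinees and two items")
    # Variance of each item across examinees.
    item_variances = [
        sample_variance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    # Variance of the total score across examinees.
    total_variance = sample_variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores must vary across examinees")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) scores for four examinees on three items.
scores = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
print(cronbach_alpha(scores))
```

Note that a sample estimate of alpha can fall below zero when items correlate negatively, even though the reliability it estimates lies between 0.0 and 1.0.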

5.

The Reliability of Tests Requiring Alternative Responses

《The Journal of educational research》2012,105(3):234-240


6.

On Making Examinations Scientific

姜同《长江工程职业技术学院学报》2003,20(1):18-21

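This article discusses the functions of examinations and the significance of scientific examination practice for educational reform, introduces the principles and methods of scientific item writing, lists common item types and explains their respective characteristics and item-writing techniques, and stresses the need to analyze the reliability, validity, difficulty, and discrimination of examination items, providing methods for doing so.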

7.

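A Study of Scoring Standardization and Reliability in Oral Tests (total citations: 2; self-citations: 0; citations by others: 2)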

郭茜邢如沈明波《清华大学教育研究》2003,(Z1)

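Oral tests have relatively high validity but relatively low reliability. Without reliability, however, validity cannot truly be guaranteed. How to improve the reliability of oral tests is therefore a question of wide concern among testing researchers. By describing the scoring standardization and rater training for the oral component of the Tsinghua University English proficiency test, this paper discusses how to standardize scoring in order to improve oral test reliability.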

8.

The Cumulative Reliability of Frequent Short Objective Tests

《The Journal of educational research》2012,105(4-5):290-295


9.

On Reliability, Validity, and Achievement Testing

包威《黑龙江教育学院学报》2010,29(8):29-30

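Reliability and validity are two quality characteristics of achievement tests, and how to handle the relationship between them is a fundamental problem in testing. After introducing the definitions of reliability and validity and the relationship between them, the article analyzes reliability and validity in achievement testing and explains how to balance the two. It concludes that achievement testing is an effective means of measurement that is bound to improve teaching quality.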

10.

How Days Between Tests Impacts Alternate Forms Reliability in Computerized Adaptive Tests

Adam E. Wyse 《Educational and psychological measurement》2021,81(4):644

An essential question when computing test–retest and alternate forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computerized adaptive tests to explore how the number of days between tests impacts alternate forms reliability coefficients. Results suggest that the highest alternate forms reliability coefficients were obtained when the second test was administered at least 2 to 3 weeks after the first test. Even though reliability coefficients after this amount of time were often similar, results suggested a potential tradeoff in waiting longer to retest as student ability tended to grow with time. These findings indicate that if keeping student ability similar is a concern, the best time to retest is shortly after 3 weeks have passed since the first test. Additional analyses suggested that alternate forms reliability coefficients were lower when tests were shorter and that narrowing the first test ability distribution of examinees also impacted estimates. Results did not appear to be largely impacted by differences in first test average ability, student demographics, or whether the student took the test under standard or extended time. It is suggested that for math and reading tests, like the ones analyzed in this article, the optimal retest interval would be shortly after 3 weeks have passed since the first test.

11.

A Brief Analysis of Black-Box Testing and White-Box Testing

胡静《衡水学院学报》2008,10(1):30-32

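Black-box testing and white-box testing are both important software testing methods. Black-box testers focus more on the business side, while white-box testers focus more on the implementation; black-box testing emphasizes the whole, while white-box testing emphasizes the parts. The two are complementary.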

12.

On Language Tests Construction

袁云博关丽娟《海外英语》2013,(7)

Language tests are closely concerned with teaching and learning. There are various ways to test students' foreign language level in light of different aspects. This paper mainly focuses on some key...

13.

Demonstrating the Reliability of Statistical Software

周坤乔《湖州师范学院学报》2004,26(1):108-110

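During statistical software testing or software maintenance, sampling survey methods and a mathematical model are used to measure the total number of residual errors, an important indicator of software reliability. On this basis the article holds that software quality can be calculated quantitatively, which helps software staff work on improving software quality.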

14.

Reliability and Difference Testing of Parallel TEM4 Mock Tests

张红霞王同顺《教育与现代化》2003,(4):23-29

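For nationwide tests, regular evaluation is indispensable. The key to evaluating a language test and studying its validity is the study of reliability, or consistency. This study uses parallel TEM4 papers to compute reliability statistics and analyze differences. It not only examines the consistency between the parallel tests but also, where differences exist, locates the tests or items that differ. Locating them in this way will benefit later test construction, pretesting, and test assembly.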

15.

An Exploration of Diagnostic Methods for Chinese Developmental Dyslexia

金花何先友莫雷《华南师范大学学报(社会科学版)》2009,(5)

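To date there is no standard assessment method for Chinese developmental dyslexia that is widely accepted by the international academic community, which has greatly hindered both theoretical research and practical application in this area. By reviewing and analyzing the assessment methods currently in use, and by building on the general definition of developmental dyslexia together with the specific characteristics of Chinese cognition, the article proposes that "diagnostic testing for Chinese developmental dyslexia should include tests of word decoding ability and tests of lexical-semantic access ability."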

16.

On the Importance of Corpus-Based Grammar Testing

赵有华聂龙《海南师范大学学报(社会科学版)》2005,18(3):138-140

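Attention to the study of real language facts has long been a fine tradition in linguistic research. As far as grammar is concerned, such study may even change our traditional, basic view of grammar and lead us to re-examine our conception of it. Through an analysis of College English Test Band 4 papers over many years, the article explores the importance of corpus-based grammar testing.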

17.

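On Separating Examination from Admission in Graduate Recruitment (total citations: 6; self-citations: 2; citations by others: 6)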


江莹《学位与研究生教育》2005,(8):36-40

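The article reviews the origin and development of separating examination from admission, analyzes its legal basis and practical significance, and examines the possible paths toward it. It points out that the trend in graduate admissions reform is separation at the national level: a unified national examination body is responsible for the examination, while admitting institutions are responsible for admission. On the basis of the unified national examination, each institution sets its own minimum score line and then, in light of the characteristics of its disciplines and programs, assesses candidates' professional ability and overall quality in order to select the best of the best.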

18.

The Reliability and Validity of Quick Tests With High School Seniors

George W. Bohrnstedt Philip Lambert Edgar F. Borgatta 《Journal of Experimental Education》2013,81(4):22-23

The Quick Word Test (QWT), Quick Number Test (QNT), and a number of criterion verbal and numerical tests were related with the English and Math grade point average (GPA) scores in this study. The QWT, in general, had lower correlations with English GPA scores than the criterion tests. The correlations between the QNT and the Math GPA were at approximately the same level as those of the criterion measures.

19.

On the Choice of Anchor Tests in Equating

Sandip Sinharay 《Educational Measurement》2018,37(2):64-69

The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested “miditests,” which are anchor tests that are content‐representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties. Sinharay and Holland (2006, 2007), Cho, Wall, Lee, and Harris (2010), Fitzpatrick and Skorupski (2016), Liu, Sinharay, Holland, Curley, and Feigenbaum (2011a), Liu, Sinharay, Holland, Feigenbaum, and Curley (2011b), and Yi (2009) found the miditests to lead to better equating than minitests, which are representative of the total test with respect to content and difficulty. However, these findings recently came into question as Trierweiler, Lewis, and Smith (2016) concluded, based on a comparison of correlation coefficients of miditests and minitests with the total test, that making an anchor test a miditest does not generally increase the anchor to total score correlation and recommended the continuation of the practice of using minitests over miditests. Their recommendation raises the question, “Should miditests continue to be considered in practice?” This note defends the miditests by citing literature that favors miditests and then by showing that miditests perform as well as the minitests in most realistic situations considered in Trierweiler et al. (2016), which implies that miditests should continue to be seriously considered by equating practitioners.

20.

The "Four Major Tests" and Governing Security

徐晨光《湖南师范大学社会科学学报》2012,41(4):51-58

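Under new circumstances in which global, national, and Party conditions are changing profoundly, our Party faces long-term, complex, and severe tests: the test of governance, the test of reform and opening up, the test of the market economy, and the test of the external environment. This requires that our Party be able to meet these challenges and tests, defuse dangers from inside and outside the system, and effectively safeguard the Party's security as the governing subject, together with the security of its driving force, institutions, and environment, thereby fundamentally ensuring the Party's governing security and continually consolidating its governing position.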