In classical test theory, a test is regarded as a sample of items from a domain defined by generating rules or by content, process, and format specifications. If the items are a random sample of the domain, then the percent-correct score on the test estimates the domain score, that is, the expected percent correct for all items in the domain. When the domain is represented by a large set of calibrated items, as in item banking applications, item response theory (IRT) provides an alternative estimator of the domain score by transformation of the IRT scale score on the test. This estimator has the advantage of not requiring the test items to be a random sample of the domain, and of having a simple standard error. We present here resampling results in real data demonstrating for uni- and multidimensional models that the IRT estimator is also a more accurate predictor of the domain score than is the classical percent-correct score. These results have implications for reporting outcomes of educational qualification testing and assessment.
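
As a rough illustration of the transformation described above, the IRT domain score can be sketched as the average probability of a correct response over every calibrated item in the bank, evaluated at the examinee's scale score. The abstract does not name a response model, so the two-parameter logistic (2PL) model, the item parameters, and the names `p_correct_2pl`, `irt_domain_score`, `item_bank`, and `theta_hat` are all assumptions made for this sketch.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def irt_domain_score(theta, item_bank):
    """Expected proportion correct over all calibrated items in the bank."""
    probs = [p_correct_2pl(theta, a, b) for a, b in item_bank]
    return sum(probs) / len(probs)


# Hypothetical calibrated bank: (discrimination a, difficulty b) pairs.
item_bank = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]

# Hypothetical IRT scale score estimated from the administered test.
theta_hat = 0.0

domain_score = irt_domain_score(theta_hat, item_bank)
print(f"Estimated domain score: {domain_score:.3f}")
```

Because the average runs over the whole bank rather than over the administered items, the estimate does not depend on the test being a random sample of the domain, which is the advantage the abstract points to.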

Reliability of Scores From Teacher-Made Tests   Cited by: 1 (self-citations: 0, citations by others: 1)
Reliability is the property of a set of test scores that indicates the amount of measurement error associated with the scores. Teachers need to know about reliability so that they can use test scores to make appropriate decisions about their students. The level of consistency of a set of scores can be estimated by using the methods of internal analysis to compute a reliability coefficient. This coefficient, which can range between 0.0 and +1.0, usually has values around 0.50 for teacher-made tests and around 0.90 for commercially prepared standardized tests. Its magnitude can be affected by such factors as test length, test-item difficulty and discrimination, time limits, and certain characteristics of the group: the extent of their testwiseness, level of student motivation, and homogeneity in the ability measured by the test.
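
The abstract does not say which internal-analysis method it has in mind. One widely used internal-consistency coefficient is Cronbach's alpha, so the sketch below assumes that choice; the function name `cronbach_alpha` and the score matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = examinees, columns = items)."""
    n_items = len(scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the total test score across examinees.
    total_var = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 item scores for six students on a four-item quiz.
quiz = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(f"alpha = {cronbach_alpha(quiz):.2f}")
```

Lengthening the test with comparable items or testing a more heterogeneous group tends to raise the total-score variance relative to the summed item variances, which is one way to see why the factors listed in the abstract change the coefficient.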

This article treats various procedures for examining the reliability of group mean difference scores, with particular emphasis on procedures from univariate and multivariate generalizability theory. Attention is given to both traditional norm-referenced perspectives on reliability as well as criterion-referenced perspectives that focus on error-tolerance ratios and functions of them. The procedures discussed are illustrated using three cohorts of data for third- and fourth-grade students in Iowa who took the Iowa Tests of Basic Skills in recent years. For these data, estimates of reliability for norm-referenced decisions tend to be relatively low. By contrast, for criterion-referenced decisions, estimates of reliability-like coefficients based on error-tolerance ratios tend to be noticeably larger.

The scores on 2 distinct tests (e.g., essay and objective) are often combined to create a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to an external criterion. However, in cases where no criterion is available, the observed composite has generally been evaluated in terms of its reliability. The analyses in this article are based on a simple, content-based model for the validity of the observed composite as an estimate of a target composite, based on a priori weights for the 2 tests. The results suggest that giving extra weight to the more reliable of the 2 observed scores tends to improve the reliability of the composite, and up to a point tends to improve its validity. Giving too much weight to the more reliable score can decrease the validity of the observed composite as a measure of the target composite.

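A reliability generalization study was conducted on articles about the Minnesota Multiphasic Personality Inventory (MMPI) published in China from 1989 to 2008. Descriptive analyses were carried out on how reliability coefficients were reported, and on the level and variability of reliability, for the 10 clinical scales and 3 validity scales of the MMPI. Sample type, sample size, and other variables were used as predictors to explore the factors that influence the reliability of the MMPI scales. On this basis, the results were compared with those of reliability generalization studies of the MMPI conducted outside China. The comparison shows both similarities and differences between the two in the level of reliability, the variability of the reliability coefficients, and the sources that predict them.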

This study was an investigation of the relation between the reliability of difference scores, considered as a parameter characterizing a population of examinees, and the reliability estimates obtained from random samples from the population. The parameters in familiar equations for the reliability of difference scores were redefined in such a way that determinants of reliability in both populations and samples become more transparent. Computer simulation was used to find sample values and to plot frequency distributions of various correlations and variance ratios relevant to the reliability of differences. The shape of frequency distributions resulting from the simulations and the means and standard deviations of these distributions reveal the extent to which reliability estimates based on sample data can be expected to meaningfully represent population reliability.

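Least squares estimation is the most basic and most important method for estimating regression parameters, but in some situations principal component estimation may be a better method. This paper analyzes and compares the two methods.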
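
The abstract gives no details of the comparison, so the following is only a minimal sketch of the two estimators on simulated data with two nearly collinear predictors, a setting where principal component estimation is often considered. It assumes NumPy is available; the data and the function names `ols_fit` and `pcr_fit` are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Least squares coefficients for centered X and y (no intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def pcr_fit(X, y, n_components):
    """Principal component estimate: regress y on leading components, then map back."""
    # Rows of vt are the principal directions of the centered predictor matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:n_components].T                    # loadings, shape (p, k)
    Z = X @ V                                  # component scores, shape (n, k)
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return V @ gamma                           # coefficients on the original predictors


rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)            # nearly collinear with x1
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)
y = x1 + x2 + rng.normal(size=n)
y = y - y.mean()

beta_ols = ols_fit(X, y)                       # unstable when predictors are collinear
beta_pcr = pcr_fit(X, y, n_components=1)       # drops the low-variance direction
beta_full = pcr_fit(X, y, n_components=2)      # keeps every component
print("OLS:", beta_ols)
print("PCR (1 component):", beta_pcr)
```

Keeping every component reproduces the least squares solution, so any difference between the two estimators comes from the components that are discarded; dropping a low-variance direction trades some bias for a reduction in variance.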

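Using the grades of 25 undergraduates majoring in biological science in 15 courses as a sample, principal component analysis was carried out with the DPS data processing software to screen the many factors that influence grades in specialized courses. The grades in advanced mathematics, organic chemistry, zoology, botany, biochemistry, and similar courses were found to be the main factors influencing specialized-course grades and to best represent the overall professional competence of students in this major. A method for producing an overall ranking of students is also given.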

To assess the effects of logical support, the Smith, Meux, Coombs, and Nuthall (12) system for the analysis of teaching strategies was used to construct four passages for each of two topics, fluoridation and the use of pesticides. The four passages varied in degree and kind of support or justification for the negative value judgments used. The passages were administered to 303 eleventh grade students. The group which received the passages having the most support for the negative value judgments reacted negatively as much as or more than the other three groups, while the group which received the passages having the least support reacted negatively as little as or less than the other three groups. There was also a topic-by-passage interaction.

Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly drawn from an undifferentiated universe of items, and therefore might not be suitable for tests developed according to a table of specifications. To address this issue, four interval estimation procedures that use category subscores for the computation of confidence intervals are presented in this article. All four estimation procedures assume that subscores instead of test scores follow a binomial distribution (i.e., compound binomial error model). The relative performance of the four compound binomial-based interval estimation procedures is compared to each other and to the better known normal approximation and Wilson score procedures based on the binomial error model.
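
The two better-known baselines named at the end of the abstract, the normal approximation and the Wilson score interval, can be sketched for a proportion-correct score under the simple binomial error model. The four compound binomial procedures are not described in the abstract and are not reproduced here; the function names and the 40-item example are hypothetical.

```python
import math


def normal_interval(x, n, z=1.96):
    """Normal-approximation interval for x correct out of n items, clipped to [0, 1]."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(x, n, z=1.96):
    """Wilson score interval for x correct out of n items."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical examinee: 36 items correct on a 40-item test.
print("normal:", normal_interval(36, 40))
print("wilson:", wilson_interval(36, 40))
```

For a perfect score the normal approximation collapses to a zero-width interval, while the Wilson interval still has a lower bound below 1, which is one reason the Wilson procedure is often preferred near the ends of the score scale.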

What are the practical implications of small decreases in reliability coefficients? How does increased item local dependence decrease reliability? How does the new format of more “authentic” reading tests affect reliability?

A structural equation modeling based method is outlined that accomplishes interval estimation of individual optimal scores resulting from multiple-component measuring instruments evaluating single underlying latent dimensions. The procedure capitalizes on the linear combination of a prespecified set of measures that is associated with maximal reliability and validity. The approach is useful when one is interested in evaluating plausible ranges for subject scores on the composite exhibiting highest measurement consistency and strongest linear relation with a given criterion. The method is illustrated with a numerical example.

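Random samples of students from the college entrance examination (gaokao) in past years were drawn to analyze the differences between urban and rural students in gaokao scores and how those differences have developed. Urban students score markedly higher than rural students in liberal arts (science) total score, in the proportion of students above the mean total score, and in the concentration of scores. Comparing the periods before and after the new curriculum reform, the urban-rural gap narrowed somewhat in Chinese language but did not change noticeably in the other subjects or in the liberal arts (science) total score. Within the period after the new curriculum reform, however, the gap stayed relatively stable in Chinese language but tended to widen in all other subjects, reaching its highest level in the most recent two years. The urban-rural gap is larger among science students than among liberal arts students.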

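Bayes Estimation for Zero-Failure Data Under the Weibull Distribution   Cited by: 1 (self-citations: 0, citations by others: 1)
For zero-failure data under the Weibull distribution, estimating the probabilities p_i = P(T ≤ t_i) is the key to reliability analysis. This paper proposes an incomplete Beta distribution as the prior distribution of p_i and uses it to obtain Bayes estimates of the p_i.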

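This paper uses Bayes and hierarchical Bayes methods to estimate the failure rate, reliability, and mean time to failure of a series system with cold-standby components, and then compares the two kinds of estimates through stochastic simulation.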

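Translation today is no longer merely a transfer between languages; it is also a form of cultural exchange. Based on the specific contexts in which English idioms occur, this paper discusses the translation of English idioms at the cultural level, examines the cultural differences that idioms reflect, and summarizes methods for translating them. It stresses achieving, as far as possible, equivalence of cultural function between the target language and the source language, so as to promote exchange and integration between Chinese and Western cultures.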

It is widely recognized that the reliability of a difference score depends on the reliabilities of the constituent scores and their intercorrelation. Authors often use a well-known identity to express the reliability of a difference as a function of the reliabilities of the components, assuming that the intercorrelation remains constant. This approach is misleading, because the familiar formula is a composite function in which the correlation between components is a function of reliability. An alternative formula, containing the correlation between true scores instead of the correlation between observed scores, provides more useful information and yields values that are not quite as anomalous as the ones usually obtained.
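
The contrast drawn in the abstract can be sketched numerically. The first function below is the familiar classical identity for the reliability of a difference X - Y. The second substitutes the standard attenuation relation, in which the observed correlation equals the true-score correlation times the square root of the product of the two reliabilities; this is an assumption about the form of the alternative formula, which the abstract does not spell out. The function names and the numbers are hypothetical.

```python
import math


def diff_reliability(rxx, ryy, rxy, sx=1.0, sy=1.0):
    """Classical reliability of the difference X - Y from the observed correlation rxy."""
    num = sx**2 * rxx + sy**2 * ryy - 2 * rxy * sx * sy
    den = sx**2 + sy**2 - 2 * rxy * sx * sy
    return num / den


def diff_reliability_true(rxx, ryy, r_true, sx=1.0, sy=1.0):
    """Same quantity, parameterized by the correlation between true scores."""
    # Attenuation relation (assumed here): observed r = true r * sqrt(rxx * ryy).
    rxy = r_true * math.sqrt(rxx * ryy)
    return diff_reliability(rxx, ryy, rxy, sx, sy)


# Hold the true-score correlation fixed at a hypothetical 0.5 and vary the reliabilities.
for rel in (0.6, 0.7, 0.8, 0.9):
    print(rel, round(diff_reliability_true(rel, rel, 0.5), 3))
```

Holding the true-score correlation fixed lets the observed correlation move with the component reliabilities, which is the dependence the abstract says the usual constant-correlation presentation ignores.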
