This study was designed to examine the level of dependence within multiple true-false (MTF) test item clusters by computing sets of item intercorrelations with data from a test composed of both MTF and multiple choice (MC) items. It was posited that internal analysis reliability estimates for MTF tests would be spurious due to elevated MTF within-cluster intercorrelations. Results showed that, on average, MTF within-cluster dependence was no greater than that found between MTF items from different clusters, between MC items, or between MC and MTF items. Item for item, however, there was greater dependence between items within the same cluster than between items of different clusters.
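
The comparison of within-cluster and between-cluster item intercorrelations described in this abstract can be sketched as follows. This is only an illustrative sketch, not the study's procedure: the function names, the 0/1 score layout, and the use of plain Pearson correlations averaged over item pairs are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Undefined (division by zero) if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_intercorrelations(scores, clusters):
    """Return (mean within-cluster r, mean between-cluster r).

    scores:   list of examinee rows, each a list of 0/1 item scores
              (hypothetical layout).
    clusters: one cluster label per item, in column order.
    """
    items = list(zip(*scores))  # transpose rows into item columns
    within, between = [], []
    for i, j in combinations(range(len(items)), 2):  # each item pair once
        r = pearson(items[i], items[j])
        (within if clusters[i] == clusters[j] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)


# Toy data: items 0-1 form one cluster, items 2-3 another.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
within_r, between_r = mean_intercorrelations(toy_scores, ["A", "A", "B", "B"])
print(within_r, between_r)
```

Averaging over pairs corresponds to the "on average" comparison. The "item for item" comparison would instead inspect the individual pair correlations before averaging.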

To assess the concurrent validity of standardized achievement tests using teachers' ratings (and rankings) of pupils' academic achievement as criteria, 42 teachers evaluated each of their students (n = 1,032) in each of five major curricular areas prior to the administration of a battery of standardized achievement tests. The teachers were directed to rate each student's proficiency, disregarding attendance, attitude, deportment, and so on. Within-class correlation coefficients were computed to eliminate rater leniency bias. The standardized achievement tests were found to have substantial concurrent validity in reading, math, language arts, science, and social studies. The normalized teacher ranks yielded significantly higher validity coefficients than did the ratings, although the magnitude of the difference was small. The concurrent validity coefficients for language arts, reading, and math were significantly higher than those in science and social studies.
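
One way to see why a within-class correlation removes rater leniency is to center each class on its own means before correlating. The sketch below assumes a pooled within-class coefficient; the abstract does not say whether the coefficients were pooled or computed class by class, and the function name and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pooled_within_class_corr(ratings, test_scores, class_ids):
    """Pooled within-class Pearson correlation.

    Deviations are taken from each class's own means, so a teacher who
    rates every pupil uniformly higher or lower does not affect the result.
    """
    by_class = defaultdict(list)
    for r, t, c in zip(ratings, test_scores, class_ids):
        by_class[c].append((r, t))

    sxy = sxx = syy = 0.0
    for pairs in by_class.values():
        n = len(pairs)
        mean_r = sum(r for r, _ in pairs) / n
        mean_t = sum(t for _, t in pairs) / n
        for r, t in pairs:
            sxy += (r - mean_r) * (t - mean_t)
            sxx += (r - mean_r) ** 2
            syy += (t - mean_t) ** 2
    return sxy / sqrt(sxx * syy)


# Hypothetical data: teacher B rates 3 points higher than teacher A
# for pupils with identical test scores.
ratings = [1, 2, 3, 4, 5, 6]
tests = [10, 20, 30, 10, 20, 30]
classes = ["A", "A", "A", "B", "B", "B"]
print(pooled_within_class_corr(ratings, tests, classes))
```

In this toy data the leniency offset lowers the correlation computed across all pupils, while the within-class coefficient is unaffected by it.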

Scores were obtained from 198 ninth grade students on achievement motivation, test anxiety, testwiseness, and risktaking. Tests in mathematics and vocabulary were constructed in free response and multiple choice form, and administered to the subjects in that order, with an interval of 5 weeks between administrations. Partial correlations were computed between scores on the multiple choice tests and achievement motivation, test anxiety, testwiseness, and risktaking, with free response scores partialled out. The partial correlations were corrected for the unreliability in the free response scores, and tested for significance. All partials involving achievement motivation and test anxiety were nonsignificant, as were all partials based on mathematics scores. The partial correlations of vocabulary scores with testwiseness and risktaking were significant without exception. It was concluded that the use of multiple choice tests can favour certain examinees: those who are highly testwise and willing to take risks in the test situation. It was noted that the extent to which these examinees were favoured was dependent on the nature of the test, and that a verbal test seemed more susceptible than a numerical test.
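
The partial-correlation step in this abstract can be illustrated with the textbook first-order formula, adjusted for unreliability in the partialled-out variable. The abstract does not state which correction was applied, so the formula below (disattenuating only the covariate) is an assumption, as are the function and parameter names.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz, rel_z=1.0):
    """Correlation of x and y with z partialled out.

    rel_z is the reliability of z (here standing in for the free response
    score). With rel_z = 1.0 this is the ordinary first-order partial
    correlation; with rel_z < 1.0 the correlations involving z are
    disattenuated for z's unreliability before partialling.
    """
    if not 0.0 < rel_z <= 1.0:
        raise ValueError("rel_z must be in (0, 1]")
    denom_sq = (rel_z - r_xz ** 2) * (rel_z - r_yz ** 2)
    if denom_sq <= 0.0:
        raise ValueError("correlations are inconsistent with rel_z")
    return (rel_z * r_xy - r_xz * r_yz) / sqrt(denom_sq)


# Hypothetical values: x = multiple choice score, y = testwiseness,
# z = free response score.
print(partial_corr(0.5, 0.5, 0.5))             # uncorrected
print(partial_corr(0.5, 0.5, 0.5, rel_z=0.8))  # corrected for unreliability in z
```

Under this reading, a significant corrected partial would indicate that the trait relates to multiple choice performance beyond what the free response score explains.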

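Verbal reporting is an important method for understanding human cognitive processes. The verbal report method, also known as the think-aloud method, externalizes a participant's thinking process as spoken language, which allows researchers to study complex human information processing directly. The author introduces the procedure for using the verbal report method, reviews applied research on the method in China and abroad, and analyzes its development trends and application prospects.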

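The Western Development campaign and its "one retreat, three returns" (一退三还) policy have worsened food security in the western region, which already suffered from grain shortages and a lagging economy, and this in turn has become a hidden threat to China's food security. This paper argues that the way to resolve the western region's food security problem is to follow a path of large-scale interregional circulation, and proposes countermeasures concerning the grain circulation system, raising the grain self-sufficiency rate, increasing stable supply, and raising the purchasing power of western farmers, so as to achieve a virtuous cycle and sustainable development of food security in the west and nationwide.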

In an essay rating study multiple ratings may be obtained by having different raters judge essays or by having the same rater(s) repeat the judging of essays. An important question in the analysis of essay ratings is whether multiple ratings, however obtained, may be assumed to represent the same true scores. When different raters judge the same essays only once, it is impossible to answer this question. In this study 16 raters judged 105 essays on two occasions; hence, it was possible to test assumptions about true scores within the framework of linear structural equation models. It emerged that the ratings of a given rater on the two occasions represented the same true scores. However, the ratings of different raters did not represent the same true scores. The estimated intercorrelations of the true scores of different raters ranged from .415 to .910. Parameters of the best fitting model were used to compute coefficients of reliability, validity, and invalidity. The implications of these coefficients are discussed.
