Similar literature
Found 20 similar documents; search took 15 ms
1.
An empirical investigation of the effect of choice weight scoring on predictive validity and reliability. Choice weight scoring refers to the procedure whereby different weights may be assigned to all the options of an item. Four groups of subjects were included in the experiment. Weights derived from each group were used to score tests for another group in order to assess the cross-validity of the weighted scoring. In no case did the increments in reliability and validity due to the weighted scoring exceed .03.
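
Choice weight scoring, as defined in this abstract, can be sketched as a lookup of one weight per chosen option. The weights, the three-item test, and the function names below are hypothetical placeholders; the study derived its weights empirically from a separate group, and that derivation is not reproduced here.

```python
from typing import Dict, List, Optional

# Hypothetical option weights for a three-item, four-option test.
# These numbers are illustrative only and do not come from the study.
OPTION_WEIGHTS: List[Dict[str, float]] = [
    {"A": 1.0, "B": 0.4, "C": 0.0, "D": -0.2},
    {"A": 0.0, "B": 1.0, "C": 0.3, "D": 0.1},
    {"A": 0.2, "B": 0.0, "C": 1.0, "D": 0.5},
]


def choice_weight_score(
    responses: List[Optional[str]],
    weights: List[Dict[str, float]],
    omit_weight: float = 0.0,
) -> float:
    """Sum the weight attached to each chosen option; None marks an omitted item."""
    if len(responses) != len(weights):
        raise ValueError("one response (or None) is required per item")
    total = 0.0
    for choice, item_weights in zip(responses, weights):
        total += omit_weight if choice is None else item_weights[choice]
    return total


def number_right_score(responses: List[Optional[str]], key: List[str]) -> int:
    """Conventional scoring for comparison: one point per keyed answer."""
    return sum(1 for choice, correct in zip(responses, key) if choice == correct)
```

Under these assumed weights, a partly wrong response such as "C" on the second item still earns partial credit, which is the only difference from number-right scoring.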

2.
3.
4.
5.
Item options of shortened forms of the GRE Verbal and Quantitative tests were empirically weighted by two variants of a method originally attributed to Guttman (1941). When compared with formula scores, it was found that tests scored with the empirical weights were more reliable but less valid when correlated with undergraduate GPA. A factor analysis revealed large increases in variance accounted for by the first factor. It was suggested that the weighting procedures used tended to capitalize on omitting behavior which, although a highly reliable tendency, may be invalid.

6.
7.
Violations of four selected principles of writing multiple choice items were introduced into an undergraduate political science examination. Three of the four poor practices had no overall effect on test difficulty. A significant (α = .05) interaction effect between the poor practices and course achievement occurred for one of the four practices, with the poorer students generally gaining most from the poorly written items. KR 20 values were significantly lower for sets of items with the same flaws than for "good" versions of the items in three of four comparisons. The reductions in reliability were equivalent to those expected to result from shortening the test by 13 to 56 percent. Concurrent validity (correlation of experimental test scores with final examination scores) was significantly lower in two of four cases. The reductions in validity were equivalent to those expected to result from shortening the test by 56 to 83 percent.
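
Expressing a drop in reliability as an equivalent amount of test shortening is usually done with the Spearman-Brown formula; the abstract does not name the formula, so treating it as the one used is an assumption. The sketch below shows the formula and its inverse. The baseline reliability of .80 is hypothetical and is not a value reported in the study.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def equivalent_length_factor(original: float, observed: float) -> float:
    """Length factor that would turn reliability `original` into `observed`."""
    return observed * (1.0 - original) / (original * (1.0 - observed))


if __name__ == "__main__":
    baseline = 0.80  # hypothetical reliability of the unflawed item set
    for percent_shorter in (13, 56):  # range quoted in the abstract
        k = 1.0 - percent_shorter / 100.0
        print(percent_shorter, round(spearman_brown(baseline, k), 3))
```

`equivalent_length_factor` runs the same relation backwards: given the reliability of the good and flawed item sets, it returns the fraction of the original length that would produce the same loss.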

8.
9.
The purpose of this study was to determine in what way Guttman weighting affected the internal consistency and intercorrelation of the subtests of the Scholastic Aptitude Test. The tests were first scored with Guttman weights and then with conventional correction-for-guessing weights. The internal consistency of the tests increased markedly when Guttman weights were used. The correlation of the two verbal subtests increased somewhat when Guttman weights were used, but the correlation of the two mathematics subtests as well as the intercorrelation of all verbal and mathematics subtests decreased. Differences in the factor structure of the Guttman- and conventionally-weighted subtests were used to explain the result.
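
For reference, conventional correction-for-guessing scoring is commonly written as rights minus wrongs divided by one less than the number of options, with omitted items contributing nothing. The abstract does not give the formula or the number of options per item, so both the formula and the five-option example below are assumptions.

```python
def formula_score(n_right: int, n_wrong: int, n_options: int) -> float:
    """Correction-for-guessing score: R - W / (k - 1). Omits are not counted."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    return n_right - n_wrong / (n_options - 1)


if __name__ == "__main__":
    # Hypothetical examinee: 30 right, 8 wrong, remaining items omitted,
    # five options per item (an assumed value).
    print(formula_score(30, 8, 5))
```

Seen as option weights, this rule gives the keyed answer a weight of 1, every wrong option a weight of -1/(k - 1), and an omit a weight of 0.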

10.
11.
This study examines the influence of processing strategies, and the associated metacomponents that determine when to apply them, on the construct validity of a verbal reasoning test. Three strategies for solving verbal analogy items were examined: a rule-oriented strategy, an association strategy, and a partial rule strategy. Construct validity was studied in two separate stages: construct representation and nomothetic span. For construct representation, evidence was obtained that all three strategies, and their related metacomponents, are associated with performance on analogy items. For nomothetic span, the current study found that all three strategies contribute to individual differences in verbal reasoning and to the predictive validity of the test. The results of this study also point to the utility of metacomponents as constructs for describing and understanding test performance. Implications of the results for test development and theories of aptitude are elaborated.

12.
In an essay rating study multiple ratings may be obtained by having different raters judge essays or by having the same rater(s) repeat the judging of essays. An important question in the analysis of essay ratings is whether multiple ratings, however obtained, may be assumed to represent the same true scores. When different raters judge the same essays only once, it is impossible to answer this question. In this study 16 raters judged 105 essays on two occasions; hence, it was possible to test assumptions about true scores within the framework of linear structural equation models. It emerged that the ratings of a given rater on the two occasions represented the same true scores. However, the ratings of different raters did not represent the same true scores. The estimated intercorrelations of the true scores of different raters ranged from .415 to .910. Parameters of the best fitting model were used to compute coefficients of reliability, validity, and invalidity. The implications of these coefficients are discussed.

13.
14.
A comparison was made of the predictive efficiency of two tests in the diagnosis of reading failure over a period of one to four years. A direct test of reading potential in the form of a word recognition test was shown generally to be more efficient than an indirect test based on neurophysiological indicants. The finding that self-concept measures were not consistently related to reading performance was interpreted in terms of the biasing effect of a particular response style.

15.
16.
Currently, there are few strengths‐based preschool rating scales that sample a wide array of behaviors believed to be essential for early academic success. The purpose of this study was to assess the factor structure of a new measure of early academic competence for at‐risk preschool populations. The Teacher Rating Scales of Early Academic Competence (TRS‐EAC) includes two broad scales (Early Academic Skills and Early Academic Enablers) and was completed by 60 teachers for 440 children enrolled in Head Start and public preschool classrooms. Evidence from two exploratory factor analyses supported a five‐factor solution for the Early Academic Skills Scale (Creative Thinking, Critical Thinking Skills, Numeracy, Early Literacy, and Comprehension) and a five‐factor solution for the Early Academic Enablers Scale (Approaches to Learning, Social and Emotional Competence, Fine Motor Skills, Gross Motor Skills, and Communication). TRS‐EAC scores also demonstrated good to excellent reliability and were related to children's performance on direct measures of early academic skills.

17.
18.
Differential weighting of response alternatives and confidence testing have been proposed as ways to assess partial knowledge on multiple-choice tests. A total of 211 students in an educational measurement course took their midterm examination under one of three procedures. Results from those students administered the test under conventional directions provided a baseline for comparing, in terms of reliability and validity, the results from students who took the test under the differential weighting of response alternatives or the confidence testing instructions. Reliability was estimated by the split-half technique. Validity was estimated by correlating midterm test scores with scores on a final examination. This investigation provides some support for the contention that validity can be improved using more sophisticated testing techniques. Suggestions for the conduct of more definitive studies were offered.
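
A split-half reliability estimate can be sketched as follows. The abstract does not say how the halves were formed or whether a length correction was applied, so the odd-even split and the Spearman-Brown step-up used here are assumptions, and the function names are illustrative. Each examinee's half scores must vary across examinees, otherwise the correlation is undefined.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores: Sequence[Sequence[float]]) -> float:
    """Odd-even split-half reliability, stepped up to full length.

    item_scores holds one row per examinee and one column per item.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd_half, even_half)
    # Spearman-Brown correction for doubling the half-test length.
    return 2.0 * r_half / (1.0 + r_half)
```

The same `pearson` helper would also serve for the validity estimate described in the abstract, by correlating midterm totals with final examination scores.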

19.
20.
To assess aspects of clinical competence not adequately assessed by other means, the Center for the Study of Medical Education, University of Illinois College of Medicine, together with the American Board of Orthopaedic Surgery, developed oral examinations in formats specifically designed to yield information on high-level cognitive functioning. The examinations were administered to 784 candidates for certification in January 1968. Reliability of the oral problem-solving component score pooled from four examiners was approximately .50. Assessment of content, construct, and concurrent validity made by questionnaire and factor analytic studies indicated that the oral tests identified factors not measured by multiple-choice tests and, therefore, significantly improved the relationship between supervisory evaluations and test scores.
