Similar Literature
Found 20 similar documents; search took 15 ms.
1.
Attempts to fake a good impression on the POI typically result in lower scores for subjects unknowledgeable in self-actualization, and in higher scores for knowledgeable subjects. POI fakability during counselor selection procedures was investigated through two test administrations with a group of 21 new graduate students in counseling. Subjects were not given self-actualization information. Scores under counselor selection fake-set procedures were significantly higher (p < .05) on 4 of the 12 scales than scores obtained under standard testing instructions. The results imply new counseling students have sufficient information about self-actualization to dissimulate POI scores.

2.
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the existing methods were designed to detect drifts in individual items, which may not be adequate for test characteristic curve–based linking or equating. One example is the item response theory–based true score equating, whose goal is to generate a conversion table to relate number‐correct scores on two forms based on their test characteristic curves. This article introduces a stepwise test characteristic curve method to detect item parameter drift iteratively based on test characteristic curves without needing to set any predetermined critical values. Comparisons are made between the proposed method and two existing methods under the three‐parameter logistic item response model through simulation and real data analysis. Results show that the proposed method produces a small difference in test characteristic curves between administrations, an accurate conversion table, and a good classification of drifted and nondrifted items, while retaining a large number of linking items.
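
As a rough illustration of the quantities this abstract refers to, the sketch below computes a test characteristic curve (TCC) under the three-parameter logistic model and the largest gap between the TCCs of two administrations. It is not the article's stepwise detection procedure. The item parameters, the function names, the ability grid, and the scaling constant D = 1.7 are all assumptions made for the example.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter logistic model.

    a = discrimination, b = difficulty, c = lower asymptote (guessing).
    D = 1.7 is a conventional scaling constant, assumed here.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at ability theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

def max_tcc_gap(items_x, items_y, grid):
    """Largest absolute TCC difference between two item sets over an ability grid."""
    return max(abs(tcc(t, items_x) - tcc(t, items_y)) for t in grid)

# Hypothetical (a, b, c) parameters for the same three items on two administrations.
form_1 = [(1.0, -0.5, 0.20), (1.2, 0.0, 0.25), (0.8, 0.7, 0.20)]
form_2 = [(1.0, -0.5, 0.20), (1.2, 0.4, 0.25), (0.8, 0.7, 0.20)]  # item 2 drifts in difficulty

# Ability grid from -4.0 to 4.0 in steps of 0.1 (an arbitrary choice).
grid = [x / 10.0 for x in range(-40, 41)]

print(max_tcc_gap(form_1, form_2, grid))
```

A TCC-based drift check would compare such a gap with and without candidate items in the linking set; how items are added or removed at each step is specific to the article and is not reproduced here.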

3.
Each year thousands of children are evaluated or reevaluated utilizing the current edition of the Wechsler Intelligence Scale to determine their eligibility for gifted programs. The Wechsler Intelligence Scale for Children-III (1991) is new enough that only limited research is available on how it compares to the previously used Wechsler Intelligence Scale for Children-Revised (1974). The purpose of this study was to determine the comparability between the previously dominant intelligence scale, the WISC-R, and the revised WISC-III with gifted children. The results of this study indicate that the latest revision (WISC-III) and the earlier version (WISC-R) produce remarkably similar scale and subtest scores when administered under clinical conditions to gifted children. All 51 children determined eligible through the administration of one of these two Wechsler tests would have been eligible for services had the other test been administered. The Verbal and Performance scale IQ scores were within two points of each other across the two test administrations, while only a one-point difference existed between the Full Scale IQ scores. The Arithmetic, Comprehension, and Object Assembly subtest scores were in high agreement across the two administrations (p < .01). The level of agreement between some subtests across the two administrations suggests that clinical judgment is just as important as scores in considering who is eligible for gifted programs.

4.
The hypothesis that some students, when tested under formula directions, omit items about which they have useful partial knowledge implies that such directions are not as fair as rights directions, especially to those students who are less inclined to guess. This hypothesis may be called the differential effects hypothesis. An alternative hypothesis states that examinees would perform no better than chance expectation on items that they would omit under formula directions but would answer under rights directions. This may be called the invariance hypothesis. Experimental data on this question were obtained by conducting special test administrations of College Board SAT-verbal and Chemistry tests and by including experimental tests in a Graduate Management Admission Test administration. The data provide a basis for evaluating the two hypotheses and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.
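
The two scoring rules contrasted in this abstract can be made concrete with a short sketch. It assumes the conventional correction-for-guessing formula, rights minus wrongs divided by (options - 1), with omitted items scoring zero; the abstract does not state the exact formula used, and the function names and numbers are illustrative only.

```python
from fractions import Fraction

def rights_score(n_right):
    """Rights scoring: the score is the number of correct answers."""
    return n_right

def formula_score(n_right, n_wrong, n_options):
    """Formula scoring (assumed conventional form): R - W / (k - 1).

    Omitted items contribute nothing, so they are not passed in.
    """
    return Fraction(n_right) - Fraction(n_wrong, n_options - 1)

def expected_gain_from_answering(p_correct, n_options):
    """Expected formula-score change from answering an item rather than omitting it."""
    return p_correct - (1 - p_correct) / Fraction(n_options - 1)

# Hypothetical five-option test.
blind_guess = expected_gain_from_answering(Fraction(1, 5), 5)        # chance-level knowledge
partial_knowledge = expected_gain_from_answering(Fraction(1, 2), 5)  # better than chance

print(blind_guess, partial_knowledge)
```

Under this rule, answering at chance level has zero expected gain, whereas answering with better-than-chance partial knowledge has a positive one. The differential effects hypothesis holds that some students forgo that gain by omitting; the invariance hypothesis holds that the items they omit are ones on which they would do no better than chance.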

5.
This paper assesses the validity of a comparison of mean test scores for two groups of students, and of a longitudinal comparison of means within each group. Using LISREL, confirmatory factor analyses were carried out (a) to test the hypotheses of similar factor patterns, equal units of measurement, and equal accuracy of measurement between the two groups, and (b) to estimate the correlation between the latent traits measured by two successive test administrations in each group. The results indicate (a) that a comparison of the group means may be invalid because, although the factor pattern was the same for both groups, the factors were not measured in the same units, and (b) that longitudinal comparisons within each group are seriously complicated by evidence of structural change.

6.
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the item's functioning is ignored. In earlier work, we developed the Bayesian updating (BU) DIF procedure for dichotomous items and showed how it could be used to formally aggregate DIF results over administrations. More recently, we extended the BU method to the case of polytomously scored items. We conducted an extensive simulation study that included four “administrations” of a test. For the single‐administration case, we compared the Bayesian approach to an existing polytomous‐DIF procedure. For the multiple‐administration case, we compared BU to two non‐Bayesian methods of aggregating the polytomous‐DIF results over administrations. We concluded that both the BU approach and a simple non‐Bayesian method show promise as methods of aggregating polytomous DIF results over administrations.

7.
8.
Summary: Scales measuring attitude toward a course were administered five times at equally spaced intervals throughout the course to college students. The students were in courses taught by one of four methods of instruction: programed, television, small class, and large class. The mean scores on the attitude scales differed significantly among the methods of instruction on each of the five administrations. The means of the methods were consistently ordered as follows: programed instruction television instruction > small class > large class. There was also a consistent decline in the mean scores over the five administrations. Novelty of the method for the students was offered as the variable differentiating the methods associated with the attitudinal differences. Other hypotheses were also discussed. The research reported here was one of several projects performed pursuant to a contract with the Office of Education, United States Department of Health, Education, and Welfare.

9.
This study examined administration method (standard written administration vs. oral administration by an examiner) as a variable in influencing children's self-report test scores. Subjects included 139 students in grades 3–6, randomly assigned to one or the other administration condition. Subjects completed the Internalizing Disorders Evaluation Scale for Children (IDESC) according to the assigned administration method. Internal consistency estimates of each group were essentially similar. Mean IDESC scores of the two groups did not differ significantly from either a statistical or practical standpoint, based on t-test and effect size calculations. Results suggest that method of administration did not affect test performance. Implications for child assessment and future research are discussed.

10.
Accommodation policymaking and practice should be guided by empirical research and informed clinical judgment. Findings from our study can provide information to test users about the validity of inferences that can be made from scores obtained from accommodated test administrations for students with disabilities. The factor structure of the newly revised Scholastic Aptitude Reasoning Test (SAT®, 2005) was examined across two groups of students (students without disabilities tested under standard time conditions, and students with disabilities tested with extended time) to determine whether the test measures the same construct for both groups. Invariance across the two groups was supported for all parameters of interest, suggesting that the scores on the Critical Reading, Math, and Writing sections of the SAT Reasoning Test can be interpreted in the same way when students have an extended‐time administration as opposed to the standard‐time administration.

11.
This study investigated how studying a refutational map, a type of argument map, affected conceptual change. Refutational maps visually display both correct and alternative conceptions. Participants (N = 120) were randomly assigned to (1) a refutational map condition, (2) a refutational text condition, and (3) a non-refutational text condition. The post-test results showed that studying the refutational map led to better performance on free recall and learning transfer measures. Specifically, participants who studied the refutational map performed significantly better than others on a free recall test, and they significantly outperformed the non-refutational text group on a short-answer transfer test. The multiple-choice test, another transfer measure, failed to detect any differences among the three groups. The research also found that individual differences in need for cognition and logical thinking ability interacted with the type of study materials. Participants scoring lower on logical thinking ability gained more from studying the refutational map.

12.
This study examines the predictive validity of three commonly used nursing school admission indices, that is, scholastic aptitude test scores, matriculation grades, and evaluations of performance in a group interview situation, in a sample of 321 Israeli nursing school students. Grade point average, supervisor evaluation of clinical internship, and scores on a government certification exam served as primary indices of criterion performance. Whereas composite aptitude test scores correlated moderately with both grade point average and certification exam scores, matriculation grades correlated negligibly with all three criterion measures. Group interview ratings correlated moderately with clinical performance, but negligibly with the remaining criteria. Aptitude test scores were not found to be biased predictors of criterion performance by ethnicity or social background. The implications of these findings for the selection of nursing school candidates in Israel are discussed.

13.
To compare the results of the WISC and WISC-R, both instruments were administered to 58 children randomly selected from a school population of 583. All administration and scoring was performed by the same psychologist, with a two-month interval separating the administrations for each child. All IQs were significantly higher (p < .01) on the WISC, with the Performance difference being greater than the Verbal difference. Also, 8 of the 10 required subtest scaled scores were significantly greater (p < .05) on the older instrument. Regression equations were obtained to predict WISC-R IQs from WISC scores.

14.
The present investigation compared the PPVT-R/WISC-R scores of a “normal” or “nonexceptional population,” as well as whether prior administration of either of these instruments affected scores on the other. Forty public school second-grade students served as subjects and were randomly assigned to one of four groups, with the order of test administrations determined by group assignment: WISC-R/PPVT-R (Form L); WISC-R/PPVT-R (Form M); PPVT-R (Form L)/WISC-R; PPVT-R (Form M)/WISC-R. The results indicate that, as with exceptional populations, normal school children tend to score lower on the PPVT-R than on the WISC-R. Scores from these two tests are moderately correlated, and prior administration of one of the instruments does not appear to alter scores on the other. Implications for practice are discussed.

15.
This study addresses the need for systematic longitudinal research documenting the stability of WISC-R scores in special education populations. WISC-R scores of 100 learning-disabled and 60 mildly retarded children retested on three separate occasions at three-year intervals were examined. The stability of WISC-R scores was evaluated according to three different criteria: (a) the consistency of group means over time, (b) the frequency of significant changes in individual scores, and (c) correlations between administrations as an index of stability of subjects' relative positions in the group. Different results were obtained depending on the criterion considered. Examination of group means and correlation coefficients indicated that Full Scale IQ was fairly stable over a period of six years for both learning-disabled and mildly retarded samples. However, greater variability was noted when examining the frequency of changes in individual subjects' scores. Verbal IQ and Performance IQ demonstrated somewhat more variability by all criteria examined. The implications of these results with regard to the importance assigned to IQ in special education classification decisions, the usefulness of retesting IQ in three-year reevaluations, and the efficacy of special education are discussed.

16.
It has been suggested that the primary purpose for criterion-referenced testing in objective-based instructional programs is to classify examinees into mastery states or categories on the objectives included in the test. We have proposed that the reliability of the criterion-referenced test scores be defined in terms of the consistency of the decision-making process across repeated administrations of the test. Specifically, reliability is defined as a measure of agreement over and above that which can be expected by chance between the decisions made about examinee mastery states in repeated test administrations for each objective measured by the criterion-referenced test.
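
The definition in this abstract, agreement beyond chance between mastery decisions on repeated administrations, can be sketched with Cohen's kappa, one widely used chance-corrected agreement coefficient. Whether the authors use exactly this coefficient is an assumption here, and the function name and the decision data are hypothetical.

```python
from collections import Counter

def cohen_kappa(decisions_1, decisions_2):
    """Chance-corrected agreement between two sets of categorical decisions.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from the marginals.
    """
    if len(decisions_1) != len(decisions_2) or not decisions_1:
        raise ValueError("decision lists must be non-empty and of equal length")
    n = len(decisions_1)
    p_o = sum(x == y for x, y in zip(decisions_1, decisions_2)) / n
    counts_1 = Counter(decisions_1)
    counts_2 = Counter(decisions_2)
    categories = set(counts_1) | set(counts_2)
    p_e = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every decision falls in one category")
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical mastery ("M") / non-mastery ("N") decisions for eight examinees
# on one objective, from two administrations of the same test.
first = ["M", "M", "N", "M", "N", "N", "M", "M"]
second = ["M", "N", "N", "M", "N", "M", "M", "M"]

print(cohen_kappa(first, second))
```

Because the abstract defines reliability separately for each objective, a coefficient like this would be computed once per objective rather than once for the whole test.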

17.
Educational Assessment, 2013, 18(2): 111-133
This article briefly reviews the current discussion of the effects of test administration conditions (i.e., testing stakes), and the motivational levels associated with them, on achievement test performance. The non-experimental study presented here investigates whether differences in test administration conditions and presumed levels of motivation engendered by different testing environments affect student performance on National Assessment of Educational Progress (NAEP) administrations. The testing conditions under study are the "low-stakes" environment of the current NAEP administration and a higher stakes environment typified by many state assessment programs. The results suggest that in comparison to a "moderate-stakes" testing environment NAEP does not seriously underestimate achievement levels. However, the results cannot lead to the conclusion that student achievement is unrelated to testing stakes. Nor can one conclude that substantially raising the stakes of NAEP would not be accompanied by an increase in achievement scores.

18.
This paper presents an assessment of the effects which a brief training program had on teaching effectiveness of graduate teaching assistants (TAs). Twenty-two inexperienced and previously untrained university TAs from economics, geography, and business administration were assigned to a training or control group by a stratified random method with stratification based on TA departmental affiliation. Teaching experts rated two videotapes of each TA's university class, one tape made before training and one following training. Ratings were obtained on two factors: (1) planning instruction to meet clear goals and organizing meaningful content in a logical fashion, and (2) involving students in instruction. Results from analyses of covariance indicate that the training group received significantly higher final ratings than the control group on the total score and on each of the two factors when final scores were adjusted for group differences in initial ratings. Teaching experience alone did not result in significantly higher ratings for control group TAs. Participants in training evaluated most topics and the overall program favorably both immediately after training and one semester later.

19.
Scores were obtained from 198 ninth grade students on achievement motivation, test anxiety, testwiseness, and risktaking. Tests in mathematics and vocabulary were constructed in free response and multiple choice form, and administered to the subjects in that order, with an interval of 5 weeks between administrations. Partial correlations were computed between scores on the multiple choice tests and achievement motivation, test anxiety, testwiseness, and risktaking, with free response scores partialled out. The partial correlations were corrected for the unreliability in the free response scores, and tested for significance. All partials involving achievement motivation and test anxiety were nonsignificant, as were all partials based on mathematics scores. The partial correlations of vocabulary scores with testwiseness and risktaking were significant without exception. It was concluded that the use of multiple choice tests can favour certain examinees: those who are highly testwise and willing to take risks in the test situation. It was noted that the extent to which these examinees were favoured was dependent on the nature of the test, and that a verbal test seemed more susceptible than a numerical test.

20.
In many life science classrooms, instructors rely upon lecture presentations to efficiently present course content. Students, in this case, act as passive learners with little opportunity to test their knowledge for gaps or misconceptions. The goal of the project described here was to determine whether a collaborative quiz protocol that guided students to discuss their understanding with their peers would improve learning and academic performance. The project took place during a single semester and was composed of two studies: a preliminary study that incorporated short-answer quizzes into the curriculum and a comprehensive study that incorporated short-answer quizzes and justify/explain quizzes in which students were expected to select an answer and then justify or explain it. Students took all quizzes twice, first independently and then collaboratively with classmate(s). Learning was assessed using multiple-choice exam questions based upon quiz topics. Students scored significantly higher on exam questions associated with justify/explain quiz topics than on those associated with short-answer quiz topics.
