Similar Articles
Found 20 similar articles (search took 15 ms)
1.
We evaluated a computer-delivered response type for measuring quantitative skill. "Generating Examples" (GE) presents under-determined problems that can have many right answers. We administered two GE tests that differed in the manipulation of specific item features hypothesized to affect difficulty. Analyses addressed internal consistency reliability, external relations, features contributing to item difficulty, adverse impact, and examinee perceptions. Results showed that GE scores were reasonably reliable but only moderately related to the GRE quantitative section, suggesting the two tests might be tapping somewhat different skills. Item features that increased difficulty included asking examinees to supply more than one correct answer and to identify whether an item was solvable. Gender differences were similar to those found on the GRE quantitative and analytical test sections. Finally, examinees were divided on whether GE items were a fairer indicator of ability than multiple-choice items, but still overwhelmingly preferred to take the more conventional questions.

2.
The Formulating-Hypotheses (F-H) item presents a situation and asks examinees to generate as many explanations for it as possible. This study examined the generalizability, validity, and examinee perceptions of a computer-delivered version of the task. Eight F-H questions were administered to 192 graduate students. Half of the items restricted examinees to 7 words per explanation, and half allowed up to 15 words. Generalizability results showed high interrater agreement, with tests of between 2 and 4 items scored by one judge achieving coefficients in the .80s. Construct validity analyses found that F-H was only marginally related to the GRE General Test, and more strongly related than the General Test to a measure of ideational fluency. Different response limits tapped somewhat different abilities, with the 15-word constraint appearing more useful for graduate assessment. These items added significantly to conventional measures in explaining school performance and creative expression.
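The reported gain from lengthening a test from 2 to 4 items scored by one judge can be illustrated with the standard Spearman-Brown projection. This is a generic psychometric formula, not a computation taken from the study itself, and the single-item coefficient used below is hypothetical:

```python
def spearman_brown(single_item_rel: float, n_items: int) -> float:
    """Project the reliability of a test lengthened to n_items,
    given the reliability (generalizability) of a single item."""
    return n_items * single_item_rel / (1 + (n_items - 1) * single_item_rel)

# With a hypothetical single-item coefficient of .50, doubling and
# quadrupling the test length yields markedly higher coefficients.
print(round(spearman_brown(0.5, 2), 3))  # 0.667
print(round(spearman_brown(0.5, 4), 3))  # 0.8
```

Under this assumption, a 4-item test reaches the low .80s, consistent with the pattern the abstract reports.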

3.
Exploratory and confirmatory factor analyses were used to explore relationships among existing item types and three new computer-administered item types for the analytical scale of the Graduate Record Examination General Test. One new item type was an open-ended version of the current multiple-choice analytical reasoning item type. The other new item types had no counterparts on the existing test. The computer tests were administered at four sites to a sample of students who had previously taken the GRE General Test. Scores from the regular GRE and the special computer administration were matched for a sample of 349 students. Factor analyses suggested that the new item types with no counterparts in the existing GRE were reliably assessing unique constructs, but the open-ended analytical reasoning items were not measuring anything beyond what is measured by the current multiple-choice version of these items.

4.
In actual test development practice, the number of test items that must be developed and pretested is typically greater, and sometimes much greater, than the number that is eventually judged suitable for use in operational test forms. This has proven to be especially true for one item type, analytical reasoning, that currently forms the bulk of the analytical ability measure of the GRE General Test. This study involved coding the content characteristics of some 1,400 GRE analytical reasoning items. These characteristics were correlated with indices of item difficulty and discrimination. Several item characteristics were predictive of the difficulty of analytical reasoning items. Generally, these same variables also predicted item discrimination, but to a lesser degree. The results suggest several content characteristics that could be considered in extending the current specifications for analytical reasoning items. The use of these item features may also contribute to greater efficiency in developing such items. Finally, the influence of these various characteristics also provides a better understanding of the construct validity of the analytical reasoning item type.

5.
In this study we examined alternative item types and section configurations for improving the discriminant and convergent validity of the GRE General Test. A computer-based test of reasoning items and a generating-explanations measure was administered to a sample of 388 examinees who previously had taken the General Test. Confirmatory factor analyses indicated that three dimensions of reasoning—verbal, analytical, and quantitative—and a fourth dimension of verbal fluency based on the generating-explanations task could be distinguished. Notably, generating explanations was as distinct from new variations of reasoning items as it was from verbal and quantitative reasoning. In the full sample, this differentiation was evident in relation to such external criteria as undergraduate grade point average (UGPA), self-reported accomplishments, and a measure of ideational fluency, with generating explanations relating uniquely to aesthetic and linguistic accomplishments and to ideational fluency. For the subset of participants with undergraduate majors in the humanities and social sciences, generating explanations added to the relationship with UGPA over that contributed by the General Test.

6.
Some applicants for admission to graduate programs present Graduate Record Examinations (GRE) General Test scores that are several years old. Due to different experiences over time, older GRE verbal, quantitative, and analytical scores may no longer accurately reflect the current capabilities of the applicants. To provide evidence regarding the long-term stability of GRE scores, test-retest correlations and average change (net gain) in test performance were analyzed for GRE General Test repeaters classified by time between test administrations in intervals ranging from less than 6 months to 10 years or more. Findings regarding average changes in verbal and quantitative test performance for long-term repeaters (with 5 years or more between tests), generally, and by graduate major area, sex, and ethnicity, appeared to be consistent with a differential growth hypothesis: Long-term repeaters generally, and in all of the subgroups, registered greater average (net) score gain on verbal tests than on quantitative tests and, for subgroups, the amount of gain tended to vary directly with initial means. A rationale is presented for a growth interpretation of the observed average gains in test performance. Implications for graduate school and GRE Program policies regarding the treatment of older test scores are considered.

7.
Test preparation activities were determined for a large representative sample of Graduate Record Examination (GRE) Aptitude Test takers. About 3% of these examinees had attended formal coaching programs for one or more sections of the test.
After adjusting for differences in the background characteristics of coached and uncoached students, effects on test scores were related to the length and the type of programs offered. The effects on GRE verbal ability scores were not significantly related to the amount of coaching examinees received, and quantitative coaching effects increased slightly but not significantly with additional coaching. Effects on analytical ability scores, on the other hand, were related significantly to the length of coaching programs, through improved performance on two analytical item types, which have since been deleted from the test.
Overall, the data suggest that, when compared with the two highly susceptible item types that have been removed from the GRE Aptitude Test, the test item types in the current version of the test (now called the GRE General Test) appear to show relatively little susceptibility to formal coaching experiences of the kinds considered here.

8.
The quantitative sections of the GRE General Test and the GMAT both assess elementary mathematics, emphasizing the solution of quantitative problems encountered in everyday life and work. Because the two tests serve different populations, however, the abilities they assess differ somewhat: the GMAT probes mathematical-logical reasoning in greater depth, whereas the GRE General Test places more weight on precision with mathematical concepts and on comprehensiveness of thinking. If a mathematics component of a general ability test is introduced into China's master's-degree entrance examination, its content should be set according to the mathematical demands of each graduate discipline and the actual ability levels of the candidates; item contexts should be practical and close to everyday life; and item difficulty should not be too high, calibrated so that about 60% of candidates clear the qualifying line.

9.
This study evaluated 16 hypotheses, subsumed under 7 more general hypotheses, concerning possible sources of bias in test items for black and white examinees on the Graduate Record Examination General Test (GRE). Items were developed in pairs that were varied according to a particular hypothesis, with each item from a pair administered in different forms of an experimental portion of the GRE. Data were analyzed using log linear methods. Ten of the 16 hypotheses showed interactions between group membership and the item version indicating a differential effect of the item manipulation on the performance of black and white examinees. The complexity of some of the interactions found, however, suggested that uncontrolled factors were also differentially affecting performance.

10.
In this study, we created a computer-delivered problem-solving task based on the cognitive research literature and investigated its validity for graduate admissions assessment. The task asked examinees to sort mathematical word problem stems according to prototypes. Data analyses focused on the meaning of sorting scores and examinee perceptions of the task. Results showed that those who sorted well tended to have higher GRE General Test scores and college grades than did examinees who sorted less proficiently. Examinees generally preferred this task to multiple-choice items like those found on the General Test's Quantitative section and felt the task was a fairer measure of their ability to succeed in graduate school. Adaptations of the task might be used in admissions tests, as well as for instructional assessments to help lower-scoring examinees localize and remediate problem-solving difficulties.

11.
The revised GRE General Test proposed by the GRE Board in 2005 places greater emphasis on the complex reasoning and advanced comprehension abilities closely tied to research work, and it uses more advanced network-based technology to assess candidates' abilities more effectively and fairly, further ensuring the GRE's distinctive role in safeguarding the quality of entering graduate students. By describing and analyzing the features of the reformed GRE, this article draws lessons worth borrowing for future reform of China's graduate entrance examinations.

12.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) of Black versus matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 GRE-disclosed forms (988 verbal items) and 11 SAT-disclosed forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both GRE and SAT. The most important finding indicates that for hard items, Black examinees perform differentially better than matched ability White examinees for each of the four item types and for both the GRE and SAT tests. The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

13.
The first generation of computer-based tests depends largely on multiple-choice items and constructed-response questions that can be scored through literal matches with a key. This study evaluated scoring accuracy and item functioning for an open-ended response type where correct answers, posed as mathematical expressions, can take many different surface forms. Items were administered to 1,864 participants in field trials of a new admissions test for quantitatively oriented graduate programs. Results showed automatic scoring to approximate the accuracy of multiple-choice scanning, with all processing errors stemming from examinees improperly entering responses. In addition, the items functioned similarly in difficulty, item-total relations, and male-female performance differences to other response types being considered for the measure.

14.
This study examined the relationship between 403 counseling graduate students' scores on the Counselor Preparation Comprehensive Examination (CPCE; Center for Credentialing and Education, n.d.) and 3 admissions requirements used as predictor variables: undergraduate grade point average (UGPA), Graduate Record Examinations (GRE) General Test Verbal Reasoning (GRE-V) score, and GRE General Test Quantitative Reasoning (GRE-Q) score. Multiple regression analyses revealed that the predictor variables together accounted for a limited yet significant share of the variance in CPCE-Total scores (R² = .21). Results indicated that UGPAs, GRE-V scores, and GRE-Q scores are valid criteria for determining counseling graduate student success on the CPCE.
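A multiple regression of this kind, three predictors plus an intercept, with R² computed from the residuals, can be sketched as follows. The applicant records below are fabricated for illustration only and do not reproduce the study's R² of .21:

```python
import numpy as np

# Hypothetical records: columns are [UGPA, GRE-V, GRE-Q]; y is CPCE total.
X = np.array([[3.2, 152, 148], [3.8, 160, 155], [2.9, 148, 150],
              [3.5, 156, 152], [3.1, 150, 146], [3.9, 162, 158]])
y = np.array([95.0, 118.0, 90.0, 108.0, 96.0, 121.0])

# Add an intercept column and fit ordinary least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# R² = 1 - SS_residual / SS_total.
pred = A @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(float(r2), 3))
```

With an intercept included, R² always falls between 0 and 1 and gives the share of criterion variance the predictors jointly explain.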

15.
The purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear.

16.
The use and validity of the Graduate Record Examination General Test (GRE) to predict the success of graduate school applicants is heavily debated, especially for its possible impact on the selection of underrepresented minorities into science, technology, engineering, and math fields. To better identify candidates who would succeed in our program with less reliance on the GRE and grade point average (GPA), we developed and tested a composite score (CS) that incorporates additional measurable predictors of success to evaluate incoming applicants. Uniform numerical values were assigned to GPA, GRE, research experience, advanced course work or degrees, presentations, and publications. We compared the CS of our students with their achievement of program goals and graduate school outcomes. The average CS was significantly higher in those students completing the graduate program versus dropouts (p < 0.002) and correlated with success in competing for fellowships and a shorter time to thesis defense. In contrast, these outcomes were not predicted by GPA, science GPA, or GRE. Recent implementation of an impromptu writing assessment during the interview suggests the CS can be improved further. We conclude that the CS provides a broader quantitative measure that better predicts success of students in our program and allows improved evaluation and selection of the most promising candidates.
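The composite-score idea, assigning uniform numerical values to several predictors and summing them, can be sketched as a simple rubric. The categories, cutoffs, and point values below are illustrative assumptions, not the weights the program actually used:

```python
# Hypothetical rubric in the spirit of the composite score (CS)
# described above; every cutoff and point value is made up.
RUBRIC = {
    "gpa":            lambda v: 3 if v >= 3.5 else 2 if v >= 3.0 else 1,
    "gre_percentile": lambda v: 3 if v >= 75 else 2 if v >= 50 else 1,
    "research_years": lambda v: 3 if v >= 2 else 2 if v >= 1 else 1,
    "publications":   lambda v: min(v, 3),   # cap at 3 points
    "presentations":  lambda v: min(v, 3),   # cap at 3 points
}

def composite_score(applicant: dict) -> int:
    """Sum the rubric points earned on each predictor."""
    return sum(score(applicant[key]) for key, score in RUBRIC.items())

applicant = {"gpa": 3.6, "gre_percentile": 60, "research_years": 2,
             "publications": 1, "presentations": 2}
print(composite_score(applicant))  # 3 + 2 + 3 + 1 + 2 = 11
```

Summing capped categorical points keeps any one predictor, such as the GRE, from dominating the evaluation, which is the stated motivation for the CS.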

17.
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score that distinguishes among low-scoring students based on patterns of mistakes, and a reading efficiency index. Instead of the usual two response types for each multiple-choice item (correct and incorrect), each item has three: one correct response type and two distinct incorrect response types. Prior results on reliability, convergent and discriminant validity, and predictive utility of mistake subscores are briefly described. The three-response-type structure of items required rethinking the item response theory (IRT) modeling. IRT-modeling results are presented, and implications for formative assessments and instructional use are discussed.

18.
Open-ended counterparts to a set of items from the quantitative section of the Graduate Record Examination (GRE-Q) were developed. Examinees responded to these items by gridding a numerical answer on a machine-readable answer sheet or by typing on a computer. The test section with the special answer sheets was administered at the end of a regular GRE administration. Test forms were spiraled so that random groups received either the grid-in questions or the same questions in a multiple-choice format. In a separate data collection effort, 364 paid volunteers who had recently taken the GRE used a computer keyboard to enter answers to the same set of questions. Despite substantial format differences noted for individual items, total scores for the multiple-choice and open-ended tests demonstrated remarkably similar correlational patterns. There were no significant interactions of test format with either gender or ethnicity.

19.
Interpreting and creating graphs play a critical role in scientific practice. The K-12 Next Generation Science Standards call for students to use graphs for scientific modeling, reasoning, and communication. To measure progress on this dimension, we need valid and reliable measures of graph understanding in science. In this research, we designed items to measure graph comprehension, critique, and construction and developed scoring rubrics based on the knowledge integration (KI) framework. We administered the items to over 460 middle school students. We found that the items formed a coherent scale and had good reliability using both item response theory and classical test theory. The KI scoring rubric showed that most students had difficulty linking graph features to science concepts, especially when asked to critique or construct graphs. In addition, students with limited access to computers, as well as those who speak a language other than English at home, showed less integrated understanding than others. These findings point to the need to increase the integration of graphing into science instruction. The results suggest directions for further research leading to comprehensive assessments of graph understanding.

20.
A previous study of the initial, preoperational version of the Graduate Record Examinations (GRE) analytical ability measure (Powers & Swinton, 1984) revealed practically and statistically significant effects of test familiarization on analytical test scores. (Two susceptible item types were subsequently removed from the test.) Data from this study were reanalyzed for evidence of differential effects for subgroups of examinees classified by age, ethnicity, degree aspiration, English language dominance, and performance on other sections of the GRE General Test. The results suggested little, if any, difference among subgroups of examinees with respect to their response to the particular kind of test preparation considered in the study. Within the limits of the data, no particular subgroup appeared to benefit significantly more or significantly less than any other subgroup.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号