1.

WHICH EXAMINEES ARE MOST FAVOURED BY THE USE OF MULTIPLE CHOICE TESTS?

GLENN L. ROWLEY 《Journal of Educational Measurement》1974,11(1):15-23

Scores were obtained from 198 ninth-grade students on achievement motivation, test anxiety, testwiseness, and risk-taking. Tests in mathematics and vocabulary were constructed in free-response and multiple-choice forms and administered to the subjects in that order, with an interval of 5 weeks between administrations. Partial correlations were computed between scores on the multiple-choice tests and achievement motivation, test anxiety, testwiseness, and risk-taking, with free-response scores partialled out. The partial correlations were corrected for the unreliability of the free-response scores and tested for significance. All partials involving achievement motivation and test anxiety were nonsignificant, as were all partials based on mathematics scores. The partial correlations of vocabulary scores with testwiseness and risk-taking were significant without exception. It was concluded that the use of multiple-choice tests can favour certain examinees: those who are highly testwise and willing to take risks in the test situation. It was noted that the extent to which these examinees were favoured depended on the nature of the test, and that a verbal test seemed more susceptible than a numerical test.
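The correction for unreliability described above can be illustrated with the ordinary first-order partial correlation and the textbook correction that controls for the *true* score of the partialled variable (only z is disattenuated). This is a minimal sketch assuming that is the kind of correction intended; the abstract does not give Rowley's exact formula, and the function names and example correlations are hypothetical. Here x is the multiple-choice score, y a trait such as testwiseness, z the free-response score, and r_zz the reliability of z.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with observed z partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def partial_corr_corrected(r_xy: float, r_xz: float, r_yz: float, r_zz: float) -> float:
    """Partial correlation controlling for the true score of z.

    Equivalent to replacing r_xz and r_yz by r_xz / sqrt(r_zz) and
    r_yz / sqrt(r_zz) in the ordinary formula (r_zz = reliability of z).
    """
    if not (r_xz**2 < r_zz and r_yz**2 < r_zz):
        raise ValueError("reliability too low for these correlations")
    return (r_zz * r_xy - r_xz * r_yz) / math.sqrt((r_zz - r_xz**2) * (r_zz - r_yz**2))


if __name__ == "__main__":
    # Hypothetical correlations: MC score (x), testwiseness (y), free-response score (z)
    print(round(partial_corr(0.30, 0.50, 0.40), 3))
    print(round(partial_corr_corrected(0.30, 0.50, 0.40, 0.80), 3))
```

With r_zz = 1 the two functions agree; in this made-up example the corrected partial is smaller than the uncorrected one, because an unreliable z under-controls for what x and y share with the true free-response ability.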

2.

A COMPARISON OF THE VALIDITIES OF CONVENTIONAL CHOICE TESTING AND VARIOUS CONFIDENCE MARKING PROCEDURES

ROGER A. KOEHLER 《Journal of Educational Measurement》1971,8(4):297-303

This study compared the convergent and discriminant validity of two confidence marking techniques with that of conventional choice testing. Achievement in vocabulary, social studies, and science (traits) was measured by a 60-item test containing true-false and 5-alternative items (methods). The test was administered to three randomly assigned groups (one for each response system), totaling 535 Ss. The results indicated very slight differences in convergent and discriminant validity that favored conventional choice testing over confidence marking techniques.

3.

Examining high-school students’ overconfidence bias in biology exam: a focus on the effects of country and gender

Arif Rachmatullah 《International Journal of Science Education》2019,41(5):652-673

Accurate, rational, and scientific decision making is now considered the most important skill in science education. Many studies have found that overconfidence bias is one of the cognitive biases that hinder people from making such decisions. Gender and country play crucial roles in overconfidence bias; for instance, some cultures and genders tend to be more overconfident than others. However, it is unclear whether the two variables interact to influence overconfidence bias, which in turn indirectly influences decision making, especially in the context of science education. The purpose of this study is to identify the effects of country and gender on performance, confidence, and overconfidence bias in samples of Indonesian and Korean high-school students taking a biology exam. Twenty-one American Association for the Advancement of Science (AAAS) questions on genetics and evolution were administered to 297 Indonesian and 235 Korean high-school students in their first and second years. Each question was accompanied by an item asking students how confident they were that they had answered it correctly. A two-way analysis of variance (two-way ANOVA) was used to answer the research questions. We found no significant interaction effect of gender and country on test scores. In contrast, we found significant interaction effects on confidence in both genetics and evolution. For overconfidence bias, for which we merged both topics, country had a greater influence than gender. Additionally, we found that the hard-easy effect accompanied the overconfidence bias. The relationships among country, gender, science education, cognitive bias, and overconfidence bias are discussed, and suggestions for reducing overconfidence bias are provided.
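Overconfidence bias is commonly scored as mean confidence minus proportion correct, so a positive value means overconfidence. The sketch below assumes that definition and confidence expressed on a 0 to 1 scale; the abstract does not state the authors' exact scoring, and the function name and data are hypothetical. The example also shows the hard-easy pattern: the same confidence ratings yield overconfidence on a hard item set and underconfidence on an easy one.

```python
from statistics import mean


def overconfidence_bias(confidence: list[float], correct: list[bool]) -> float:
    """Mean confidence minus proportion correct (positive = overconfident).

    confidence: per-item confidence on a 0-1 scale (e.g. 80% -> 0.8).
    correct:    whether each answer was right.
    """
    if len(confidence) != len(correct) or not confidence:
        raise ValueError("need equal-length, non-empty inputs")
    return mean(confidence) - mean(1.0 if c else 0.0 for c in correct)


if __name__ == "__main__":
    ratings = [0.8, 0.7, 0.9, 0.6]  # hypothetical confidence ratings
    hard = overconfidence_bias(ratings, [False, True, False, False])
    easy = overconfidence_bias(ratings, [True, True, True, True])
    print(round(hard, 2), round(easy, 2))  # positive on hard items, negative on easy
```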

4.

Calibration curves, scatterplots and the distinction between general knowledge and perceptual tasks

《Learning and Individual Differences》1998,10(1):29-50

5.

The effects of response format of a structured learning sequence on third grade children's classification achievement

Leonard Popp, Ronald Raven 《Journal of Research in Science Teaching》1972,9(2):177-184

Classification was selected for use in this investigation because of the central position of process factors in teaching and learning. A twelve-section classification program, based on 12 rules derived from Piaget's analysis of classification, was used in the study. The program was produced in both a constructed-response (CR) format and a matching multiple-choice (MC) response format. The 36-item classification test was similarly produced in both response modes. Criterion scores on both the CR test and the MC test were collected from each of the 239 grade-three subjects following treatment with the CR program, the MC program, or drawing activities (control). The results of the multivariate and univariate analyses of variance indicated that the program in both response modes enhanced classification achievement, although the effects on MC test scores were not consistent across classes, and that each program format enhanced achievement to a greater degree on the test that matched the program's response mode.

6.

Progress Monitoring in Social Studies Using Vocabulary Matching Curriculum‐Based Measurement

《Learning Disabilities Research & Practice》2017,32(2):112-120

Two hundred and two (n = 202) sixth‐grade students in social studies were administered a weekly vocabulary‐matching curriculum‐based measure (CBM) for 35 weeks. Students were also administered the Scholastic Reading Inventory (SRI), along with the annual state high‐stakes test in Communication Arts. CBM scores were analyzed with respect to alternate form reliability, validity with criterion measures, and student growth over time. Results suggest that the vocabulary‐matching CBM is reliable and valid with the SRI but not with the state test. Students showed an overall linear trend of growth, but this growth was flat in the middle of the semester. Implications for research and practice are discussed.
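Linear growth on a weekly CBM is often summarized per student as the ordinary least-squares slope of score on week (points gained per week). A minimal sketch, assuming a plain OLS slope; the abstract does not describe the study's actual growth model, and the function name and data are hypothetical.

```python
def ols_slope(weeks: list[float], scores: list[float]) -> float:
    """Least-squares slope of scores on week number (points gained per week)."""
    n = len(weeks)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_w = sum(weeks) / n
    mean_s = sum(scores) / n
    sxx = sum((w - mean_w) ** 2 for w in weeks)
    if sxx == 0:
        raise ValueError("weeks must vary")
    sxy = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical student: steady early growth, then a flat mid-semester stretch
    print(ols_slope([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))
    print(ols_slope([16, 17, 18], [20, 20, 20]))
```

Fitting the slope separately over different stretches of the 35 weeks is one simple way to see a flat middle segment inside an overall upward trend.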

7.

A Correlational Study of Listening and Vocabulary Ability in English Majors

Zhong Yi (钟毅) 《Journal of Chongqing Second Normal University》2012,25(1):147-150,162

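Vocabulary is the foundation of language learning and the key to improving listening ability. By examining the correlation between English majors' vocabulary and listening test scores, this paper finds a significant positive correlation between their vocabulary ability and listening ability. Based on an analysis of the listening scores of a group of second-year English majors, the paper affirms the importance of vocabulary in listening instruction at institutions training English majors and offers several suggestions for integrating current listening and vocabulary teaching.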

8.

Exploring alternative conceptions from Newtonian dynamics and simple DC circuits: Links between item difficulty and item confidence

Maja Planinic, William J. Boone, Rudolf Krsnik, Meredith L. Beilfuss 《Journal of Research in Science Teaching》2006,43(2):150-171

Croatian 1st‐year and 3rd‐year high‐school students (N = 170) completed a conceptual physics test covering two topics: Newtonian dynamics and simple DC circuits. Students answered the test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear measures: (a) an item‐difficulty measure based on all responses, (b) an item‐confidence measure based on correct student answers, and (c) an item‐confidence measure based on incorrect student answers. Item difficulty and item confidence were then compared. The results suggest that students hold stronger alternative conceptions in Newtonian dynamics than in DC circuits, a topic characterized by much lower student confidence on both correct and incorrect answers. A systematic and significant difference between mean student confidence on Newtonian dynamics and DC circuits items was found in both student groups. The findings suggest some steps for physics instruction in Croatia, as well as areas of further research for science educators interested in additional techniques for exploring alternative conceptions.

9.

Confidence versus Performance as an Indicator of the Presence of Alternative Conceptions and Inadequate Problem‐Solving Skills in Mechanics

Marietjie Potgieter, Esther Malatje, Estelle Gaigher, Elsie Venter 《International Journal of Science Education》2013,35(11):1407-1429

This study investigated the use of performance–confidence relationships to signal the presence of alternative conceptions and inadequate problem‐solving skills in mechanics. A group of 33 students entering physics at a South African university participated in the project. The test instrument consisted of 20 items derived from existing standardised tests from literature, each of which was followed by a self‐reported measure of confidence of students in the correctness of their answers. Data collected for this study included students’ responses to multiple‐choice questions and open‐ended explanations for their chosen answers. Fixed response physics and confidence data were logarithmically transformed according to the Rasch model to linear measures of performance and confidence. The free response explanations were carefully analysed for accuracy of conceptual understanding. Comparison of these results with raw score data and transformed measures of performance and confidence allowed a re‐evaluation of the model developed by Hasan, Bagayoko, and Kelley in 1999 for the detection of alternative conceptions in mechanics. Application of this model to raw score data leads to inaccurate conclusions. However, application of the Hasan hypothesis to transformed measures of performance and confidence resulted in the accurate identification of items plagued by alternative conceptions. This approach also holds promise for the differentiation between over‐confidence due to alternative conceptions or due to inadequate problem‐solving skills. It could become a valuable tool for instructional design in mechanics.
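The Rasch transformation mentioned above rests on the log-odds (logit) scale. The sketch below shows only that scale and the Rasch response probability; an actual analysis estimates person and item measures jointly with dedicated software, so treat this as an illustration of why raw proportions and linear measures can support different conclusions, not as the authors' procedure. Function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion; maps (0, 1) onto the whole real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def rasch_probability(theta: float, b: float) -> float:
    """Rasch model: probability that a person of measure theta succeeds on
    an item of difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


if __name__ == "__main__":
    # Equal 0.1 steps in raw proportion are unequal on the logit scale:
    print(logit(0.6) - logit(0.5))  # step near the middle
    print(logit(0.9) - logit(0.8))  # step near the top is twice as large
```

Because raw proportions compress differences near 0 and 1, comparing items (or performance with confidence) on the logit scale can reorder them relative to raw scores.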

10.

The influence of a response format test accommodation for college students with and without disabilities

Kyle Potter, Lawrence Lewandowski, Laura Spenceley 《Assessment & Evaluation in Higher Education》2016,41(7):996-1007

Standardised and other multiple-choice examinations often require the use of an answer sheet with fill-in bubbles (i.e. ‘bubble’ or Scantron sheet). Students with disabilities causing impairments in attention, learning and/or visual-motor skill may have difficulties with multiple-choice examinations that employ such a response style. Such students may request and receive testing accommodations that intend to mitigate these impairments, such as circling responses in a test booklet, which contains both the questions and corresponding multiple-choice answers. The current study evaluated this test accommodation as compared to using a bubble sheet or Scantron on a multiple-choice vocabulary test. College students with (n = 25) and without (n = 76) disabilities completed a vocabulary test under both booklet (accommodated) and bubble sheet (standard) conditions. Results demonstrated that answering in a test booklet, a much preferred response mode, allowed students to attempt significantly more items than using a bubble sheet, improving their overall test scores. Booklet responding tends to improve overall performance, even for students without disabilities, calling into question the specificity and validity of this accommodation.

11.

Offshore and onsite placement testing for English pathway programmes

Thomas Roche, Michael Harrington 《Journal of Further & Higher Education》2018,42(3):415-428

English language programmes provide established pathways for international students seeking university admission in countries such as Australia and the United Kingdom. In order to refer international applicants to appropriate levels and durations of English language support prior to matriculation into their main course of study, pathway providers need effective and efficient language assessment tools. This report evaluates the effectiveness of an online vocabulary knowledge test as an index of English proficiency for university English pathway programme applicants (N = 177). The Timed Yes/No (TYN) test measures vocabulary recognition size and speed in a time- and resource-effective format. Test results were correlated with performance on a comprehensive placement test consisting of speaking, writing, reading and listening components. The predictive validity of word recognition accuracy (a proxy for size) and response time (a measure of efficiency) for placement test outcomes was examined independently and in combination. The sensitivity of TYN test scores in predicting comprehensive placement test scores was assessed using a cut-score analysis, which yielded identification accuracy rates ranging from 76 to 86% for five critical band scores. The potential use of the online vocabulary-screening test for measuring international students’ English language proficiency is discussed in terms of reliability, validity, speed, usability and cost-effectiveness in onsite and offshore testing conditions.
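A cut-score analysis of the kind described above asks, for each candidate cut on the screening test, how often "at or above the cut" agrees with the placement outcome for one critical band. A minimal sketch, assuming a single binary outcome per band; the TYN scoring itself is not reproduced, and the function names and data are hypothetical.

```python
def identification_accuracy(screen: list[float], passed: list[bool], cut: float) -> float:
    """Share of applicants whose screening decision (score >= cut) agrees
    with their placement-test outcome for one band."""
    if len(screen) != len(passed) or not screen:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum((s >= cut) == p for s, p in zip(screen, passed))
    return hits / len(screen)


def best_cut(screen: list[float], passed: list[bool]) -> float:
    """Observed score that maximizes identification accuracy (first one wins ties)."""
    return max(sorted(set(screen)),
               key=lambda c: identification_accuracy(screen, passed, c))


if __name__ == "__main__":
    scores = [40, 55, 60, 70, 80]                 # hypothetical screening scores
    reached_band = [False, False, True, True, False]
    cut = best_cut(scores, reached_band)
    print(cut, identification_accuracy(scores, reached_band, cut))
```

Repeating this for each of several critical bands gives one accuracy rate per band, which is the shape of result the abstract reports.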

12.

Use of Adjustment by Minimum Discriminant Information in Linking Constructed‐Response Test Scores in the Absence of Common Items

Yi‐Hsuan Lee, Shelby J. Haberman, Neil J. Dorans 《Journal of Educational Measurement》2019,56(2):452-472

In many educational tests, both multiple‐choice (MC) and constructed‐response (CR) sections are used to measure different constructs. In many common cases, security concerns lead to the use of form‐specific CR items that cannot be used for equating test scores, along with MC sections that can be linked to previous test forms via common items. In such cases, adjustment by minimum discriminant information may be used to link CR section scores and composite scores based on both MC and CR sections. This approach is an innovative extension that addresses the long‐standing issue of linking CR test scores across test forms in the absence of common items in educational measurement. It is applied to a series of administrations from an international language assessment with MC sections for receptive skills and CR sections for productive skills. To assess the linking results, harmonic regression is applied to examine the effects of the proposed linking method on score stability, among several analyses for evaluation.

13.

The successful test taker: exploring test-taking behavior profiles through cluster analysis

Tova Stenlund, Per-Erik Lyrén, Hanna Eklöf 《European Journal of Psychology of Education - EJPE》2018,33(2):403-417

To be successful in a high-stakes testing situation is desirable for any test taker. It has been found that, besides content knowledge, test-taking behavior, such as risk-taking strategies, motivation, and test anxiety, is important for test performance. The purposes of the present study were to identify and group test takers with similar patterns of test-taking behavior and to explore how these groups differ in terms of background characteristics and test performance in a high-stakes achievement test context. A sample of the Swedish Scholastic Assessment Test test takers (N = 1891) completed a questionnaire measuring their motivation, test anxiety, and risk-taking behavior during the test, as well as background characteristics. A two-step cluster analysis revealed three clusters of test takers with significantly different test-taking behavior profiles: a moderate (n = 741), a calm risk taker (n = 637), and a test anxious risk averse (n = 513) profile. Group difference analyses showed that the calm risk taker profile (i.e., a high degree of risk-taking together with relatively low levels of test anxiety and motivation during the test) was the most successful profile from a test performance perspective, while the test anxious risk averse profile (i.e., a low degree of risk-taking together with high levels of test anxiety and motivation) was the least successful. Informing prospective test takers about these insights can potentially lead to more valid interpretations and inferences based on the test scores.

14.

Self-report visual scale of course appeal

D. T. Sobral 《Higher Education》1992,23(3):321-329

A 4-item affect scale portrayed on crosswise lines was developed and tested on medical students participating in preclinical courses with a view to measuring appeal as an educational outcome. This usage was based on assumptions that end-of-course adaptation could be derived from affect responses and should reflect the appeal of a course experience. Indeed, the results demonstrated that positive affect (pleasure, satisfaction) and negative affect (anxiety, grief) responses have substantial correlations with an independent measure of appeal: course valuing section scores of the Course Valuing Inventory. Moreover, students with various adaptation modes, as signalled by affect response patterns, showed significantly different means in course valuing scores. Significant differences were also shown in adaptation mode distribution among students finishing courses with distinct integration methods, or levels of learner control. As hypothesized, it was found that end-of-course adaptation modes differentiate between learners who do and do not volunteer for a student preceptorship in the same course. Findings suggest that affect responses can be used as a scale of course appeal to measure the effects of motivational strategies.

15.

A probe into EFL learners’ emotioncy as a source of test bias: Insights from differential item functioning analysis

《Studies in Educational Evaluation》2019

The development of unbiased tests is crucial in the arena of language testing in order to ensure validity. To date, studies of bias in language testing have mainly focused on factors such as gender, native language, or academic background, inter alia. However, bias may also result from psychological factors. Therefore, the present study investigates the role of English as a Foreign Language (EFL) test takers’ emotioncy, defined as the emotions evoked by senses that one holds for an entity, in their test performance. Specifically, this study aimed to examine emotioncy for the form as well as the meaning of 20 words to find out whether it can lead to differential functioning of the items on a vocabulary test. To this end, two emotioncy scales and a vocabulary test were designed. Then, based on the data collected from 235 EFL students, the participants were divided into a Low-Group and a High-Group, first based on their emotioncy scores for each word form and then based on their emotioncy scores for each word meaning. Subsequently, Rasch model-based Differential Item Functioning (DIF) analysis was performed across the two groups. The results showed that the vocabulary test items functioned differentially across the two groups in both form and meaning classifications, favoring the High-Group. Therefore, the study provides evidence for emotioncy as a psychological source of test bias and discusses implications for language testing stakeholders.
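The study used Rasch-based DIF. A different, widely used DIF statistic, the Mantel-Haenszel common odds ratio, makes the core idea easy to see in a few lines: compare groups on one item only among examinees matched on total score. This is a swapped-in technique, not the authors' method; the record format, group labels, and function names are hypothetical. Treating the High-Group as the reference group, an odds ratio above 1 would indicate an item favouring it.

```python
import math
from collections import defaultdict


def mantel_haenszel_alpha(records) -> float:
    """Common odds ratio for one item across total-score strata.

    records: iterable of (stratum, group, correct) where group is "ref" or
    "focal". alpha > 1 means the item favours the reference group.
    """
    cells = defaultdict(lambda: [0, 0, 0, 0])  # per stratum: A, B, C, D
    for stratum, group, correct in records:
        if group == "ref":
            idx = 0 if correct else 1   # A = ref correct, B = ref wrong
        elif group == "focal":
            idx = 2 if correct else 3   # C = focal correct, D = focal wrong
        else:
            raise ValueError(f"unknown group {group!r}")
        cells[stratum][idx] += 1
    num = den = 0.0
    for a, b, c, d in cells.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined (no B*C cases)")
    return num / den


def ets_delta(alpha: float) -> float:
    """ETS delta metric: MH D-DIF = -2.35 * ln(alpha); negative favours ref."""
    return -2.35 * math.log(alpha)
```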

16.

Measurement Models, Estimation, and the Study of Change

Kevin J. Grimm, Anthony P. Kuhl, Zhiyong Zhang 《Structural Equation Modeling》2013,20(3):504-517

The study of change is based on the idea that the score or index at each measurement occasion has the same meaning and metric across time. In tests or scales with multiple items, such as those common in the social sciences, there are multiple ways to create such scores. Some options include using raw or sum scores (i.e., sum of item responses or linear transformation thereof), using Rasch-scaled scores provided by the test developers, fitting item response models to the observed item responses and estimating ability or aptitude, and jointly estimating the item response and growth models. We illustrate that this choice can have an impact on the substantive conclusions drawn from the change analysis using longitudinal data from the Applied Problems subtest of the Woodcock–Johnson Psycho-Educational Battery–Revised collected as part of the National Institute of Child Health and Human Development's Study of Early Child Care. Assumptions of the different measurement models, their benefits and limitations, and recommendations are discussed.
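To see why sum scores and IRT-based scores can disagree about change, the sketch below computes a Rasch maximum-likelihood ability estimate for a raw score, with item difficulties assumed known, using Newton-Raphson. It is an illustration under those assumptions with hypothetical names and items, not the Woodcock-Johnson scoring procedure or the joint models compared in the article.

```python
import math


def rasch_theta(raw: int, difficulties: list[float],
                tol: float = 1e-10, max_iter: int = 100) -> float:
    """Maximum-likelihood Rasch ability (logits) for a raw score on items
    with known difficulties, found by Newton-Raphson."""
    n = len(difficulties)
    if not 0 < raw < n:
        raise ValueError("ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        expected = sum(probs)                         # expected raw score at theta
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw - expected) / information         # Newton step on log-likelihood
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    items = [0.0] * 10  # ten hypothetical items of equal difficulty
    for raw in (5, 6, 8, 9):
        print(raw, round(rasch_theta(raw, items), 3))
```

With these ten equal-difficulty items, moving from 8 to 9 correct is twice as large a gain in logits as moving from 5 to 6, although both are one raw point, so growth curves fitted to sum scores and to ability estimates can have different shapes.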

17.

Development of an English Vocabulary Size Test and Examination of Its Reliability and Validity

Xu Liuming (徐柳明), Liu Zhenqian (刘振前) 《Foreign Language Learning Theory and Practice》2013,2(1):79-85

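Following established design principles and models for vocabulary testing, this study developed a vocabulary size test. After two rounds of pilot testing, items were screened and revised using SPSS 18.0, yielding a final test of 104 items. Reliability and validity analyses showed that internal-consistency reliability (Cronbach's α = 0.918), test-retest reliability (r = 0.644, p < 0.001), contrasted-groups criterion validity (t = 6.358, p < 0.001), and construct validity (correlations among level scores ranging from 0.068 to 0.496, and between level scores and the total score ranging from 0.294 to 0.812) all met psychometric requirements. The test can therefore serve as a valid tool for measuring the vocabulary size of non-English-major university students under the new curriculum reform.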
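The internal-consistency coefficient reported above is Cronbach's alpha, α = k/(k−1) · (1 − Σ item variances / variance of total scores). A minimal sketch of that standard formula follows; the study used SPSS, this is not its code, and the toy score matrices are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    parallel = [[1, 1], [2, 2], [3, 3]]            # identical items -> alpha = 1
    unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]   # uncorrelated items -> alpha = 0
    print(cronbach_alpha(parallel), cronbach_alpha(unrelated))
```

Population variances are used for both the items and the total; because alpha depends only on their ratio, using sample variances throughout gives the same value.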

18.

Sensation seeking and risk-taking propensity as mediators in the relationship between childhood abuse and HIV-related risk behavior

Bornovalova MA, Gwadz MA, Kahler C, Aklin WM, Lejuez CW 《Child Abuse & Neglect》2008,32(1):99-109

OBJECTIVES: Although a wealth of literature suggests that childhood physical, emotional, and sexual abuse are related to later-life HIV-related risk behaviors, few studies have explored disinhibition (e.g., impulsivity, risk-taking propensity, and sensation-seeking) as a risk factor in this relationship. METHOD: This cross-sectional study examined impulsivity, risk-taking propensity, and sensation seeking as mediators in the relationship between abuse history and engagement in HIV-related risk behaviors among a sample of 96 inner-city African American adolescents. RESULTS: Findings indicated that abuse history was positively related to self-reported engagement in HIV-related risk behaviors (B=.027, SE .008, beta=.32, sr(2)=.105, p=.001), as well as risk-taking propensity (B=.35, SE .11, beta=.30, sr(2)=.090, p=.003) and sensation seeking (B=.17, SE .05, beta=.35, sr(2)=.124, p=.0004). Abuse history was not related to impulsivity. Further, while sensation-seeking and risk-taking propensity (to a lesser extent) mediated this relationship, impulsivity did not. CONCLUSIONS: These findings provide an initial step in the examination of the mechanisms underlying the relationship between childhood abuse and engagement in HIV-related risk behaviors.
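Mediation of the kind tested above is often summarized with the product-of-coefficients decomposition: a is the slope of the mediator M on the predictor X, b is the slope of the outcome Y on M controlling for X, and the indirect effect is a·b. A minimal single-mediator OLS sketch under that assumption; the abstract does not say which significance test the authors used (this omits Sobel or bootstrap testing), and the function name and data are hypothetical.

```python
def mediation_effects(x: list[float], m: list[float], y: list[float]):
    """Single-mediator decomposition with OLS.

    Returns (a, b, direct, indirect, total):
      a        = slope of M on X
      b        = slope of Y on M, controlling for X
      direct   = slope of Y on X, controlling for M
      indirect = a * b
      total    = slope of Y on X alone (= direct + indirect in OLS)
    """
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    smm = sum((v - mm) ** 2 for v in m)
    sxm = sum((u - mx) * (v - mm) for u, v in zip(x, m))
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    smy = sum((u - mm) * (v - my) for u, v in zip(m, y))
    det = sxx * smm - sxm ** 2
    if det == 0:
        raise ValueError("X and M are perfectly collinear")
    direct = (smm * sxy - sxm * smy) / det   # two-predictor normal equations
    b = (sxx * smy - sxm * sxy) / det
    a = sxm / sxx
    return a, b, direct, a * b, sxy / sxx


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4]                      # e.g. abuse-history index (made up)
    m = [0.5, 0.5, 2.0, 3.5, 3.5]            # e.g. sensation seeking (made up)
    y = [2 * mi + xi for mi, xi in zip(m, x)]
    print(mediation_effects(x, m, y))
```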

19.

Confidence and gender differences on the Mental Rotations Test

《Learning and Individual Differences》2007,17(2):181-186

The present study examined the relation between self-reported confidence ratings, performance on the Mental Rotations Test (MRT), and guessing behavior on the MRT. Eighty undergraduate students (40 males, 40 females) completed the MRT while rating their confidence in the accuracy of their answers for each item. As expected, gender differences in favor of men were obtained. Results also indicated a positive correlation between confidence ratings and scores on the MRT, as well as negative correlations between confidence ratings and MRT outcomes presumed to reflect propensity to guess. More elaborate analyses using a measure of accuracy of predictions (the Brier score) indicated that men have a more accurate perception of their performance on the MRT than women do. Findings are discussed in terms of their implications for the interpretation of gender differences and guessing behavior on the MRT.
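The Brier score mentioned above is the mean squared difference between a stated probability and the 0/1 outcome; 0 is perfect and lower is better. A minimal sketch, assuming confidence ratings have already been rescaled to probabilities between 0 and 1 (the study's rating scale may differ); the function name and data are hypothetical.

```python
def brier_score(confidence: list[float], correct: list[bool]) -> float:
    """Mean squared difference between confidence (0-1) and outcome (1 or 0).

    0 is perfect; lower values mean more accurate self-assessment.
    """
    if len(confidence) != len(correct) or not confidence:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - (1.0 if c else 0.0)) ** 2
               for p, c in zip(confidence, correct)) / len(confidence)


if __name__ == "__main__":
    print(brier_score([1.0, 0.0], [True, False]))  # perfectly calibrated and sure
    print(brier_score([0.5, 0.5], [True, False]))  # always 50/50
```

Unlike a plain confidence-score correlation, the Brier score penalizes every item on which confidence and correctness disagree, so two examinees with the same MRT score can differ in how accurately they judged their own answers.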

20.

Determining the Overall Impact of Interruptions During Online Testing

Sandip Sinharay, Ping Wan, Mike Whitaker, Dong‐In Kim, Litong Zhang, Seung W. Choi 《Journal of Educational Measurement》2014,51(4):419-440

With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees’ scores. There is a lack of research on this topic due to the novelty of the problem. This article is an attempt to fill that void. Several methods, primarily based on propensity score matching, linear regression, and item response theory, were suggested to determine the overall impact of the interruptions on the examinees’ scores. A realistic simulation study shows that the suggested methods have satisfactory Type I error rate and power. Then the methods were applied to data from the Indiana Statewide Testing for Educational Progress‐Plus (ISTEP+) test that experienced interruptions in 2013. The results indicate that the interruptions did not have a significant overall impact on the student scores for the ISTEP+ test.
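Of the methods named above, propensity score matching is the simplest to sketch. Below, propensity scores (each examinee's estimated probability of being interrupted, usually obtained from a logistic regression on background variables) are assumed to be already computed; only one-to-one nearest-neighbour matching with replacement and the averaged score difference are shown. This is a simplified illustration with hypothetical names and data, not the procedure applied to ISTEP+.

```python
def matched_effect(treated: list[tuple[float, float]],
                   controls: list[tuple[float, float]]) -> float:
    """Average interrupted-minus-matched-control score difference.

    treated, controls: lists of (propensity_score, test_score) pairs. Each
    interrupted examinee is matched, with replacement, to the uninterrupted
    examinee whose propensity score is closest.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control examinee")
    diffs = []
    for ps, score in treated:
        _, control_score = min(controls, key=lambda c: abs(c[0] - ps))
        diffs.append(score - control_score)
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    interrupted = [(0.2, 50), (0.8, 70)]                  # hypothetical examinees
    uninterrupted = [(0.25, 48), (0.7, 66), (0.5, 60)]
    print(matched_effect(interrupted, uninterrupted))
```

An average difference near zero, judged against its sampling variability, is the kind of evidence consistent with "no significant overall impact"; a full analysis would also check covariate balance after matching.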