Similar literature
 Found 20 similar articles; search took 15 ms
1.
Abstract

Educational stakeholders have long known that students might not be fully engaged when taking an achievement test and that such disengagement could undermine the inferences drawn from observed scores. Thanks to the growing prevalence of computer-based tests and the new forms of metadata they produce, researchers have developed and validated procedures for using item response times to identify responses to items that are likely disengaged. In this study, we examine the impact of two techniques to account for test disengagement, (a) removing unengaged test takers from the sample and (b) adjusting test scores to remove rapidly guessed items, on estimates of school contributions to student growth, achievement gaps, and summer learning loss. Our results indicate that removing disengaged examinees from the sample will likely induce bias in the estimates, although as a whole accounting for disengagement had minimal impact on the metrics we examined. Last, we provide guidance for policy makers and evaluators on how to account for disengagement in their own work and consider the promise and limitations of using achievement test metadata for related purposes.
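
The two techniques named in this abstract can be illustrated with a minimal sketch. It assumes that a response counts as a rapid guess when its response time falls below a per-item threshold; the `Response` class, the function names, and the threshold values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    correct: bool      # whether the answer was scored correct
    seconds: float     # time the test taker spent on the item
    threshold: float   # hypothetical rapid-guess threshold for this item


def is_rapid_guess(r: Response) -> bool:
    # Assumption: a response faster than the item threshold is a rapid guess.
    return r.seconds < r.threshold


def engaged_proportion(responses: List[Response]) -> float:
    # Share of responses that were not rapid guesses; technique (a) would
    # drop test takers whose share falls below some chosen cutoff.
    if not responses:
        raise ValueError("responses must not be empty")
    engaged = sum(not is_rapid_guess(r) for r in responses)
    return engaged / len(responses)


def adjusted_proportion_correct(responses: List[Response]) -> Optional[float]:
    # Technique (b): score only the responses that were not rapid guesses.
    kept = [r for r in responses if not is_rapid_guess(r)]
    if not kept:
        return None  # nothing left to score
    return sum(r.correct for r in kept) / len(kept)


responses = [
    Response(correct=True, seconds=30.0, threshold=5.0),
    Response(correct=False, seconds=2.0, threshold=5.0),
    Response(correct=True, seconds=12.0, threshold=5.0),
    Response(correct=False, seconds=1.0, threshold=5.0),
]
print(engaged_proportion(responses))
print(adjusted_proportion_correct(responses))
```

Operational tests typically produce scale scores from an item response model rather than a proportion correct, so the adjustment here only conveys the idea of excluding rapidly guessed items.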

2.
When we administer educational achievement tests, we want to be confident that the resulting scores validly indicate what the test takers know and can do. However, if the test is perceived as low stakes by the test taker, disengaged test taking sometimes occurs, which poses a serious threat to score validity. When computer-based tests are used, disengagement can be detected through occurrences of rapid-guessing behavior. This empirical study investigated the impact of a new effort monitoring feature that can detect rapid guessing, as it occurs, and notify proctors that a test taker has become disengaged. The results showed that, after a proctor notification was triggered, test-taking engagement tended to increase, test performance improved, and test scores exhibited higher convergent validation evidence. The findings of this study provide validation evidence that this innovative testing feature can decrease disengaged test taking.

3.
Abstract

Noncognitive assessments in Programme for International Student Assessment (PISA) and Trends in International Mathematics and Science Study share certain similarities and provide complementary information, yet their comparability is seldom checked and convergence not sought. We made use of student self-report data of Instrumental Motivation, Enjoyment of Science and Sense of Belonging to School targeted in both surveys in 29 overlapping countries to (1) demonstrate levels of measurement comparability, (2) check convergence of different scaling methods within survey and (3) check convergence of these constructs with student achievement across surveys. We found that the three scales in either survey (except Sense of Belonging to School in PISA) reached at least metric invariance. The scale scores from the multigroup confirmatory factor analysis and the item response theory analysis were highly correlated, pointing to robustness of scaling methods. The correlations between each construct and achievement were generally positive within each culture in each survey, and the correlational pattern was similar across surveys (except for Sense of Belonging), indicating certain convergence in the cross-survey validation. We stress the importance of checking measurement invariance before making comparative inferences, and we discuss implications on the quality and relevance of these constructs in understanding learning.

4.
ABSTRACT

The Student Background survey administered along with achievement tests in studies of the International Association for the Evaluation of Educational Achievement includes scales of student motivation, competence, and attitudes toward mathematics and science. The scales consist of positively and negatively keyed items. The current research examined the factorial structure of the 18-item motivational scales in fourth-grade mathematics in the 2011 Trends in International Mathematics and Science Study (TIMSS). Survey data from six European countries were analyzed. In comparisons of alternative models, the fit was adequate when three correlated factors were specified and negative keying was taken into account as a latent factor, or with correlated uniquenesses among negatively keyed items. Participants' reading achievement scores correlated systematically with negative keying, with coefficients ranging from .254 to .395 in the six samples. Unlike their higher-scoring peers, fourth-graders with lower reading achievement responded differentially to similar items depending on the direction of item keying, in such a way that their motivation scores were biased downward. Implications about the use of reverse keying in surveys for young students are discussed.

5.
Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless respondents. The performance of the approach was compared with established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study, in which diligent versus careless response behavior was experimentally induced. In the simulation study, gradient boosting machines outperformed traditional detection mechanisms in flagging aberrant responses. However, this advantage did not transfer to the empirical study. In terms of precision, the results of both traditional and the novel detection mechanisms were unsatisfactory, although the latter incorporated response times as additional information. The comparison between the results of the simulation and the online study showed that responses in real-world settings seem to be much more erratic than can be expected from the simulation studies. We critically discuss the generalizability of currently available detection methods and provide an outlook on future research on the detection of aberrant response patterns in survey research.

6.
Two conventional scores and a weighted score on a group test of general intelligence were compared for reliability and predictive validity. One conventional score consisted of the number of correct answers an examinee gave in responding to 69 multiple-choice questions; the other was the formula score obtained by subtracting from the number of correct answers a fraction of the number of wrong answers. A weighted score was obtained by assigning weights to all the response alternatives of all the questions and adding the weights associated with the responses, both correct and incorrect, made by the examinee. The weights were derived from degree-of-correctness judgments of the set of response alternatives to each question. Reliability was estimated using a split-half procedure; predictive validity was estimated from the correlation between test scores and mean school achievement. Both conventional scores were found to be significantly less reliable but significantly more valid than the weighted scores. (The formula scores were neither significantly less reliable nor significantly more valid than number-correct scores.)

7.
This article is about differences between, and the adequacy of, response rates to online and paper-based course and teaching evaluation surveys. Its aim is to provide practical guidance on these matters. The first part of the article gives an overview of online surveying in general, a review of data relating to survey response rates and practical advice to help boost response rates. The second part of the article discusses when a response rate may be considered large enough for the survey data to provide adequate evidence for accountability and improvement purposes. The article ends with suggestions for improving the effectiveness of evaluation strategy. These suggestions are: to seek to obtain the highest response rates possible to all surveys; to take account of probable effects of survey design and methods on the feedback obtained when interpreting that feedback; and to enhance this action by making use of data derived from multiple methods of gathering feedback.

8.
Formula scoring is a procedure designed to reduce multiple-choice test score irregularities due to guessing. Typically, a formula score is obtained by subtracting a proportion of the number of wrong responses from the number correct. Examinees are instructed to omit items when their answers would be sheer guesses among all choices but otherwise to guess when unsure of an answer. Thus, formula scoring is not intended to discourage guessing when an examinee can rule out one or more of the options within a multiple-choice item. Examinees who, contrary to the instructions, do guess blindly among all choices are not penalized by formula scoring on the average; depending on luck, they may obtain better or worse scores than if they had refrained from this guessing. In contrast, examinees with partial information who refrain from answering tend to obtain lower formula scores than if they had guessed among the remaining choices. (Examinees with misinformation may be exceptions.) Formula scoring is viewed as inappropriate for most classroom testing but may be desirable for speeded tests and for difficult tests with low passing scores. Formula scores do not approximate scores from comparable fill-in-the-blank tests, nor can formula scoring preclude unrealistically high scores for examinees who are very lucky.
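
As a rough illustration of the scoring rule described in this abstract, the sketch below computes a formula score as the number correct minus a proportion of the number wrong. The abstract does not state the proportion; the conventional correction-for-guessing value 1/(k - 1) for items with k options is assumed here, and the function name `formula_score` is hypothetical.

```python
from fractions import Fraction


def formula_score(num_right: int, num_wrong: int, num_options: int) -> Fraction:
    """Number correct minus a proportion of the number wrong.

    Assumption: the proportion is 1 / (num_options - 1). Omitted items
    are neither rewarded nor penalized, so they do not appear here.
    """
    if num_options < 2:
        raise ValueError("num_options must be at least 2")
    return num_right - Fraction(num_wrong, num_options - 1)


# A blind guesser on 40 four-option items expects 10 right and 30 wrong,
# which matches the abstract's point that blind guessing is not penalized
# on the average.
print(formula_score(num_right=10, num_wrong=30, num_options=4))
```

Under the same assumption, an examinee who can rule out one option on a four-option item and guesses among the remaining three gains 1/3 - (2/3)(1/3) = 1/9 of a point per item on average, which is consistent with the abstract's remark that refraining from such guesses tends to lower the formula score.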

9.
Concept mapping is a technique that paves the way to represent knowledge schematically. In this research, concept mapping was used as an assessment method on the impulse–momentum topic. The purpose of this study was to determine teacher candidates’ knowledge about and understanding of the concepts of impulse and momentum by comparing and contrasting two different methods; namely, students’ concept maps and an achievement test. The mean of teacher candidates’ concept map scores is extremely low when compared with the scores of the achievement test. In addition, it was seen that although a great number of concepts were written down, not many relationships were established between these concepts. There is a weak correlation between the achievement test and the concept map scores since concept maps assess the students’ knowledge from a conceptual perspective while the achievement tests measure the level of students’ knowledge on the topic and their ability to apply this knowledge on different occasions.

10.
Current literature proposes several strategies for improving response rates to student evaluation surveys. Graduate destination surveys pose the difficulty of tracing graduates years later when their contact details may have changed. This article discusses the methodology of one such survey to maximise response rates. Compiling a sample frame with reliable contact details was most important, but may require using additional sources of information other than university records. In hindsight, graduates should have been contacted prior to the survey to introduce it and stress its importance, while email and postal reminders appeared to have a limited effect on non-respondents. Due to varying response rates between participating universities, online responses were augmented with a call centre administering the survey telephonically to non-respondents. Although overall differences between online and telephonic responses appeared to be small, certain question items may need to be treated with caution when conducting telephonic surveys.

11.
This study compares the effects of two methods of teaching, teacher-centered and cooperative learning, on students’ science achievement and use of social skills. The sample consists of 163 female elementary science students in 8 intact grade 5 classes who were assigned to 2 instructional methods and were taught an identical science unit by 4 classroom teachers. The students’ science achievement was measured by a researcher-designed achievement test given to students as a pretest and a posttest. Students’ social skills were determined by a researcher-designed survey administered as a pretest and posttest. Analysis of the achievement test scores and the social skills survey responses revealed that cooperative learning strategies have significantly (p < 0.05) more positive effects on both students’ achievement and social skills than teacher-centered strategies. These results provide an evidential base to inform policy decisions and encourage and persuade teachers to implement cooperative learning methods in Kuwaiti classrooms.

12.
The initial purpose of this study was to determine how counselors used information yielded by multifactor intelligence tests. Data from questionnaires sent to secondary school counselors in two states, however, revealed enormous percentages of nonclassifiable responses regarding these tests. The proportion of nonclassifiable responses varied from 38 percent on questions concerning where different scores were recorded to 70 percent on questions concerning which IQ scores were most and least predictive of scholastic achievement. Consequently, the study concentrated on the reasons for the large number of unusable responses. The findings seemed to indicate a tendency on the part of counselor educators to downgrade the importance of accurate test interpretation.

13.
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from −1.0 to 1.0, with values close to −1.0 indicating that students respond unexpectedly or differently from the responses expected under a given cognitive model. A simulation study was conducted to evaluate the power of the HCI in detecting different types of misfitting item response vectors. Simulation results revealed that the detection rate of the HCI was a function of type of misfit, item discriminating power, and test length. The best detection rates were achieved when the HCI was applied to tests that consisted of a large number of highly discriminating items. In addition, whether a misfitting item response vector can be correctly identified depends, to a large degree, on the number of misfits of the item response vector relative to the cognitive model. When misfitting response behavior only affects a small number of item responses, the resulting item response vector will not be substantially different from the expectations under the cognitive model and consequently may not be statistically identified as misfitting. As an item response vector deviates further from the model expectations, misfits are more easily identified and consequently higher detection rates of the HCI are expected.

14.
This paper explores parental involvement using principal and parent survey reports to examine whether parents’ involvement in their children’s schools predicts academic achievement. Survey data from principals and parents of seven countries from the PISA 2012 database and hierarchical linear modelling were used to analyse between- and within-school variance in students’ math achievement. Factor analysis of both principal and parent responses revealed three dimensions of parental involvement with schools: parent-initiated involvement, teacher-initiated involvement and parent volunteerism. Principal reports of parent-initiated involvement positively predicted between-school differences in student achievement. Within schools, parent reports of teacher-initiated involvement negatively predicted student achievement. The paper shows the importance of understanding the source of information for survey measures. Information on parental involvement from the parent surveys of the PISA study is suitable for describing within-school variation in student achievement, whereas principal reports can be used to predict variation between schools.

15.
Scores were obtained from 198 ninth grade students on achievement motivation, test anxiety, testwiseness, and risktaking. Tests in mathematics and vocabulary were constructed in free response and multiple choice form, and administered to the subjects in that order, with an interval of 5 weeks between administrations. Partial correlations were computed between scores on the multiple choice tests and achievement motivation, test anxiety, testwiseness, and risktaking, with free response scores partialled out. The partial correlations were corrected for the unreliability in the free response scores, and tested for significance. All partials involving achievement motivation and test anxiety were nonsignificant, as were all partials based on mathematics scores. The partial correlations of vocabulary scores with testwiseness and risktaking were significant without exception. It was concluded that the use of multiple choice tests can favour certain examinees, namely those who are highly testwise and willing to take risks in the test situation. It was noted that the extent to which these examinees were favoured was dependent on the nature of the test, and that a verbal test seemed more susceptible than a numerical test.
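
The partialling-out step described in this abstract can be sketched with the standard first-order partial correlation formula. The function and argument names are hypothetical, the example correlations are made up for illustration, and the correction for unreliability in the free response scores that the study applied is not reproduced here.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z partialled out.

    In the setting of the abstract, x could be a multiple choice score,
    y a trait such as testwiseness, and z the free response score.
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator


# Made-up correlations, for illustration only.
print(partial_correlation(r_xy=0.6, r_xz=0.5, r_yz=0.5))
```

When z is uncorrelated with both x and y, the partial correlation reduces to the ordinary correlation between x and y.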

16.
Abstract

The effect of changing item responses on scores of elementary school children on a standardized achievement test was studied. Previous research, primarily involving non-standardized instruments and adult samples, indicates that changed responses are more likely to be correct than not. Subjects were 165 third grade students using the Metropolitan Reading Tests. Students received no special instructions regarding changing responses. Changes were identified visually and were independently verified. While frequency of response changes was low, such changes generally improved scores. Sex differences in number and success of changes were non-significant. The relationship between frequency of response change and test score was minimal. Responses to difficult items were changed more frequently with less success than changes on easy items. High scorers made more successful changes than did low scorers. Within the limits of the methodology, results clearly indicated that response changes of elementary students on multiple-choice items tend to improve test scores.

17.
Although many studies have addressed the issue of response quality in survey studies, few have looked specifically at low-quality survey responses in surveys of college students. As students receive more and more survey requests, it is inevitable that some of them will provide low-quality responses to important campus surveys and institutional accountability measures. This study proposes a strategy for uncovering low-quality survey responses and describes how they may affect intercampus accountability measures. The results show that survey response quality does have an effect on intercampus accountability measures, and that certain individual and circumstantial factors may increase the likelihood of low-quality responses. Implications for researchers and higher education administrators are discussed.

18.
ABSTRACT

The identification of rapid guessing is important to promote the validity of achievement test scores, particularly with low-stakes tests. Effective methods for identifying rapid guesses require reliable threshold methods that are also aligned with test taker behavior. Although several common threshold methods are based on rapid guessing response accuracy or visual inspection of response time distributions, this paper describes a new information-based approach to setting thresholds that does not share the limitations of other methods. A pair of information-based methods are introduced, and an empirical comparison study found the new methods to more reliably set thresholds than methods based on response accuracy or visual inspection.

19.
To date, research on personal response systems (clickers) has focused on external issues pertaining to the implementation of this technology or broadly measured student learning gains rather than investigating differences in the responses themselves. Multimedia learning makes use of both words and pictures, and research from cognitive psychology suggests that using both words and illustrations improves student learning. This study analyzed student response data from 561 students taking an introductory earth science course to determine whether including an illustration in a clicker question resulted in a higher percentage of correct responses than questions that did not include a corresponding illustration. Questions on topics pertaining to the solid earth were categorized as illustrated if they contained a picture or graph, and as text-only if they contained only text. For each type of question, we calculated the percentage of correct responses for each student and compared the results to student ACT-reading, -math, and -science scores. A within-groups, repeated measures analysis of covariance with instructor as the covariate yielded no significant difference between the percentages of correct responses to the text-only and the illustrated questions. Similar non-significant differences were obtained when students were grouped into quartiles according to their ACT-reading, -math, and -science scores. These results suggest that the way in which a conceptest question is written does not affect student responses and support the claim that conceptest questions are a valid formative assessment tool.

20.
Norm-referenced standardized achievement tests are designed, and commonly used, for obtaining group scores. Various methods are used to calculate and express group scores in terms of common derived scores, such as percentile ranks. Publishers' scaled scores are ordinarily used in these procedures, with the result that the group scores can possess anomalous characteristics. The group scores can vary widely, depending on not only the measure of central tendency but also the type of derived score employed. A reason for this situation is hypothesized to be the use of inappropriate statistical procedures to develop publishers' scaled scores. Practitioners need to be aware of this problem and to document their procedures when calculating and reporting group scores. Test publishers are urged to avoid the use of scaling procedures that are seen as responsible for this problem.
