In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice effect of examinee‐selected items. The results of a series of simulation studies showed that (1) the parameters of the new models were recovered well, (2) the parameter estimates were almost unbiased when the new models were fit to data that were simulated from standard item response models, (3) failing to consider the choice effect yielded shrunken parameter estimates for examinee‐selected items, and (4) even when the missingness mechanism in examinee‐selected items did not follow the item response functions specified in the new models, the new models still yielded a better fit than did standard item response models. An empirical example of a college entrance examination supported the use of the new models: in general, the higher the examinee's ability, the better his or her choice of items.

When an exam consists, in whole or in part, of constructed-response items, it is a common practice to allow the examinee to choose a subset of the questions to answer. This procedure is usually adopted so that the limited number of items that can be completed in the allotted time does not unfairly affect the examinee. This results in the de facto administration of several different test forms, where the exact structure of any particular form is determined by the examinee. However, when different forms are administered, a canon of good testing practice requires that those forms be equated to adjust for differences in their difficulty. When the items are chosen by the examinee, traditional equating procedures do not strictly apply due to the nonignorable nature of the missing responses. In this article, we examine the comparability of scores on such tests within an IRT framework. We illustrate the approach with data from the College Board's Advanced Placement Test in Chemistry.

For many years, question choice has been used in some UK public examinations, with students free to choose which questions they answer from a selection (within certain parameters). There has been little published research on choice of exam questions in recent years in the UK. In this article we distinguish different scenarios in which choice arises, explore the arguments for and against using optional questions, and exploit the item-level data that have recently become available from on-screen marking of examinations to exemplify methods for investigating the (statistical) comparability of optional questions. We conclude that unless there is a very good reason for allowing question choice, it should be avoided.

In order to obtain objective measurement for examinations that are graded by judges, an extension of the Rasch model designed to analyze examinations with more than two facets (items/examinees) is used. This extended Rasch model calibrates the elements of each facet of the examination (i.e., examinee performances, items, and judges) on a common log-linear scale. A network for assigning judges to examinations is used to link all facets. Real examination data from the "clinical assessment" part of a certification examination are used to illustrate the application. A range of item difficulties and judge severities were found. Comparison of examinee raw scores with objective linear measures corrected for variations in judge severity shows that judge severity can have a substantial impact on a raw score. Correcting for judge severity improves the fairness of examinee measures and of the subsequent pass-fail decisions because the uncorrected raw scores favor examinee performances graded by lenient judges.
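For orientation, a many-facet Rasch model of the kind described above is commonly written in the rating-scale form below. This is a sketch of the widely cited general form, not necessarily the exact specification used in the article; the symbols $B_n$, $D_i$, $C_j$, and $F_k$ are notation assumed here rather than taken from the abstract.

```latex
% P_{nijk}: probability that judge j awards examinee n category k on item i
% B_n: ability of examinee n      D_i: difficulty of item i
% C_j: severity of judge j        F_k: difficulty of step k relative to step k-1
\[
  \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]
```

Because all terms sit on the same logit scale, a more severe judge (larger $C_j$) lowers the expected rating for a given examinee, which is why raw scores can favor performances graded by lenient judges.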

Does reviewing previous answers during multiple-choice exams help examinees increase their final score? This article formalizes the question using a rigorous causal framework, the potential outcomes framework. Viewing examinees’ reviewing status as a treatment and their final score as an outcome, the article first explains the challenges of identifying the causal effect of answer reviewing in regular exam-taking settings. In addition to the incapability of randomizing the treatment selection (reviewing status) and the lack of other information to make this selection process ignorable, the treatment variable itself is not fully known to researchers. Looking at examinees’ answer sheet data, it is unclear whether an examinee who did not change his or her answer on a specific item reviewed it but retained the initial answer (treatment condition) or chose not to review it (control condition). Despite such challenges, however, the article develops partial identification strategies and shows that the sign of the answer reviewing effect can be reasonably inferred. By analyzing a statewide math assessment data set, the article finds that reviewing initial answers is generally beneficial for examinees.

Will subscores provide more information than what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or total score. To answer the second question, three subscore estimation methods (i.e., subscore estimated from the observed subscore, total score, or a combination of both the subscore and total score) were compared. Analyses were conducted using data from six licensure tests. Results indicated that reporting subscores at the examinee level may not be necessary as they did not provide much additional information over what is provided by the total score. However, at the institutional level (for institution size ≥ 30), reporting subscores may not be harmful, although they may be redundant because the subscores were predicted equally well by the observed subscores or total scores. Finally, results indicated that estimating the subscore using a combination of observed subscore and total score resulted in the highest reliability.

Problem-solving strategy is frequently cited as mediating the effects of response format (multiple-choice, constructed response) on item difficulty, yet there are few direct investigations of examinee solution procedures. Fifty-five high school students solved parallel constructed response and multiple-choice items that differed only in the presence of response options. Student performance was videotaped to assess solution strategies. Strategies were categorized as "traditional," meaning those associated with constructed response problem solving (e.g., writing and solving algebraic equations), or "nontraditional," meaning those associated with multiple-choice problem solving (e.g., estimating a potential solution). Surprisingly, participants sometimes adopted nontraditional strategies to solve constructed response items. Furthermore, differences in difficulty between response formats did not correspond to differences in strategy choice: some items showed a format effect on strategy but no effect on difficulty; other items showed the reverse. We interpret these results in light of the relative comprehension challenges posed by the two groups of items.

Since 1971 there have been a number of studies in which a cut score has been set using a method proposed by Angoff (1971). In this method, each member of a panel of judges estimates for each test question the proportion correct for a specific target group of examinees. Prior and contemporary research suggests that this is a difficult task for judges. Angoff also proposed that judges simply indicate whether or not an examinee from the target group will be able to answer each question correctly (the yes/no method). We report on the results of two studies that compare a yes/no estimation with a proportion correct estimation. The two studies demonstrate that both methods produce essentially equal cut scores and that judges find the yes/no method more comfortable to use than the estimated proportion correct method.
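The two rating schemes described above can be illustrated with a minimal sketch. It assumes the common convention that each judge's item estimates are summed and the judges' totals are then averaged to give the cut score; the function name `angoff_cut_score` and all ratings are hypothetical illustrations, not data or code from the studies.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return a panel cut score from per-judge item ratings.

    ratings: one list per judge, each holding one value per item.
    Values are proportion-correct estimates (0.0 to 1.0) for the
    proportion method, or 0/1 judgments for the yes/no method.
    """
    if not ratings or any(len(judge) != len(ratings[0]) for judge in ratings):
        raise ValueError("every judge must rate the same, non-empty set of items")
    # Sum each judge's ratings over items: that judge's implied raw cut score.
    judge_totals = [sum(judge) for judge in ratings]
    # Average the judges' totals to obtain the panel cut score.
    return mean(judge_totals)


# Hypothetical ratings from two judges on a four-item test.
proportion_ratings = [[0.6, 0.8, 0.4, 0.9], [0.5, 0.7, 0.5, 0.8]]
yes_no_ratings = [[1, 1, 0, 1], [1, 1, 0, 1]]

proportion_cut = angoff_cut_score(proportion_ratings)
yes_no_cut = angoff_cut_score(yes_no_ratings)
```

The same aggregation serves both methods because a yes/no judgment is simply a proportion estimate restricted to 0 or 1.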

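Educational equity is an issue that has accompanied the development of education in China, and it is a widely held and important principle in the democratization of modern education worldwide. With economic development and social progress, equity in higher education has become a focus of public concern; as China's higher education moves into the stage of mass education, equity issues have become increasingly prominent and attract growing public attention. Many scholars have studied equity in higher education, mainly addressing the concept of higher education equity in China and the practical problems of higher education equity in China.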

In this article, it is shown how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information‐retrieval‐based automatic question‐answering system finds an item challenging, and (c) dense word representations (word embeddings). Using a random forests algorithm, these data are then used to train a prediction model for item response times, and predicted response times are then used to assemble test forms. Using empirical data from the United States Medical Licensing Examination, we show that timing demands are more consistent across these specially assembled forms than across forms comprising randomly‐selected items. Because an exam's timing conditions affect examinee performance, this result has implications for exam fairness whenever examinees are compared with each other or against a common standard.

Assessing Writing, 1998, 5(1): 39-70
The Maryland School Performance Assessment Program (MSPAP) tests include an expressive writing task in which students at grades 3, 5, and 8 can choose to write about any topic they wish in the form of either a story, poem, or play. This test design feature provided the opportunity to investigate what factors contribute to students' choice of genre, how scorers apply a single expressive writing rubric to a range of genres, and whether these genres constitute equivalent tasks for measurement and reporting purposes. Our study, which combined analysis of statewide score data, 300 randomly selected student texts, questionnaires given to teacher-scorers, and interviews with students, argues strongly for the validity of this choice task as a measure of expressive writing and demonstrates that choice of genre both increases writers' engagement and enhances the fairness of the assessment by giving all students the best opportunity to demonstrate proficiency in this learning outcome. By highlighting several features of student texts that complicate scoring, the study also suggests that accuracy and consistency might be improved by
  1) providing additional sample papers during training,
  2) attending to scorers' assumptions regarding several key concepts, especially “originality,” and
  3) adjusting the ways that training for focused holistic scoring generally takes place.
The study concludes that the perceptions of students, scorers, and classroom teachers are critical to the ongoing development of writing assessments that offer students increasing control and choice.

A statistical test for the detection of answer copying on multiple-choice tests is presented. The test is based on the idea that the answers of examinees to test items may be the result of three possible processes: (1) knowing, (2) guessing, and (3) copying, but that examinees who do not have access to the answers of other examinees can arrive at their answers only through the first two processes. This assumption leads to a distribution for the number of matched incorrect alternatives between the examinee suspected of copying and the examinee believed to be the source that belongs to a family of "shifted binomials." Power functions for the tests for several sets of parameter values are analyzed. An extension of the test to include matched numbers of correct alternatives would lead to improper statistical hypotheses.

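Value pluralization is a salient phenomenon of modernization as it developed into the 20th century, and Western scholars have explored it in depth. Neoliberal scholars hold that the pluralization of values necessarily leads to value pluralism and the impossibility of consensus, manifested in the absence of mainstream values and a state of social disharmony. In essence, however, while value pluralism poses challenges to social harmony, the existence of consensus and the attainability of harmony cannot be denied; this is determined by various objective factors. Provided that efforts are made in the economic, political, and other spheres, and especially in culture, where, on the basis of upholding mainstream values guided by Marxism, an organic unity is achieved between the individual and society, the "one" and the "many," and mainstream and non-mainstream values, then reaching broad consensus at this new historical level of value pluralization, and thereby realizing a new and higher level of social harmony, is not only possible but inevitable.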

School choice is a controversial topic in the education debate. Proponents argue that choice would open up opportunities to disadvantaged families. Critics counter that choice may exacerbate inequities as advantaged parents are more likely to choose the best schools. Rio de Janeiro and Santiago provide unique institutional contexts in which to explore how choice may affect equity. We use datasets with information on home addresses to compare the choices of parents with different backgrounds. We find that disadvantaged parents in both cities are less likely to choose high-achieving schools. The differences are more pronounced in Santiago than in Rio. These results suggest that choice policies will likely not reduce inequities and that the design of the program influences behavior.

Students studied instructional materials under two choice conditions. In one case, students were free to choose the topic of study from six alternatives; in the other, the topic was assigned randomly. In addition, some of the students received immediate tests on the materials while the others took placebo tests. When free to choose the topic, students had higher affect for the material, showed greater willingness to continue work on the topic later, and spent more time studying the materials. While the presence of an immediate test increased delayed retention, freedom to choose the topic did not. A measure of students’ perceptions of the amount of freedom they felt in the choice and no-choice situations suggested that they felt relatively but not absolutely freer when able to choose the topic. Apparently, the relative increase in feeling of freedom was sufficient to influence affective but not cognitive outcomes.

This study evaluates the impact of an independent postmidterm question analysis exercise on the ability of students to answer subsequent exam questions on the same topics. It was conducted in three sections (~400 students/section) of introductory biology. Graded midterms were returned electronically, and each student was assigned a subset of questions answered incorrectly by more than 40% of the class to analyze as homework. The majority of questions were at Bloom's application/analysis level; this exercise therefore emphasized learning at these higher levels of cognition. Students in each section answered final exam questions matched by topic to all homework questions, providing a within-class control group for each question. The percentage of students who correctly answered the matched final exam question was significantly higher (p < 0.05) in the Topic Analysis versus Control Analysis group for seven of 19 questions. We identified two factors that influenced activity effectiveness: 1) similarity in topic emphasis of the midterm-final exam question pair and 2) quality of the completed analysis homework. Our data suggest that this easy-to-implement exercise will be useful in large-enrollment classes to help students develop self-regulated learning skills. Additional strategies to help introductory students gain a broader understanding of topic areas are discussed.

This study assessed the ability of history students to choose the essay topic on which they can get the highest score. A second, equally important question was whether the score on the chosen topic was more highly related to other indicators of proficiency in history than the score on the unchosen topic. Overall, for both U.S. and European history, scores were about one third of a standard deviation higher for the preferred topic than for the other topic. For U.S. history, about 32% of the students made the wrong choice; that is, 32% got a higher score on the other topic than on the preferred topic. In European history, 29% made the wrong choice. In the U.S. history sample, the preferred essay correlated .40 with an external criterion score, compared to .34 for the other essay; in the European history sample, the preferred essay correlated .52 with the external criterion, compared to .44 for the other topic.

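The grue paradox is a paradoxical situation arising in the theory of belief acceptance. Whether a scientific hypothesis, as a belief, is accepted, and by what standard, is a core question in the philosophy of science. Falsificationists, represented by Popper, offer falsifiability as a criterion for the acceptance and preferential selection of scientific hypotheses. Faced with the challenge of the grue paradox, falsificationists have responded in both qualitative and quantitative terms. Popper's strategy of using the "subclass relation" to compare the classes of potential falsifiers of competing hypotheses cannot make a rational choice between the green hypothesis and the grue hypothesis, and the strategy of comparing "dimension" likewise cannot make a choice, because it is undermined by its relativity to the system of expression. When Popper's quantitative theory of degree of corroboration is applied to the grue paradox, it also fails to provide a sufficient criterion for resolving the paradox.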

One of the controversial points in the debate on school choice is the problem of educational justice. Does the introduction of choice schemes decrease or increase social inequality within the education system? The objective of this contribution, which leads on from current debates in Anglo-Saxon moral and educational philosophy, is not to answer this question, but to analyze the apparent dissent. The role of normative principles and empirical assumptions on the situation will be investigated.

We investigated the effect of two visual aids in representational illustrations on pupils’ realistic word problem solving. In part 1 of our study, 288 elementary school pupils received an individual paper-and-pencil task with seven problematic items (P-items) in which realistic considerations need to be made to come to an appropriate reaction. These items were presented together either with representational illustrations, representational illustrations in which an element was added to make the realistic modelling complexity more apparent, or representational illustrations in which this element was cued. In part 2, the pupils received the same P-items together with a realistic and a non-realistic answer option, with the request to choose the best answer. The findings show that there was no positive effect of the visual aids on the number of realistic reactions in part 1 and that when reviewing possible answers to P-items in part 2, there again was no positive effect.
