Similar articles
 20 similar articles found; search took 15 ms
1.
The rise of computer‐based testing has brought with it the capability to measure more aspects of a test event than simply the answers selected or constructed by the test taker. One behavior that has drawn much research interest is the time test takers spend responding to individual multiple‐choice items. In particular, very short response time (termed rapid guessing) has been shown to indicate disengaged test taking, regardless of whether it occurs in high‐stakes or low‐stakes testing contexts. This article examines rapid‐guessing behavior: its theoretical conceptualization and underlying assumptions, methods for identifying it, misconceptions regarding its dynamics, and the contextual requirements for its proper interpretation. It is argued that because it does not reflect what a test taker knows and can do, a rapid guess to an item represents a choice by the test taker to momentarily opt out of being measured. As a result, rapid guessing tends to negatively distort scores and thereby diminish validity. Therefore, because rapid guesses do not contribute to measurement, it makes little sense to include them in scoring.

2.
Wilcox (16) proposed a latent structure model for answer-until-correct tests that can solve various measurement problems including correcting for guessing without assuming guessing is at random. This paper proposes a closed sequential procedure for estimating true score that can be used in conjunction with an answer-until-correct test. For criterion-referenced tests where the goal is to determine whether an examinee’s true score is above or below a known constant, the accuracy of the new procedure is exactly the same as a more conventional sequential solution. The advantage of the new procedure is that it eliminates the possibility of using an inordinately large number of items when in fact a large number of items is not needed; typical sequential procedures always allow this possibility. In addition, the new procedure appears to compare favorably to traditional tests where the number of items to be administered is fixed in advance.

3.

A new multiple choice test format is presented that allows examinees to select more than one answer to a question if they are uncertain of the correct one. Negative marking is used to penalise incorrect selections. The aim is to explicitly reward examinees who possess partial knowledge as compared with those who are simply guessing. The result is a test method that forces examinees to think more carefully about their answers, and that yields results of a higher resolution than standard multiple choice tests. After describing the new format, the paper presents and critiques several existing methods which have the same or similar aims. The paper ends with a discussion of the feedback and experience gained to date in using the new format.

4.
Item response time data were used in investigating the differences in student test-taking behavior between two device conditions: computer and tablet. Analyses were conducted to address the questions of whether or not the device condition had a differential impact on rapid guessing and solution behaviors (with response time effort used as an indicator) as well as on the time that students spent on the test (reading, mathematics, and science) or a given item type (such as drag-and-drop and fill-in-the-blank). Further analyses were conducted to examine whether the potential impact of device conditions varied by gender and ethnicity groups. Overall there were no significant differences in response time effort related to device, although some differences related to item type and test sequence were noted. Students tended to spend slightly more time when taking the tests and certain types of items on the tablet than on the computer. No interactions of device with gender or ethnicity were observed. Follow-up research on the item time thresholds is discussed.
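
The response time effort indicator mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition (the proportion of items on which the response time meets or exceeds a per-item threshold, so that the response counts as solution behavior rather than a rapid guess). The function name, the example times, and the threshold values are hypothetical and are not taken from the study.

```python
def response_time_effort(response_times, thresholds):
    """Proportion of items answered with solution behavior.

    response_times: seconds one examinee spent on each item.
    thresholds: per-item time thresholds in seconds (hypothetical values);
        a response faster than its threshold is treated as a rapid guess.
    """
    if len(response_times) != len(thresholds):
        raise ValueError("need exactly one threshold per item")
    if not response_times:
        raise ValueError("need at least one item")
    # 1 for solution behavior (time >= threshold), 0 for a rapid guess
    solution_flags = [
        1 if time >= threshold else 0
        for time, threshold in zip(response_times, thresholds)
    ]
    return sum(solution_flags) / len(solution_flags)


# Hypothetical examinee: two of the four responses fall below the threshold.
times = [12.0, 1.5, 30.0, 2.0]
limits = [3.0, 3.0, 5.0, 3.0]
print(response_time_effort(times, limits))
```

A value near 1 suggests the examinee engaged with most items; how the thresholds themselves should be set is the open question the abstract flags for follow-up research.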

5.
This study examined the utility of response time‐based analyses in understanding the behavior of unmotivated test takers. For the data from an adaptive achievement test, patterns of observed rapid‐guessing behavior and item response accuracy were compared to the behavior expected under several types of models that have been proposed to represent unmotivated test taking behavior. Test taker behavior was found to be inconsistent with these models, with the exception of the effort‐moderated model. Effort‐moderated scoring was found to both yield scores that were more accurate than those found under traditional scoring, and exhibit improved person fit statistics. In addition, an effort‐guided adaptive test was proposed and shown by a simulation study to alleviate item difficulty mistargeting caused by unmotivated test taking.

6.
This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback‐Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson‐Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.

7.
Leakage currents, tiny currents flowing from an everyday-life appliance through the body to the ground, can cause a non-adequate perception (called electrocutaneous sensation, ECS) or even pain and should be avoided. Safety standards for the low-frequency range are based on experimental results of current thresholds of electrocutaneous sensations, which however show a wide range between about 50 pA (rms) and 1000 μA (rms). In order to be able to explain these differences, the perception threshold was measured repeatedly in experiments with test persons under an identical experimental setup, but by means of different methods (measuring strategies), namely: direct adjustment, classical threshold as the amperage of 50% perception probability, and the confidence rating procedure of signal detection theory. The current is injected using a 1 cm² electrode at the highly touch-sensitive part of the index fingertip. These investigations show for the first time that the threshold of electrocutaneous sensations is influenced both by adaptation to the non-adequate stimulus and by individual, emotional factors. Therefore, classical methods, on which the majority of the safety investigations are based, cannot be used to determine a leakage current threshold. The confidence rating procedure of modern signal detection theory yields a value of 179.5 μA (rms) at a power supply frequency of 50 Hz as the lower end of the 95% confidence range considering the variance in the investigated group. This value is expected to be free of adaptation influences, is distinctly lower than the European limits, and supports the stricter regulations of Canada and the USA.

8.
Previous research has shown that rapid-guessing behavior can degrade the validity of test scores from low-stakes proficiency tests. This study examined, using hierarchical generalized linear modeling, examinee and item characteristics for predicting rapid-guessing behavior. Several item characteristics were found to be significant: items with more text or those occurring later in the test were related to increased rapid guessing, while the inclusion of a graphic in an item was related to decreased rapid guessing. The sole significant examinee predictor was SAT total score. Implications of these results for measurement professionals developing low-stakes tests are discussed.

9.
The indices of item difficulty and discrimination, the coefficients of effective length, and the average item information for both single and multiple-answer items using six different scoring formulas were computed and compared. These formulas vary in terms of the assignment of partial credit and the correction for guessing. Results show that items with multiple answers are substantially more discriminating and reliable when partial credit is given. The formulas without correction for guessing seem to perform at least as well as the formulas with correction.

10.
In a longitudinal study, auditory and visual temporal order thresholds (TOTs) were investigated in primary school children (N = 236; mean age at first data point = 6;7) at the beginning of Grade 1 and the end of Grade 2 to test whether rapid temporal processing abilities predict reading and spelling at the end of Grades 1 and 2. Auditory and visual TOTs differed but showed comparable developmental trajectories over 20 months. Visual TOTs were not predictive of literacy measures; auditory TOTs in Grade 1 were the best predictor. Interestingly, they were related to spelling in Grade 2 while auditory TOTs in Grade 2 were not, suggesting that rapid auditory processing abilities have a causal influence on literacy development.

11.
Equivalent forms of a ten-item completion test were constructed. The same test items then were rewritten in matching format and in multiple-choice format, resulting in two forms (A and B) of each of three types of test. All tests were administered to 73 examinees, and parallel-forms reliability coefficients (correlation between scores on A and B) were calculated. These empirically obtained values were compared to the values of the reliability coefficient predicted from theoretically derived equations which indicate the influence of chance success due to guessing on test reliability. In accordance with theory it was found that the completion test was more reliable than the matching test and that the matching test was more reliable than the multiple-choice test. The empirically obtained reliability coefficients were very close to those predicted from the mathematically derived formulas.

12.
Examiners seeking guidance on multiple‐choice and true/false tests are likely to encounter various faulty or questionable ideas. Twelve of these are discussed in detail, having to do mainly with the effects on test reliability of test length, guessing and scoring method (i.e. number‐right scoring or negative marking). Some misunderstandings could be based on evidence from tests that were badly written or administered, while others may have arisen through the misinterpretation of reliability coefficients. The usefulness of item response theory in the analysis of academic test items is briefly dismissed.

13.
The authors lament the fact that there does not seem to be much agreement as to the proper method of scoring tests. The use of the scoring formula is advocated by some and criticized by others. Literature is reviewed showing that the basic assumptions behind the scoring formula (namely that all wrong answers are due to chance guessing) are false. Arguments are presented for and against the continued use of the formula, with the conclusion that its use cannot be justified. A new aspect of this question, that use of the formula may create behavior patterns detrimental to ingenuity and creativity, is also presented.

14.
The DINA (deterministic input, noisy "and" gate) model has been widely used in cognitive diagnosis tests and in the process of test development. The outcomes known as slip and guess are included in the DINA model function representing the responses to the items. This study aimed to extend the DINA model by using the random‐effect approach to allow examinees to have different probabilities of slipping and guessing. Two extensions of the DINA model were developed and tested to represent the random components of slipping and guessing. The first model assumed that a random variable can be incorporated in the slipping parameters to allow examinees to have different levels of caution. The second model assumed that the examinees’ ability may increase the probability of a correct response if they have not mastered all of the required attributes of an item. The results of a series of simulations based on Markov chain Monte Carlo methods showed that the model parameters and attribute‐mastery profiles can be recovered relatively accurately from the generating models and that neglect of the random effects produces biases in parameter estimation. Finally, a fraction subtraction test was used as an empirical example to demonstrate the application of the new models.
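
For orientation, the standard (fixed-effect) DINA item response function that this study extends can be sketched as follows. An examinee who has mastered every attribute an item requires answers correctly with probability 1 - slip; any other examinee answers correctly with probability guess. The function names and the example attribute patterns are illustrative assumptions, and the random-effect extensions described in the abstract are not reproduced here.

```python
def has_all_required(alpha, q_row):
    """Return True if the examinee masters every attribute the item requires.

    alpha: 0/1 list, the examinee's attribute-mastery profile.
    q_row: 0/1 list, the item's row of the Q-matrix (1 = attribute required).
    """
    if len(alpha) != len(q_row):
        raise ValueError("alpha and q_row must have the same length")
    return all(a == 1 for a, q in zip(alpha, q_row) if q == 1)


def dina_probability(alpha, q_row, slip, guess):
    """Probability of a correct response under the standard DINA model."""
    if has_all_required(alpha, q_row):
        return 1.0 - slip  # master of all required attributes may still slip
    return guess           # non-master can only succeed by guessing


# Hypothetical item requiring attributes 1 and 2 out of three.
q_row = [1, 1, 0]
print(dina_probability([1, 1, 0], q_row, slip=0.1, guess=0.2))  # master
print(dina_probability([1, 0, 1], q_row, slip=0.1, guess=0.2))  # non-master
```

In this baseline form, slip and guess are fixed per item; the abstract's extensions instead let them vary across examinees.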

15.

This article describes how visual methods, particularly photography, can be used in the context of careers education and guidance. It begins by acknowledging that this context is undergoing rapid change given the policy agendas of lifelong learning and social inclusion. However, although these policy agendas continue to emphasize the importance of self-knowledge in managing career development, this represents an area of continuing difficulty in terms of curriculum design and delivery. In recognizing this dilemma the article suggests that visual methods can provide careers educators, guidance practitioners, and their clients with the means to engage ‘self’ in the processes of career learning and planning.

16.
We investigate two non-iterative estimation procedures for Rasch models, the pair-wise estimation procedure (PAIR) and the Eigenvector method (EVM), and identify theoretical issues with EVM for rating scale model (RSM) threshold estimation. We develop a new procedure to resolve these issues, the conditional pairwise adjacent thresholds procedure (CPAT), and test the methods using a large number of simulated datasets to compare the estimates against known generating parameters. We find support for our hypotheses, in particular that EVM threshold estimates suffer from theoretical issues which lead to biased estimates and that CPAT represents a means of resolving these issues. These findings are both statistically significant (p < .001) and of a large effect size. We conclude that CPAT deserves serious consideration as a conditional, computationally efficient approach to Rasch parameter estimation for the RSM. CPAT has particular potential for use in contexts where computational load may be an issue, such as systems with multiple online algorithms and large test banks with sparse data designs.

17.
18.
In the biological sciences, very little is known about the mechanisms by which doctoral students acquire the skills they need to become independent scientists. In the postsecondary biology education literature, identification of specific skills and effective methods for helping students to acquire them are limited to undergraduate education. To establish a foundation from which to investigate the developmental trajectory of biologists’ research skills, it is necessary to identify those skills which are integral to doctoral study and distinct from skills acquired earlier in students’ educational pathways. In this context, the current study engages the framework of threshold concepts to identify candidate skills that are both obstacles and significant opportunities for developing proficiency in conducting research. Such threshold concepts are typically characterised as transformative, integrative, irreversible, and challenging. The results from interviews and focus groups with current and former doctoral students in cellular and molecular biology suggest two such threshold concepts relevant to their subfield: the first is an ability to effectively engage primary research literature from the biological sciences in a way that is critical without dismissing the value of its contributions. The second is the ability to conceptualise appropriate control conditions necessary to design and interpret the results of experiments in an efficient and effective manner for research in the biological sciences as a discipline. Implications for prioritising and sequencing graduate training experiences are discussed on the basis of the identified thresholds.

19.
It is not a new idea to use audio‐visual media towards therapeutic ends. This paper, acknowledging Aristotle's theory on the purgative virtues of art, also traces contemporary theorists vis‐à‐vis research conducted between 1926 and 1971. Having done so, the author relates this research, through the use of a case study, to the use of alternative audio‐visual materials as applied in the therapeutic sense. It is concluded that the use of audio‐visual materials and methods can offer great potential to the therapist willing to include them in treating patients.

20.
Scoring multiple-choice questions according to the simple scoring system S1 = R, where R is the number of correct answers, produces an upward bias in the scores of poorer students as a result of guessing. The scoring formula conventionally used to adjust for guessing is S2 = R - W/(n-1), where W is the number of wrong answers and n is the number of choices per question. However, S2 is based on the unrealistic assumption that on each question the student either knows the correct answer or guesses randomly. On the basis of a more realistic assumption an alternative scoring formula is derived, S4 = [nR + (n-1)Q - Q^2/R] / [2(n-1)], where Q is the number of questions. Compared to S4, the conventional formula (S2) has a downward bias for Q/n < R < Q, and the simple formula (S1) has a downward bias for Q/(n-2) < R < Q in addition to its upward bias for R < Q/(n-2).
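
The three scores named in this abstract (S1, S2 and S4) can be compared numerically with a short sketch. It assumes that every question is answered, so that W = Q - R; the abstract does not state this, and the function names and the example test size (Q = 20, n = 4) are hypothetical.

```python
def s1_number_right(r):
    """Simple score S1 = R: the number of correct answers."""
    return float(r)


def s2_formula_score(r, q, n):
    """Conventional correction for guessing, S2 = R - W/(n-1).

    Assumes no omitted questions, so W = Q - R (an assumption of this sketch).
    """
    w = q - r
    return r - w / (n - 1)


def s4_alternative(r, q, n):
    """Alternative score S4 = [nR + (n-1)Q - Q^2/R] / [2(n-1)]; needs R > 0."""
    if r <= 0:
        raise ValueError("S4 is undefined for R = 0")
    return (n * r + (n - 1) * q - q ** 2 / r) / (2 * (n - 1))


# Hypothetical test: Q = 20 questions with n = 4 choices each.
Q, N = 20, 4
for r in (5, 8, 10, 15, 20):
    print(r, s1_number_right(r), s2_formula_score(r, Q, N), s4_alternative(r, Q, N))
```

With these values, S2 and S4 agree at R = Q/n = 5 and at R = Q = 20, and S2 falls below S4 in between. S1 exceeds S4 below R = Q/(n-2) = 10 and falls below it above that point, which matches the bias regions stated in the abstract.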
