Similar literature
1.
The matched pair technique for writing and scoring true-false items was designed to compensate for the acquiescence response set of primary grade children. The claim that this technique increases reliability to an appreciable extent over traditional true-false scoring was investigated by comparing alpha internal consistency coefficients computed for the matched pair true-false, traditional true-false, and three other scoring schemes. Both the total sample coefficients and individual classroom coefficients were computed from the standardization sample of a primary grade economics achievement test (Primary Test of Economic Understanding). Classroom reliability coefficients computed from the matched pair scores were found to be higher than those from scores computed by the other methods. Total sample coefficients obtained from four of the five methods were nearly equal. Evidence of the effects of each scoring technique on concurrent validity is also presented. Contrary to expectations, the correlations of traditional and matched pair scores with Iowa Test of Basic Skills (ITBS) subtests (when adjusted for differing reliabilities) were approximately equal.
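
For readers unfamiliar with the alpha internal consistency coefficient mentioned in this abstract, the following is a minimal sketch of Cronbach's alpha in its usual textbook form. The function name `cronbach_alpha` and the toy data are illustrative assumptions; nothing here reproduces the study's scoring schemes or data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per examinee, one column per item.
    Assumes at least two items and non-zero total-score variance.
    """
    k = len(scores[0])  # number of items
    # Variance of each item (column) across examinees.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each examinee's total score.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 examinees, 3 dichotomously scored items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
```

Population variance is used for both the item and total variances; using sample variance consistently for both gives the same ratio.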

2.
Thirty-eight undergraduate students were randomly assigned one of two alternate forms of a 144-item true-false midterm examination. Whenever a statement appeared on one form as true and positively stated, it appeared on the alternate form as false and negatively stated. Similarly, a false and positively stated item on one form was true and negatively stated on the other. The subject matter of the two forms was identical and the four kinds of true-false items were equally represented on each form. Difficulty and discrimination indices were computed for each of the four item types. The statistical results showed negatively stated items were more difficult, but no more discriminating, than positively stated items. Also, false items were not statistically more difficult than true items, but were significantly more discriminating. It was concluded that test constructors should include more false items than true items in their instruments and that all items should be stated positively.

3.
4.
In the standard scoring procedure for multiple‐choice exams, students must choose exactly one response as correct. Often students may be unable to identify the correct response, but can determine that some of the options are incorrect. This partial knowledge is not captured in the standard scoring format. The Coombs elimination procedure is an alternate scoring procedure designed to capture partial knowledge. This paper presents the results of a semester‐long experiment where both scoring procedures were compared on four exams in an undergraduate macroeconomics course. Statistical analysis suggests that the Coombs procedure is a viable alternative to the standard scoring procedure. Implications for classroom instruction and future research are also presented.

5.
Applied Measurement in Education, 2013, 26(2): 187-207
This study compared the criterion-related validity evidence and other psychometric characteristics of multiple-choice (MCQ) and multiple true-false (MTF) items in medical specialty certifying examinations in internal medicine and its subspecialties. Results showed that MTF items were more reliable than MCQs and that the format scores were highly correlated. However, MCQs were more highly correlated with an independent performance measure than were MTF items. MTF items were classified primarily as measuring knowledge rather than synthesis or judgment. These results may have implications for examination construction, especially if criterion-related validity evidence is important.

6.
In response to the demand for sound science assessments, this article presents the development of a latent construct called knowledge integration as an effective measure of science inquiry. Knowledge integration assessments ask students to link, distinguish, evaluate, and organize their ideas about complex scientific topics. The article focuses on assessment topics commonly taught in 6th- through 12th-grade classes. Items from both published standardized tests and previous knowledge integration research were examined in 6 subject-area tests. Results from Rasch partial credit analyses revealed that the tests exhibited satisfactory psychometric properties with respect to internal consistency, item fit, weighted likelihood estimates, discrimination, and differential item functioning. Compared with items coded using dichotomous scoring rubrics, those coded with the knowledge integration rubrics yielded significantly higher discrimination indexes. The knowledge integration assessment tasks, analyzed using knowledge integration scoring rubrics, demonstrate strong promise as effective measures of complex science reasoning in varied science domains.

7.
8.
This study analysed the effectiveness of presenting mathematical problems as ‘authentic’, which simulated the main aspects of situations in which students are usually involved. To do so, four independent variables were considered: level of mathematical difficulty (easy or difficult); rewording: standard problems (similar to those presented in textbooks), authentic and containing irrelevant situational information; mathematical ability (measured by means of the BADyG test); and reading comprehension level (measured with the comprehension task from the PROLEC-R test). The dependent measure was the success rate of a sample of 156 primary education children (grades four, five and six) in solving each kind of word problem. The results showed that the authentic versions of difficult problems were solved more successfully than other versions by students with high levels of mathematical aptitude and reading comprehension. That means that authentic wording is useful when children are able to understand the added information and have the mathematical knowledge necessary to interpret it.

9.
Interpreting and creating graphs plays a critical role in scientific practice. The K-12 Next Generation Science Standards call for students to use graphs for scientific modeling, reasoning, and communication. To measure progress on this dimension, we need valid and reliable measures of graph understanding in science. In this research, we designed items to measure graph comprehension, critique, and construction and developed scoring rubrics based on the knowledge integration (KI) framework. We administered the items to over 460 middle school students. We found that the items formed a coherent scale and had good reliability using both item response theory and classical test theory. The KI scoring rubric showed that most students had difficulty linking graph features to science concepts, especially when asked to critique or construct graphs. In addition, students with limited access to computers as well as those who speak a language other than English at home have less integrated understanding than others. These findings point to the need to increase the integration of graphing into science instruction. The results suggest directions for further research leading to comprehensive assessments of graph understanding.

10.
This article presents a novel experimental methodology in which groups of students were offered the option to choose between two equivalent scoring rules to assess a multiple‐choice test. The effect of choosing the scoring rule on marks is tested. Two major contributions arise from this research. First, it contributes to the literature on the value of choice. Second, it also contributes to the literature on the educational measurement of knowledge. The results suggest that choice could positively affect students' scores. However, students need to learn to choose the assessment method. Moreover, women seem to obtain greater benefits from the option of choosing the scoring rule.

11.
Present research in problem solving appears to be primarily concerned with problem-solving methods and with degree of knowledge acquisition. A brief argument is advanced that this conceptualization is incomplete because of failure to consider individual differences among problem solvers (other than in problem-solving methods and extent of knowledge). A viable theory of problem-solving instruction must take into account all three areas. Evidence for the argument is presented in the form of data on problem-solving success in junior high school students with extreme scores on Witkin's field independence-field dependence measure of cognitive style. Problem-solving protocols are examined as a second source of data. Field independent students significantly out-performed field dependent students on the problems. Examination of protocols revealed consistent performance patterns favoring field independent students.

12.
This study explored the use of machine learning to automatically evaluate the accuracy of students’ written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate students in response to two different evolution instruments (the EGALT-F and EGALT-P) that contained prompts that differed in various surface features (such as species and traits). We tested human-SIDE scoring correspondence under a series of different training and testing conditions, using Kappa inter-rater agreement values of greater than 0.80 as a performance benchmark. In addition, we examined the effects of response length on scoring success; that is, whether SIDE scoring models functioned with comparable success on short and long responses. We found that SIDE performance was most effective when scoring models were built and tested at the individual item level and that performance degraded when suites of items or entire instruments were used to build and test scoring models. Overall, SIDE was found to be a powerful and cost-effective tool for assessing student knowledge and performance in a complex science domain.
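
The Kappa benchmark in this abstract can be illustrated with a short sketch. It assumes the unweighted two-rater form (Cohen's kappa), which the abstract does not specify; the function name `cohen_kappa` and the label vectors are hypothetical and unrelated to the SIDE corpus.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Assumes the two raters do not both assign a single identical label
    to every response (that would make chance agreement equal to 1).
    """
    n = len(rater_a)
    # Observed proportion of responses on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores: 1 = concept present, 0 = concept absent.
human = [1, 1, 1, 1, 0, 0, 0, 0]
machine = [1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohen_kappa(human, machine)
meets_benchmark = kappa > 0.80  # the threshold named in the abstract
```

In this toy case the raters agree on 6 of 8 responses, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate.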

13.
Research has found that grades are the most valid instruments for predicting educational success. Why grades have better predictive validity than, for example, standardized tests is not yet fully understood. One possible explanation is that grades reflect not only subject-specific knowledge and skills but also individual differences in other aspects. The purpose was to investigate the relative importance of knowledge and skills and other aspects encapsulated in grades for the predictive validity of compulsory school grades for educational success in upper secondary school. Structural equation modelling was used. Participants were 9th-grade students from 3 birth cohorts, each comprising full populations of approximately 100,000 students. The results showed that the subject-specific factors and an additional common grade factor contributed to the predictive validity. Effects of gender and parents' education were found in the common grade factor, with girls and students with a lower educational background being advantaged.

14.
This study reports an attempt to assess partial knowledge in vocabulary. Fifty multiple-choice vocabulary items were constructed so that the incorrect choices followed the stages of vocabulary acquisition defined by O'Connor (1940). Ability estimates based on Rasch dichotomous and polychotomous models were compared to determine if there were any gains in validity or reliability as a result of using the polychotomous scoring model rather than the dichotomous scoring model. An attempt was also made to determine the appropriateness of O'Connor's stage theory of vocabulary acquisition for predicting the type of errors that examinees of differing ability would make on the test items. The results indicate that the reliability and concurrent validity of the polychotomous scoring of a subset of items that fit the polychotomous scoring model were significantly higher than those for dichotomous scoring of the same subset of items. The results also indicate moderate support for O'Connor's theory of vocabulary acquisition.

15.
The hypothesis that some students, when tested under formula directions, omit items about which they have useful partial knowledge implies that such directions are not as fair as rights directions, especially to those students who are less inclined to guess. This hypothesis may be called the differential effects hypothesis. An alternative hypothesis states that examinees would perform no better than chance expectation on items that they would omit under formula directions but would answer under rights directions. This may be called the invariance hypothesis. Experimental data on this question were obtained by conducting special test administrations of College Board SAT-verbal and Chemistry tests and by including experimental tests in a Graduate Management Admission Test administration. The data provide a basis for evaluating the two hypotheses and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.
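
To make the contrast between rights and formula directions concrete, here is a small sketch. It assumes the conventional correction-for-guessing rule (rights minus wrongs divided by one less than the number of options, with omitted items scoring zero); the abstract does not state the exact formula used, and the function names, answer key, and responses are hypothetical.

```python
def rights_score(responses, key):
    """Number-right score: one point per correct answer, no penalty."""
    return sum(1 for r, k in zip(responses, key) if r == k)


def formula_score(responses, key, n_options):
    """Conventional formula score: R - W / (n_options - 1).

    Omitted items are represented as None and neither add nor subtract.
    """
    right = rights_score(responses, key)
    wrong = sum(
        1 for r, k in zip(responses, key) if r is not None and r != k
    )
    return right - wrong / (n_options - 1)


# Hypothetical five-option test with one omitted item (None).
key = ["A", "B", "C", "D", "E"]
responses = ["A", "B", None, "A", "E"]

rights = rights_score(responses, key)
formula = formula_score(responses, key, n_options=5)
```

Under this rule a blind guess on a five-option item has an expected value of zero, which is why the invariance hypothesis predicts that omitting and guessing should lead to the same expected score when the examinee has no partial knowledge.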

16.
To provide an accurate reading of students' and schools' rates of progress, and to provide cues for instruction, assessment at every level should be connected to explicit learning goals and standards. To show how this requirement can be fulfilled, and how research-based assessment can effectively support learning and instruction, this article summarizes a 7-year performance assessment collaboration between assessment researchers and the nation's second largest school district. The project's success in scaling up empirically tested assessment design models and scoring procedures to a district assessment involving more than 300,000 students per year raises the possibility that high-quality learning-centered assessment may again be a practical option for large-scale assessment and accountability.

17.
Academic underpreparedness is an issue for many first-time-in-college students, particularly those entering community colleges. Whereas many underprepared students enroll in developmental education, research has indicated that traditional remediation may not increase students’ chances for success. Therefore, states and colleges have begun to implement new course placement strategies to increase the accuracy of initial course placement and new instructional approaches to better serve their developmental students. Specifically, in 2013, the state of Florida passed Senate Bill 1720, which redesigned developmental coursework and placement policies across the Florida College System. The reform lifted developmental education placement exam testing and course enrollment requirements for certain exempt students, irrespective of prior academic preparation or achievement. The current study focuses on these exempt students (those who had the option to bypass developmental education) who were also underprepared, and their initial course selection and subsequent success in their gateway (introductory college-level) English course. Using statewide student-level data and logistic regression techniques, the results indicated that level of preparation was related to students’ course enrollment and gateway English course success. Students slightly underprepared in reading or writing were more likely than severely underprepared students to enroll in the gateway English class, relative to a developmental reading or writing course. In reading and writing, slightly underprepared students were more likely to pass English, relative to severely underprepared students. The authors consider the findings in light of recent national changes to developmental education and offer recommendations for policy and practice.

18.
The purpose of this study was to compare the reliabilities of true-false (TF) and multiple choice (MC) tests and to determine the concurrent validities of both. Two methods, judgmental and discrimination, were devised for objectively converting MC items to TF form. The TF items generated by the two methods from 70-item MC natural science and social studies tests were incorporated in eight final forms which were differentiated by subject matter, conversion method, and item form order. A sample of 1018 nonurban high school students each responded to one of the eight forms. Examinees tried three TF items for every pair of MC items attempted. The TF tests were significantly less reliable than the MC tests but did tend to measure the same thing as the corresponding MC tests.

19.
The Pass-Fail (P-F) option presents (a) a natural setting for testing some implications of Atkinson’s concept of “fear of failure” (1, 2) and (b) a practical setting for evaluating one aspect of university policy. A sample of students enrolled for P-F credit was compared (on test anxiety, grade utilities under chance and skill conditions, grades, and course loads) with another sample taking course work under an A-F grading system. Only partial confirmation was found for a prediction that test anxiety should be negatively correlated with the difference between grade utilities under chance and under skill conditions. Other findings were (a) that test anxiety and grade option were not significantly correlated, and (b) that the P-F sample had significantly higher grades and heavier course loads.

20.
Tenth grade students studied the topic of ‘The growth curve of microorganisms’, which included a computer‐assisted learning (CAL) simulation episode. The CAL episode enabled students to simulate experiments which investigated the simultaneous impact of three independent variables upon the growth curve of microorganisms (temperature, nutrient concentration and the initial number of individuals from which to start a population growth). Two classes (n = 82 students) formed the experimental group and were instructed in a combination of classroom‐laboratory work with CAL. The control group included three classes (n = 99 students) who were taught the topic in the classroom‐laboratory work setting only. Five teachers taught the five classes, three periods weekly and the study lasted 4 weeks. The students’ previous knowledge in the topic to be learned and their academic achievement were assessed with pre‐ and post‐tests, respectively. The data for each test were treated with a two‐way analysis of variance. The results showed that the two study groups did not differ in their previous knowledge and no significant differences were found by gender within and between the groups. The post‐test results on academic achievement indicated that students in the experimental group achieved significantly higher mean scores than the control group. No significant gender differences on academic achievement were found within each group. The results affirm that: (a) computer simulations, in which three variables are manipulated simultaneously in one experiment, can be integrated as short episodes in the existing biology curricula; (b) high school students can perform computer simulations which require the skills of simultaneous manipulations of three variables in one experiment, problem solving and decision making; (c) girls and boys in the experimental group exhibited these skills at a similar academic level of achievement.
