Similar articles
Retrieved 20 similar articles (search time: 31 ms)
1.
In his book Mindstorms (1980), Papert discusses Turtle Geometry as ego-syntonic, that is, as fitting the ways of thinking of the child as a geometric knowledge builder. The van Hiele theory (Wirszup, 1976) views geometric knowledge building as occurring through a necessary sequence of levels. Thus, if Turtle Geometry is ego-syntonic, it would seem that one could apply these van Hiele levels to better describe and understand this form of geometry. The first purpose of this paper is to provide such an analysis. Using Logo to do Turtle Geometry requires that a child (or learner at any level) do geometry through the deliberate use of language. Thus one ought also to be able to relate levels of language use to Turtle Geometry theoretically. The second purpose of this paper is to relate a language-use framework suggested by the work of Frye (1982) to the language activities of Turtle Geometry.

2.
The psychometrically sound development of assessment instruments requires pilot testing of candidate items as a first step in gauging their quality, typically a time-consuming and costly effort. Crowdsourcing offers the opportunity for gathering data much more quickly and inexpensively than from most targeted populations. In a simulation of a pilot testing protocol, item parameters for 110 life science questions are estimated from 4,043 crowdsourced adult subjects and then compared with those from 20,937 middle school science students. In terms of item discrimination classification (high vs. low), classical test theory yields an acceptable level of agreement (C-statistic = 0.755); item response theory produces excellent results (C-statistic = 0.848). Item response theory also identifies potential anchor items without including any false positives (items with low discrimination in the targeted population). We conclude that the use of crowdsourcing subjects is a reasonable, efficient method for the identification of high-quality items for field testing and for the selection of anchor items to be used for test equating.
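For context on the item statistics being compared: classical test theory estimates an item's difficulty as the proportion of correct responses and its discrimination as the corrected item-total correlation. The following is a minimal illustrative sketch, not the study's code; the Rasch-like toy data, the 0.2 discrimination cutoff, and all names are assumptions.

```python
import numpy as np

def ctt_item_stats(responses: np.ndarray):
    """Classical test theory statistics for a 0/1 response matrix.

    responses: shape (n_subjects, n_items), 1 = correct, 0 = incorrect.
    Returns per-item difficulty (proportion correct) and discrimination
    (corrected item-total point-biserial correlation).
    """
    n_subjects, n_items = responses.shape
    difficulty = responses.mean(axis=0)           # the classical "p-value"
    total = responses.sum(axis=1)
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]            # exclude item j from the total
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

# Toy data: 500 examinees, 20 items under a Rasch-like model (assumption).
rng = np.random.default_rng(0)
theta = rng.normal(size=500)                      # toy abilities
b = rng.uniform(-1, 1, 20)                        # toy item difficulties
prob = 1 / (1 + np.exp(-(theta[:, None] - b)))
sim = (rng.random((500, 20)) < prob).astype(int)

p, r = ctt_item_stats(sim)
high_discrimination = r >= 0.2                    # illustrative cutoff
```

Classifying items as high vs. low discrimination in both the crowdsourced and the target samples, and then measuring agreement between the two classifications, yields the kind of C-statistic the abstract reports.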

3.
Teacher knowledge is a critical focus of educational research in light of the potential impact of teacher knowledge on student learning. The dearth of research exploring entry-level preservice teachers' geometric knowledge poses an onerous challenge for mathematics educators in initial teacher education (ITE) when designing experiences that develop preservice teachers' geometric knowledge to support the task of teaching. This study examines the geometric thinking levels of entry-level Irish preservice primary teachers (N = 381). Participants' geometric thinking levels were determined through a multiple-choice geometry test (van Hiele Geometry Test) prior to commencing a Geometry course within their ITE program. The findings reveal limited geometric thinking among half of the cohort and question the extent to which pre-tertiary experiences develop appropriate foundations to facilitate a smooth transition into ITE mathematics programs. The study also examines the nature of misconceptions among those with limited geometric thinking and presents suggestions to enhance the geometric understandings of preservice teachers.

4.
Teaching geometry at the elementary level is challenging. This study examines the impact of van Hiele theory-based instructional activities embedded into an elementary mathematics methods course on preservice teachers' geometry knowledge for teaching. Pre- and post-assessment data from 111 elementary preservice teachers revealed that van Hiele theory-based instruction can be effective in improving three strands of participants' geometry knowledge for teaching: geometry content knowledge, knowledge of students' van Hiele levels, and knowledge of geometry instructional activities. As a result, this paper offers implications for teacher educators and policy makers to better prepare elementary preservice teachers with geometry knowledge for teaching.

5.
Pan and Wollack (PW) proposed a machine learning method to detect compromised items. We extend their work to an approach that detects compromised items and examinees with item preknowledge simultaneously, drawing on ideas from ensemble learning to relax several limitations of the PW method. The suggested approach also provides a confidence score, based on an autoencoder, representing our confidence that a detection result truly corresponds to item preknowledge. Simulation studies indicate that the proposed approach performs well in detecting item preknowledge and that the confidence score provides helpful information for users.

6.
The purpose of this study was to examine the acquisition of van Hiele levels and the motivation of sixth-grade students taught with van Hiele theory-based mathematics curricula. The study involved 150 sixth-grade students, 66 boys and 84 girls. The researcher employed a multiple-choice geometry test to determine students' reasoning stages and a questionnaire to gauge students' motivation with regard to the instruction. These instruments were administered to the students before and after a five-week period of instruction. The paired-samples t-test, the independent-samples t-test, and ANCOVA with α = .05 were used to analyze the quantitative data. The study demonstrated no statistically significant difference in motivation between boys and girls, and no significant difference in the acquisition of the levels between boys and girls. In other words, gender was not a factor in learning geometry.

7.
Although multiple-choice examinations are often used to test anatomical knowledge, they often forgo the use of images in favor of text-based questions and answers. Because anatomy relies on visual resources, examinations should use images where appropriate. This study was a retrospective analysis of examination items that were text based compared with the same questions when a reference image was included with the question stem. Item difficulty and discrimination were analyzed for 15 multiple-choice items given across two different examinations in two sections of an undergraduate anatomy course. Results showed some differences in item difficulty, but these were not consistently associated with either the text items or the items with reference images. Differences in difficulty were mainly attributable to one group of students performing better overall on the examinations. There were no significant differences in item discrimination for any of the analyzed items. This implies that reference images do not significantly alter the item statistics; however, it does not indicate whether the images were helpful to students when answering the questions. Question writers should take care to analyze item statistics when making changes to multiple-choice questions, including changes made for the perceived benefit of the students. Anat Sci Educ 10: 68–78. © 2016 American Association of Anatomists.

8.
The effect of item parameters (discrimination, difficulty, and level of guessing) on the item-fit statistic was investigated using simulated dichotomous data. Nine tests were simulated using 1,000 persons, 50 items, three levels of item discrimination, three levels of item difficulty, and three levels of guessing. Item fit was estimated using two fit statistics: the likelihood ratio statistic (χ²B) and the standardized residuals (SRs). All the item parameters were simulated to be normally distributed. Results showed that the levels of item discrimination and guessing affected the item-fit values. As the level of item discrimination or guessing increased, item-fit values increased and more items misfit the model. The level of item difficulty did not affect the item-fit statistic.
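To make the standardized-residual statistic concrete, here is a minimal sketch under assumptions not taken from the study: one item, a 3PL model with the 1.7 scaling constant, ten ability groups, and illustrative parameter values. The statistic compares observed with model-implied proportions correct within each ability group.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (1.7 scaling constant assumed)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def standardized_residuals(theta, x, a, b, c, n_groups=10):
    """Standardized residuals for one item: group examinees by ability and
    compare the observed vs. model-expected proportion correct per group."""
    edges = np.quantile(theta, np.linspace(0, 1, n_groups + 1))
    group = np.clip(np.searchsorted(edges, theta, side="right") - 1, 0, n_groups - 1)
    sr = []
    for g in range(n_groups):
        mask = group == g
        n = mask.sum()
        observed = x[mask].mean()
        expected = p_3pl(theta[mask], a, b, c).mean()
        sr.append((observed - expected) / np.sqrt(expected * (1 - expected) / n))
    return np.array(sr)

# Illustrative values only: 1,000 simulated examinees, one item with
# discrimination 1.2, difficulty 0.0, guessing 0.2.
rng = np.random.default_rng(1)
theta = rng.normal(size=1000)
a, b, c = 1.2, 0.0, 0.2
x = (rng.random(1000) < p_3pl(theta, a, b, c)).astype(int)
print(standardized_residuals(theta, x, a, b, c))  # |SR| > 2 suggests misfit
```

Because the data here are generated from the same 3PL model used to score them, the residuals should hover near zero; scoring the data with a model that ignores guessing would instead tend to produce systematic residuals, which is one way the misfit the study examines can arise.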

9.
Traditional item analyses such as classical test theory (CTT) use exam-taker responses to assessment items to approximate their difficulty and discrimination. The increased adoption by educational institutions of electronic assessment platforms (EAPs) provides new avenues for assessment analytics by capturing detailed logs of an exam-taker's journey through their exam. This paper explores how logs created by EAPs can be employed alongside exam-taker responses and CTT to gain deeper insights into exam items. In particular, we propose an approach that derives features from exam logs to approximate item difficulty and discrimination based on exam-taker behaviour during an exam. Items for which difficulty and discrimination differ significantly between the CTT analysis and our approach are flagged through outlier detection for independent academic review (a sketch of this flagging step follows the practitioner notes below). We demonstrate our approach by analysing de-identified exam logs and responses to assessment items of 463 medical students enrolled in a first-year biomedical sciences course. The analysis shows that the number of times an exam-taker visits an item before selecting a final response is a strong indicator of an item's difficulty and discrimination. Scrutiny by the course instructor of the seven items identified as outliers suggests our log-based analysis can provide insights beyond what is captured by traditional item analyses.

Practitioner notes

What is already known about this topic
  • Traditional item analysis is based on exam-taker responses to the items using mathematical and statistical models from classical test theory (CTT). The difficulty and discrimination indices thus calculated can be used to determine the effectiveness of each item and consequently the reliability of the entire exam.
What this paper adds
  • Data extracted from exam logs can be used to identify exam-taker behaviours which complement classical test theory in approximating the difficulty and discrimination of an item and identifying items that may require instructor review.
Implications for practice and/or policy
  • Identifying the behaviours of successful exam-takers may allow us to develop effective exam-taking strategies and personal recommendations for students.
  • Analysing exam logs may also provide an additional tool for identifying struggling students and items in need of revision.
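As referenced above, a minimal sketch of the outlier-flagging step. Everything here is an assumption for illustration, not the paper's actual feature set or procedure: a single log-derived feature (mean visits per item before the final response), a toy linear relation to CTT difficulty, and a |z| > 2 flagging rule.

```python
import numpy as np

# Hypothetical per-item inputs: CTT difficulty (proportion correct) and the
# mean number of visits exam-takers made to the item before answering, as
# would be derived from EAP logs.
rng = np.random.default_rng(2)
n_items = 40
ctt_difficulty = rng.uniform(0.3, 0.95, n_items)
mean_visits = 4 - 3 * ctt_difficulty + rng.normal(0, 0.3, n_items)  # toy relation

# Fit a simple linear relation between the two estimates; items whose
# log-based behaviour disagrees with CTT (large standardized residual)
# are flagged for independent academic review.
slope, intercept = np.polyfit(ctt_difficulty, mean_visits, 1)
residuals = mean_visits - (slope * ctt_difficulty + intercept)
z = (residuals - residuals.mean()) / residuals.std()
flagged = np.flatnonzero(np.abs(z) > 2)
print("Items flagged for review:", flagged)
```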

10.
《教育实用测度》(Applied Measurement in Education), 2013, 26(3), 257–275
The purpose of this study was to investigate the technical properties of stem-equivalent mathematics items differing only with respect to response format. Using socioeconomic factors to define the strata, a proportional stratified random sample of 1,366 Connecticut sixth-grade students was administered one of three forms. Classical item analysis, dimensionality assessment, item response theory goodness-of-fit, and an item bias analysis were conducted. Analysis of variance and confirmatory factor analysis were used to examine the functioning of the items presented in the three different formats. It was found that, after equating forms, the constructed-response formats were somewhat more difficult than the multiple-choice format. However, there was no significant difference across formats with respect to item discrimination. A differential item functioning (DIF) analysis was conducted using both the Mantel-Haenszel procedure and a comparison of the item characteristic curves. The DIF analysis indicated that the presence of bias was not greatly affected by item format; that is, items biased in one format tended to be biased in a similar manner when presented in a different format, and unbiased items tended to remain so regardless of format.
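The Mantel-Haenszel procedure mentioned here pools 2×2 tables (group × correct/incorrect) across strata of a matching criterion into a common odds ratio. A minimal sketch, assuming 0/1 item responses matched on total score; the toy data and all names are illustrative, not the study's code.

```python
import numpy as np

def mantel_haenszel_odds_ratio(x_ref, x_foc, s_ref, s_foc):
    """Mantel-Haenszel common odds ratio for one studied item.

    x_*: 0/1 responses of the reference/focal group to the item.
    s_*: matching criterion (e.g., total test score) per examinee.
    A value near 1 suggests no differential item functioning."""
    num = den = 0.0
    for s in np.union1d(s_ref, s_foc):           # one 2x2 table per stratum
        a = np.sum((s_ref == s) & (x_ref == 1))  # reference, correct
        b = np.sum((s_ref == s) & (x_ref == 0))  # reference, incorrect
        c = np.sum((s_foc == s) & (x_foc == 1))  # focal, correct
        d = np.sum((s_foc == s) & (x_foc == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return num / den

# Toy usage with random data (no built-in DIF, so the ratio should be near 1).
rng = np.random.default_rng(4)
s_ref, s_foc = rng.integers(0, 21, 300), rng.integers(0, 21, 300)
x_ref = (rng.random(300) < 0.7).astype(int)
x_foc = (rng.random(300) < 0.7).astype(int)
print(mantel_haenszel_odds_ratio(x_ref, x_foc, s_ref, s_foc))
```

In operational DIF work the ratio is commonly mapped to the ETS delta scale, ΔMH = −2.35 ln(αMH), with small absolute values treated as negligible DIF.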

11.
Item response theory scalings were conducted for six tests with mixed item formats. These tests differed in their proportions of constructed-response (c.r.) and multiple-choice (m.c.) items and in overall difficulty. The scalings included those based on scores for the c.r. items that maintained the same number of score levels as the item rubrics, either produced from single ratings or from multiple ratings that were averaged and rounded to the nearest integer, as well as scalings for a single form of c.r. items obtained by summing multiple ratings. A one-parameter (1PPC) or two-parameter (2PPC) partial credit model was used for the c.r. items and the one-parameter logistic (1PL) or three-parameter logistic (3PL) model for the m.c. items. Item fit was substantially worse with the combination 1PL/1PPC model than with the 3PL/2PPC model due to the former's restrictive assumptions that there would be no guessing on the m.c. items and equal item discrimination across items and item types. The presence of varying item discriminations resulted in the 1PL/1PPC model producing estimates of item information that could be spuriously inflated for c.r. items that had three or more score levels. Information for some items with summed ratings was usually overestimated by 300% or more under the 1PL/1PPC model. These inflated information values resulted in underestimated standard errors of ability estimates. The constraints posed by the restricted model suggest limitations on the testing contexts in which the 1PL/1PPC model can be accurately applied.

12.
Drawing on Grabe & Stoller's (2005) hierarchical theory of reading ability and the requirements of the Syllabus for the Test for English Majors, Band 4 (2004) concerning the reading abilities of English majors at the foundation stage, this study examines the reading comprehension abilities reflected in the reading comprehension items of the 2010 TEM4, from the perspectives of ability classification, ability-level evaluation, and the relationship of item difficulty and discrimination to ability level. The results show that teachers disagreed markedly in their classification and level evaluation of the items, but that item difficulty and discrimination correlated positively with the ability level of the items.

13.
This study investigated the psychometric characteristics of constructed-response (CR) items referring to choice and non-choice passages administered to students in Grades 3, 5, and 8. The items were scaled using item response theory (IRT) methodology. The results indicated no consistent differences in the difficulty and discrimination of the items referring to the two types of passages. On average, students' scale scores on the choice and non-choice passages were comparable. Finally, the choice passages differed in terms of overall popularity and in their attractiveness to different gender and ethnic groups.

14.
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations equal the between-level item discriminations. Testing the cross-level invariance assumption is important for understanding constructs in multilevel data; in most multilevel item response model applications, however, cross-level invariance is assumed without being tested. In this study, methods for detecting differential item discrimination (DID) across levels, and the consequences of ignoring DID, are illustrated and discussed using multilevel item response models. Simulation results showed that the likelihood ratio test (LRT) performed well in detecting global DID at the test level when some portion of the items exhibited DID. At the item level, the Akaike information criterion (AIC), the sample-size adjusted Bayesian information criterion (saBIC), the LRT, and the Wald test showed a satisfactory rejection rate (>.8) when some portion of the items exhibited DID and the items had lower intraclass correlations (or higher DID magnitudes). When DID was ignored, the accuracy of the item discrimination estimates and standard errors was mainly problematic. Implications of the findings and limitations are discussed.
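At the test level, the global LRT amounts to comparing the log-likelihood of a fit that constrains within- and between-level discriminations to be equal against a fit that frees them. A generic sketch follows; the multilevel IRT estimation itself (typically done with specialized software) is omitted, and the example log-likelihoods are made up for illustration.

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_constrained, loglik_free, df_diff):
    """LRT for cross-level invariance: the constrained model sets each
    within-level discrimination equal to its between-level counterpart;
    df_diff is the number of parameters freed in the larger model."""
    lr_stat = 2.0 * (loglik_free - loglik_constrained)
    return lr_stat, chi2.sf(lr_stat, df_diff)

# Hypothetical log-likelihoods from two multilevel IRT fits of a 10-item test.
lr, p = likelihood_ratio_test(-10450.3, -10433.8, df_diff=10)
print(f"LR = {lr:.1f}, p = {p:.4f}")  # small p: reject cross-level invariance
```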

15.
Sixty-eight graduate students made general and specific ratings of the quality of 12 classroom test items, which varied in difficulty and discrimination. Four treatment combinations defined two additional factors: group discussion/no group discussion of test items and exposure/no exposure to an instructional module on test item construction. The students rated the items differentially, depending not only on item difficulty level but also on item discriminative power. The group discussion and exposure to module factors had significant effects on the general item ratings only. Implications of the research were discussed.

16.
Abstract

With the national move toward competency testing, publishers and educators have become increasingly concerned about test validity, item construction, and item readability. While a major effort is usually made by test developers to control the readability level of the test items, there is currently no validated measure of individual item readability.

It is commonly assumed that oral reading of test items by the teacher would ameliorate the readability problem for poor readers. Over 4,000 fifth-grade students were involved in this study aimed at determining the effect of teacher oral reading of test items to good and poor readers. The findings suggested that having teachers read test items aloud during the administration of standardized examinations yielded, overall, higher scores than having students read the items for themselves. However, this intervention did not benefit poor readers more than good readers. Both of these groups reflected similar gains under the influence of this intervention.

17.
Adding representational pictures (RPs) to text-based items has been shown to improve students' test performance. Focusing on potential explanations for this multimedia effect in testing, we propose two functions of RPs in testing, namely, (1) a cognitive facilitation function and (2) a motivational function. We found empirical support for both functions in this computer-based classroom experiment with N = 410 fifth and sixth graders. All students answered 36 manipulated science items that either contained (text-picture) or did not contain (text-only) an RP that visualized the text information in the item stem. Each student worked on both item types, following a rotated within-subject design. We measured students' (a) solution success and (b) time on task (TOT), and identified (c) rapid-guessing behavior (RGB). We used generalized and linear mixed-effects models to investigate RPs' impact on these outcome parameters and considered students' level of test engagement and item positions as covariates. The results indicate that (1) RPs improved all students' performance across item positions in a comparable manner (multimedia effect in testing), (2) RPs have the potential to accelerate item processing (cognitive facilitation function), and (3) the presence of RPs reduced students' RGB rates to a meaningful extent (motivational function). Overall, our data indicate that RPs may promote more reliable test scores, supporting a more valid interpretation of students' achievement levels.

18.
The nature of anatomy education has changed substantially in recent decades, though the traditional multiple-choice written examination remains the cornerstone of assessing students' knowledge. This study sought to measure the quality of a clinical anatomy multiple-choice final examination using item response theory (IRT) models. One hundred seventy-six students took a multiple-choice clinical anatomy examination. One- and two-parameter IRT models (difficulty and discrimination parameters) were used to assess item quality. The two-parameter IRT model demonstrated a wide range in item difficulty, with a median of −1.0 and a range from −2.0 to 0.0 (25th to 75th percentile). Similar results were seen for discrimination (median 0.6; range 0.4–0.8). The test information curve achieved maximum discrimination for an ability level one standard deviation below the average. There were 15 items with standardized loadings less than 0.3, which was due to several factors: two items had two correct responses, one was not well constructed, two were too easy, and the others revealed a lack of detailed knowledge by students. The test used in this study was more effective in discriminating students of lower ability than those of higher ability. Overall, the quality of the examination in clinical anatomy was confirmed by the IRT models. Anat Sci Educ 3:17–24, 2010. © 2009 American Association of Anatomists.
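For readers unfamiliar with test information curves: under the 2PL model, each item contributes information I_j(θ) = a_j² P_j(θ)(1 − P_j(θ)), and the test information is the sum over items. The sketch below is illustrative only; the toy parameter ranges merely echo the medians reported in the abstract.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1 / (1 + np.exp(-a * (theta - b)))

def test_information(theta, a, b):
    """Test information: sum of item informations a_j^2 * P_j * (1 - P_j)."""
    p = p_2pl(theta[:, None], a[None, :], b[None, :])
    return (a**2 * p * (1 - p)).sum(axis=1)

# Toy parameters echoing the abstract: mostly easy items (b around -1) with
# modest discrimination (a around 0.6), so information peaks below theta = 0
# and the test separates lower-ability students best.
rng = np.random.default_rng(3)
a = rng.uniform(0.4, 0.8, 30)
b = rng.uniform(-2.0, 0.0, 30)
theta = np.linspace(-3, 3, 61)
info = test_information(theta, a, b)
print("Information peaks at theta =", theta[np.argmax(info)])
```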

19.
The aim of this study is to validate an instrument measuring students' academic behavioral skills and engagement—skills identified as vital for student achievement. We inspect the reliability and validity of the survey with respect to item fit, factorial structure, relations with academic performance, and the fairness of the items across student groups. The fairness analyses are critical to making valid comparisons between groups and across countries. Data comprising 8520 grade 10 students from four countries were analysed using item response theory. We found that both scales were multidimensional, acted fairly across students' gender, country, immigrant-, and socio-economic background (after removing four items), and were positively and significantly correlated with self-reported and performance-based academic performance.

20.
This study describes the development and validation of the Homan-Hewitt Readability Formula. This formula estimates the readability level of single-sentence test items. Its initial development is based on the assumption that differences in readability level will affect item difficulty. The validation of the formula is achieved by (a) estimating the readability levels of sets of test items predicted to be written at 2nd- through 8th-grade levels; (b) administering the tests to 782 students in grades 2 through 5; and (c) using the class means as the unit of analysis and subjecting the data to a two-factor repeated measures ANOVA. Significant differences were found on class mean performance scores across the levels of readability. These results indicated that a relationship exists between students' reading grade levels and their responses to test items written at higher readability levels.
