1.

A Cost–Benefit Analysis of Automatic Item Generation

Audra E. Kosh Mary Ann Simpson Lisa Bickel Mark Kellogg Ellie Sanford‐Moore 《Educational Measurement: Issues and Practice》2019,38(1):48-53

Automatic item generation (AIG), a means of leveraging technology to create large quantities of items, requires a minimum number of items to offset the sizable upfront investment (i.e., model development and technology deployment) in order to achieve cost savings. In this cost–benefit analysis, we estimated the cost of each step of AIG and manual item writing and applied cost–benefit formulas to calculate the number of items that would have to be produced before the savings over manual item writing offset the upfront costs of AIG in the context of K‐12 mathematics items. Results indicated that AIG is more cost‐effective than manual item writing when developing, at a minimum, 173 to 247 items within one fine‐grained content area (e.g., fourth‐ through seventh‐grade area of figures). The article concludes with a discussion of implications for test developers and the nonmonetary tradeoffs involved in AIG.
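The break-even reasoning in this abstract can be illustrated with a minimal sketch. It is not the authors' formula: it assumes a single fixed upfront cost for AIG plus constant per-item costs for both approaches, and the function name and all dollar figures are hypothetical placeholders.

```python
import math


def break_even_items(upfront_cost, aig_cost_per_item, manual_cost_per_item):
    """Smallest whole number of items at which AIG is strictly cheaper.

    Assumed (hypothetical) cost model:
      AIG total    = upfront_cost + n * aig_cost_per_item
      manual total = n * manual_cost_per_item
    """
    saving_per_item = manual_cost_per_item - aig_cost_per_item
    if saving_per_item <= 0:
        # AIG never recovers its upfront cost under this model.
        return None
    # AIG is strictly cheaper once n > upfront_cost / saving_per_item.
    return math.floor(upfront_cost / saving_per_item) + 1


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
n = break_even_items(upfront_cost=10_000, aig_cost_per_item=5, manual_cost_per_item=55)
print(n)
```

With these placeholder figures, the two approaches cost the same at 200 items, so AIG becomes the cheaper option from item 201 onward. Real estimates would need a separate cost for each step of both workflows, as the abstract describes.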

2.

Using Spacing to Enhance Diverse Forms of Learning: Review of Recent Research and Implications for Instruction

Shana K. Carpenter Nicholas J. Cepeda Doug Rohrer Sean H. K. Kang Harold Pashler 《Educational Psychology Review》2012,24(3):369-378

Every day, students and instructors are faced with the decision of when to study information. The timing of study, and how it affects memory retention, has been explored for many years in research on human learning. This research has shown that performance on final tests of learning is improved if multiple study sessions are separated (i.e., “spaced” apart) in time rather than massed in immediate succession. In this article, we review research findings on the types of learning that benefit from spaced study, demonstrations of these benefits in educational settings, and recent research on the time intervals during which spaced study should occur in order to maximize memory retention. We conclude with a list of recommendations on how spacing might be incorporated into everyday instruction.

3.

THE IMPACT OF ITEM PHRASING ON THE VALIDITY OF ATTITUDE SCALES FOR ELEMENTARY SCHOOL CHILDREN (Cited by: 3; self-citations: 0; citations by others: 3)

JERI BENSON DENNIS HOCEVAR 《Journal of Educational Measurement》1985,22(3):231-240

The purpose of the study was to examine the effect of item phrasing on the validity of a Likert-type attitude scale. Three content-similar scales were composed of 15 items, either all positive, all negative, or a mixture of positive and negative items. Five hundred twenty-two students in grades 4–6 responded to one of the three forms. Results from the all-positive and all-negative forms indicated that item means, variances, and factor structures differed significantly. Inspection of item means suggested that it was difficult for the students to indicate agreement by disagreeing with a negative statement. Analyses of the mixed phrasing form indicated factors based upon item phrasing, not item content. Taken together, the results suggest that the technique of balancing item phrasing when used with elementary students appears to affect adversely the validity of attitude measurement.

4.

Applying spaced practice in the schools to teach math vocabulary

Shawna Petersen‐Brown Ashlee R. Lundberg Jannine E. Ray Iwalani N. Dela Paz Carrington L. Riss Carlos J. Panahon 《Psychology in the schools》2019,56(6):977-991

Spaced practice, or the distribution of practice opportunities across time, is a well‐known and effective practice for improving retention. However, spaced practice is not effectively implemented in schools, perhaps as a result of a lack of educationally relevant research in the area. We conducted an educationally relevant investigation of spaced practice. Using a quasi‐experimental between‐subjects design, we taught 62 third‐ and fourth‐grade students eight math vocabulary words under two patterns of spaced practice (fixed interval and expanded interval) and massed practice. Results showed a benefit of spaced practice over massed practice, but no difference between fixed interval and expanded interval spaced practice. The findings suggest that spaced practice may be implemented to improve the retention of math vocabulary words; however, more research is needed to provide guidelines to support educators in implementing spaced practice in schools.

5.

Simultaneous decisions at study: time allocation, ordering, and spacing

Lisa K. Son Nate Kornell 《Metacognition and Learning》2009,4(3):237-248

Learners of all ages face complex decisions about how to study effectively. Here we investigated three such decisions made in concert: time allocation, ordering, and spacing. First, college students were presented with, and made judgments of learning about, 16 word-synonym pairs. Then, when presented with all 16 pairs, they created their own study schedule by choosing when and how long to study each item. The results indicated that (a) the most study time was allocated to difficult items, (b) relatively easy items tended to be studied first, and (c) participants spaced their study at a rate significantly greater than chance. The spacing data, which are of particular interest, differ from previous findings that have suggested that people, including adults, believe massing is more effective than spacing.

6.

Comparison of efficiency measures for academic interventions based on acquisition and maintenance

Matthew K. Burns Heather E. Sterling‐Turner 《Psychology in the schools》2010,47(2):126-134

Previous research has demonstrated the importance of examining the instructional efficiency of academic interventions and has defined efficiency as the number of items learned per instructional minute. Maintenance of the skill is also an important instructional goal, however. Therefore, the current study compared efficiency metrics using initial learning and maintenance with 25 fourth‐grade students. Each student was taught the pronunciation and English translation of 12 words from the Esperanto international language with two instructional conditions (six words for each condition). The first condition was traditional drill (TD) rehearsal with all unknown words, and the second was incremental rehearsal (IR) with one unknown and eight known words. Results indicated that, although the IR condition led to significantly more words being retained, TD was significantly more efficient using initial learning. The two conditions were equally efficient, however, when maintenance data were used. Therefore, evaluating the efficiency of instructional interventions should consider maintenance data as well. © 2009 Wiley Periodicals, Inc.
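The efficiency metric defined in this abstract (items learned per instructional minute) can be computed in one line, and a small sketch shows how the ranking of two conditions can change depending on whether initial learning or maintenance is counted. The word counts and minutes below are invented for illustration and are not the study's data; the function name is hypothetical.

```python
def efficiency(items, minutes):
    """Items learned (or retained) per instructional minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return items / minutes


# Hypothetical numbers for one student: (learned initially, retained later, minutes).
conditions = {
    "TD": {"learned": 4, "retained": 2, "minutes": 2},
    "IR": {"learned": 5, "retained": 5, "minutes": 5},
}

for name, c in conditions.items():
    initial = efficiency(c["learned"], c["minutes"])
    maintenance = efficiency(c["retained"], c["minutes"])
    print(f"{name}: initial={initial:.2f}/min, maintenance={maintenance:.2f}/min")
```

In this made-up case TD looks twice as efficient on initial learning, while the two conditions tie once only retained words are counted, which is the kind of reversal the abstract reports.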

7.

Metacognition and the spacing effect: the role of repetition, feedback, and instruction on judgments of learning for massed and spaced rehearsal

Jessica M. Logan Alan D. Castel Sara Haber Emily J. Viehman 《Metacognition and Learning》2012,7(3):175-195

Although memory performance benefits from the spacing of information at encoding, judgments of learning (JOLs) are often not sensitive to the benefits of spacing. The present research examines how practice, feedback, and instruction influence JOLs for spaced and massed items. In Experiment 1, in which JOLs were made after the presentation of each item and participants were given multiple study-test cycles, JOLs were strongly influenced by the repetition of the items, but there was little difference in JOLs for massed versus spaced items. A similar effect was shown in Experiments 2 and 3, in which participants scored their own recall performance and were given feedback, although participants did learn to assign higher JOLs to spaced items with task experience. In Experiment 4, after participants were given direct instruction about the benefits of spacing, they showed a greater difference for JOLs of spaced vs. massed items, but their JOLs still underestimated their recall for spaced items. Although spacing effects are very robust and have important implications for memory and education, people often underestimate the benefits of spaced repetition when learning, possibly due to the reliance on processing fluency during study and attending to repetition, and not taking into account the beneficial aspects of study schedule.

8.

Multiple audience rating form strategies for student evaluation of college teaching (Cited by: 1; self-citations: 0; citations by others: 1)

Ken Peterson G. Manny Gunne Paul Miller Orlando Rivera 《Research in higher education》1984,20(3):309-321

Michael Scriven has suggested that student rating forms, for the purpose of evaluating college teaching, be designed for multiple audiences (instructor, administrator, student), and with a single global item for summative functions (determination of merit, retention, or promotion). This study reviewed approaches to rating form construction, e.g., factor analytic strategies of Marsh, and recommended the multiple audience design of Scriven. An empirical test of the representativeness of the single global item was reported from an analysis of 1,378 forms collected in a university department of education. The global item correlated most satisfactorily with other items, a computed total of items, items that represented underlying factors, and various triplets of items selected to represent all possible combinations of items. It was concluded that a multiple audience rating form showed distinct advantages in design and that the single global item most fairly and highly represented the overall teaching performance, as judged by students, for decisions about retention, promotion, and merit made by administrators.

9.

ESTIMATING THE RELIABILITY OF MULTIPLE TRUE-FALSE TESTS

DAVID A. FRISBIE CYNTHIA A. DRUVA 《Journal of Educational Measurement》1986,23(2):99-105

This study was designed to examine the level of dependence within multiple true-false (MTF) test item clusters by computing sets of item intercorrelations with data from a test composed of both MTF and multiple choice (MC) items. It was posited that internal analysis reliability estimates for MTF tests would be spurious due to elevated MTF within-cluster intercorrelations. Results showed that, on the average, MTF within-cluster dependence was no greater than that found between MTF items from different clusters, between MC items, or between MC and MTF items. But item for item, there was greater dependence between items within the same cluster than between items of different clusters.

10.

A Comparison of Item Calibration Procedures in the Presence of Test Speededness

Youngsuk Suh Sun‐Joo Cho James A. Wollack 《Journal of Educational Measurement》2012,49(3):285-311

In the presence of test speededness, the item parameters of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end‐of‐test items (i.e., speeded items). This article systematically compared five item calibration procedures: a two‐parameter logistic (2PL) model, a one‐dimensional mixture model, a two‐step strategy (a combination of the one‐dimensional mixture and the 2PL), a two‐dimensional mixture model, and a hybrid model. The comparison examined how sample size, percentage of speeded examinees, percentage of missing responses, and way of scoring missing responses (incorrect vs. omitted) affect item parameter estimation in speeded tests. For nonspeeded items, all five procedures showed similar results in recovering item parameters. For speeded items, the one‐dimensional mixture model, the two‐step strategy, and the two‐dimensional mixture model provided largely similar results and performed better than the 2PL model and the hybrid model in calibrating slope parameters. However, those three procedures performed similarly to the hybrid model in estimating intercept parameters. As expected, the 2PL model did not appear to be as accurate as the other models in recovering item parameters, especially when there were large numbers of examinees showing speededness and a high percentage of missing responses with incorrect scoring. Real data analysis further described the similarities and differences between the five procedures.
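For readers unfamiliar with the slope and intercept parameters mentioned in this abstract, the standard 2PL item response function in slope-intercept form is P(correct) = 1 / (1 + exp(-(a·θ + d))). The sketch below only evaluates that function; it is not any of the five calibration procedures, and the function name and parameter values are illustrative assumptions.

```python
import math


def p_correct_2pl(theta, slope, intercept):
    """Probability of a correct response under the 2PL model.

    theta:     examinee ability
    slope:     item discrimination (a)
    intercept: item easiness in slope-intercept form (d)
    """
    return 1.0 / (1.0 + math.exp(-(slope * theta + intercept)))


# Illustrative values only: a moderately discriminating, slightly hard item.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_2pl(theta, slope=1.2, intercept=-0.5), 3))
```

Calibration means estimating `slope` and `intercept` for each item from response data. Under speededness, responses to end-of-test items reflect time pressure as well as ability, which is why those estimates can be distorted.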

11.

Parameter Estimation in Rasch Models for Examinee‐Selected Items

Chen‐Wei Liu Wen‐Chung Wang 《Journal of Educational Measurement》2017,54(4):518-549

The examinee‐selected‐item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set of items (e.g., choose one item to respond from a pair of items), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using standard item response theory models, which assume ignorable missing data, can yield biased parameter estimates so that examinees taking different sets of items to answer cannot be compared. To solve this fundamental problem, in this study the researchers utilized the specific objectivity of Rasch models by adopting the conditional maximum likelihood estimation (CMLE) and pairwise estimation (PE) methods to analyze ESI data, and conducted a series of simulations to demonstrate the advantages of the CMLE and PE methods over traditional estimation methods in recovering item parameters in ESI data. An empirical data set obtained from an experiment on the ESI design was analyzed to illustrate the implications and applications of the proposed approach to ESI data.

12.

A CROSS-VALIDATION STUDY OF THE ITEM ORDERING OF THE PEABODY PICTURE VOCABULARY TEST

JOSEPH S. RENZULLI DIETER H. PAULUS 《Journal of Educational Measurement》1969,6(1):15-20

A subset of the items of both forms of the Peabody Picture Vocabulary Test (PPVT) was administered to a sample of 452 fourth-, fifth-, and sixth-grade students. This sample of students was randomly divided into two equal subgroups. Item difficulty indices were calculated for each of the two subsamples for each of the two forms of the test. Data obtained from the first subsample were used to evaluate the published ordering of items of Forms A and B of the PPVT and to reorder the items according to the empirically derived item difficulties. The second subsample was used as a cross-validation sample to evaluate the empirically derived reordering of items. The results of the cross-validation of the reordering indicate a substantial and significant increase in the validity of the item orderings for this subset of items on both forms of the PPVT. Therefore, this new ordering may yield a more accurate estimate of the intelligence of average and above students in the fourth, fifth, and sixth grades than the present, published ordering of items.
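The reordering step described in this abstract rests on the classical item difficulty index, the proportion of examinees who answer an item correctly. A minimal sketch follows, assuming dichotomously scored (0/1) responses. The function names and the tiny response matrix are hypothetical, and the sketch does not reproduce the study's validity statistic.

```python
def difficulty_indices(responses):
    """Proportion correct per item; responses is a list of per-student 0/1 lists."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(r[i] for r in responses) / n_students for i in range(n_items)]


def order_easiest_first(p_values):
    """Item positions sorted from easiest (highest proportion correct) to hardest."""
    return sorted(range(len(p_values)), key=lambda i: -p_values[i])


# Hypothetical first subsample: 4 students by 3 items.
subsample_1 = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
p = difficulty_indices(subsample_1)
print(p)
print(order_easiest_first(p))
```

In a cross-validation design like the one described, the ordering derived from the first subsample would then be checked against difficulty indices computed independently on the second subsample.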

13.

An Investigation of Different Treatment Strategies for Item Category Collapsing in Calibration: An Empirical Study

Brenda Siok-Hoon Tay-lim Jinming Zhang 《Applied Measurement in Education》2015,28(2):143-155

To ensure the validity of statistical results, model-data fit must be evaluated for each item. In practice, certain actions or treatments are needed for misfit items. If all misfit items are treated, much item information would be lost during calibration. On the other hand, if only severely misfit items are treated, the inclusion of misfit items may invalidate the statistical inferences based on the estimated item response models. Hence, given response data, one has to find a balance between treating too few and too many misfit items. In this article, misfit items are classified into three categories based on the extent of misfit. Accordingly, three different item treatment strategies are proposed in determining which categories of misfit items should be treated. The impact of using different strategies is investigated. The results show that the test information functions obtained under different strategies can be substantially different in some ability ranges.

14.

Spacing extinction trials alleviates renewal and spontaneous recovery

Gonzalo P. Urcelay Daniel S. Wheeler Ralph R. Miller 《Learning & behavior》2009,37(1):60-73

Studies of extinction in classical conditioning situations can reveal techniques that maximize the effectiveness of exposure-based behavior therapies. In three experiments, we investigated the effect of varying the intertrial interval during an extinction treatment in a fear-conditioning preparation with rats as subjects. In Experiment 1, we found less fear at test (i.e., more effective extinction) when extinction trials were widely spaced, relative to intermediate or massed extinction trials. In Experiment 2, we used an ABA renewal procedure and observed that spaced trials attenuated renewal of conditioned fear relative to massed trials. In Experiment 3, we used a similar design, but instead of changing the physical context at the time of testing, we interposed a retention interval after the extinction treatment to produce a change in the temporal context. The results showed less spontaneous recovery of fear after spaced than after massed extinction trials. These results suggest that extinction is more enduring when the extinction trials are spaced rather than massed. Although the benefits of spacing trials are small when there is no contextual change from extinction to testing, a change in either physical or temporal context following massed extinction trials leads to a recovery from extinction, which is reduced when the trials are spaced.

15.

Unidimensional Interpretations for Multidimensional Test Items

Nilufer Kahraman 《Journal of Educational Measurement》2013,50(2):227-246

This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item‐level multidimensionality, and (2) whether a Projection IRT model can provide a useful remedy. A real‐data example is used to illustrate the problem and also is used as a base model for a simulation study. The results suggest that ignoring item‐level multidimensionality might lead to inflated item discrimination parameter estimates when the proportion of multidimensional test items to unidimensional test items is as low as 1:5. The Projection IRT model appears to be a useful tool for updating unidimensional item parameter estimates of multidimensional test items for a purified unidimensional interpretation.

16.

Validity of the Simultaneous Approach to the Development of Equivalent Achievement Tests in English and French

W. Todd Rogers Jie Lin Christia M. Rinaldi 《Applied Measurement in Education》2013,26(1):39-70

The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g., English) and checking to see that both language versions of the item mean the same. The evidence collected through the item development stage suggested that the simultaneous test development method allowed the influence and integration of information from item writers representing different language and cultural groups to affect test development directly. Certified English/French translators and interpreters and the French Immersion students confirmed that the test items in French and English had comparable meanings. The pairs of test forms had equal standard errors of measurement. The source of differential item functioning was not attributable to the adaptation process used to produce the two language forms, but to the lack of French language proficiency as well as other unknown sources. Lastly, the simultaneous approach used in the present study was somewhat more efficient than the forward translation procedure currently in use.

17.

THE EFFECTS OF TEACHING PROBLEM SOLVING ON ACADEMIC PERFORMANCE AND RETENTION

Marilyn B. Pugh 《Community College Journal of Research & Practice》2013,37(3):339-349

A study of the effects of explicitly teaching a problem‐solving strategy on problem‐solving ability, course average, course success, and student retention is reported. Two classes of microeconomics principles were involved in a quasi‐experiment. The experimental class was explicitly taught the problem‐solving strategy and this strategy was then used to solve microeconomic problems in class. The control class was assigned, solved, and discussed the same problems without being taught the problem‐solving strategy. Multiple regression and analysis of variance show that while teaching problem solving did not significantly affect course average, student success in passing the course or problem solving ability, it did result in significantly higher student retention. Results indicate that teaching problem solving only affects those students with low problem solving abilities who would have dropped out of class, and that teaching this strategy helps them remain in the class and succeed.

18.

The Role of Extended Time and Item Content on a High‐Stakes Mathematics Test

Allan S. Cohen Noel Gregg Meng Deng 《Learning disabilities research & practice》2005,20(4):225-233

The premise of a great deal of current research guiding policy development has been that accommodations are the catalyst for student performance differences. Rather than accepting this premise, two studies were conducted to investigate the influence of extended time and content knowledge on the performance of ninth‐grade students who took a statewide mathematics test with and without accommodations. Each study involved 1,250 accommodated students (extended time only) with learning disabilities and 1,250 nonaccommodated students demonstrating no disabilities. In Study One, a standard differential item functioning (DIF) analysis illustrated that the usual approach to studying the effects of accommodations contributes little to our understanding of the reason for performance differences across students. Next, a mixture item response theory DIF model was used to explore the most likely cause(s) for performance differences across the population. The results from both studies suggest that students for whom items were functioning differently were not accurately characterized by their accommodation status but rather by their content knowledge. That is, knowing students' accommodation status (i.e., accommodated or nonaccommodated) contributed little to understanding why accommodated and nonaccommodated students differed in their test performance. Rather, the data would suggest that a more likely explanation is that mathematics competency differentiated the groups of student learners regardless of their accommodation and/or reading levels.

19.

Different administrative directions and student ratings of instruction: Cognitive versus affective effects

Robert M. Pasen Peter W. Frey Robert J. Menges Gustave J. Rath 《Research in higher education》1978,9(2):161-167

A manipulation of the instructions students received prior to completing the 7-item Endeavor Instructional Rating card differentially affected their ratings on two types of items. Specifically, when students were led to believe their ratings would have a strong impact on the instructor's career, they tended to be more lenient on items measuring rapport (i.e., the affective domain); this same effect was not observed for items measuring pedagogical skill (i.e., the cognitive domain). The different items on our instructional rating instrument appear to be measuring different things. One implication of this observation is that the inconsistent findings reported in past research on student ratings of instruction may be due to the differential mix of items from one instrument to another. When instructors are compared on ratings given them by students, unbiased interpretation requires that the multidimensional nature of teaching (and of the rating instrument) be considered.

20.

A Method for Maintaining Scale Stability in the Presence of Test Speededness

James A. Wollack Allan S. Cohen Craig S. Wells 《Journal of Educational Measurement》2003,40(4):307-330

Administering tests under time constraints may result in poorly estimated item parameters, particularly for items at the end of the test (Douglas, Kim, Habing, & Gao, 1998; Oshima, 1994). Bolt, Cohen, and Wollack (2002) developed an item response theory mixture model to identify a latent group of examinees for whom a test is overly speeded, and found that item parameter estimates for end-of-test items in the nonspeeded group were similar to estimates for those same items when administered earlier in the test. In this study, we used the Bolt et al. (2002) method to study the effect of removing speeded examinees on the stability of a score scale over an 11-year period. Results indicated that using only the nonspeeded examinees for equating and estimating item parameters provided a more unidimensional scale, smaller effects of item parameter drift (including fewer drifting items), and less scale drift (i.e., bias) and variability (i.e., root mean squared errors) when compared to the total group of examinees.