Similar literature
20 similar documents found (search time: 31 ms)
1.
This article focuses on the practical use of Bloom's Taxonomy of Educational Objectives. The current status of analyzing and classifying test items and behavioral objectives was examined in this study. Specifically, the purpose of this study was to analyze and classify the ISIS minicourse performance objectives and criterion-referenced test items according to Bloom's cognitive Taxonomy in order to determine the levels of cognition at which the ISIS instructional materials are directed. The performance objectives and test items of thirty-three ISIS minicourses and criterion-referenced tests were collected and classified. Four research questions were posed in the study. The findings indicate that ISIS minicourse test items and performance objectives are written primarily at the Knowledge and Comprehension levels. The ISIS instructional materials reflect low percentages of upper cognitive level test items and performance objectives. Based upon the use of a chi-square analysis, twenty-four of the ISIS minicourses and tests demonstrate a positive congruence between their performance objectives and criterion-referenced test items. Nine ISIS minicourses were found to demonstrate a negative relationship between their performance objectives and test items. Implications and recommendations based on the findings of the study are provided.

2.
Applied Measurement in Education, 2013, 26(1): 47-64
Optimal appropriateness measurement statistically provides the most powerful methods for identifying individuals who are mismeasured by a standardized psychological test or scale. These methods use a likelihood ratio test to compare the hypothesis of normal responding versus the alternative hypothesis that an individual's responses are aberrant in some specified way. According to the Neyman-Pearson Lemma, no other statistic computed from an individual's item responses can achieve a higher rate of detection of the hypothesized measurement anomaly at the same false positive rate. Use of optimal methods requires a psychometric model for normal responding, which can be readily obtained from the item response theory literature, and a model for aberrant responding. In this article, several concerns about measurement anomalies are described and transformed into quantitative models. We then show how to compute the likelihood of a response pattern u* for each of the aberrance models.
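
The comparison described in this abstract can be sketched as follows, assuming dichotomous items, local independence, and an ability density f(θ). The symbols P_i, f, and LR are notation introduced here for illustration and are not taken from the article:

```latex
\[
\mathrm{LR}(u^*) = \frac{P_{\mathrm{aberrant}}(u^*)}{P_{\mathrm{normal}}(u^*)},
\qquad
P_{\mathrm{normal}}(u^*) = \int \prod_{i=1}^{n} P_i(\theta)^{u_i^*}\,\bigl[1 - P_i(\theta)\bigr]^{1 - u_i^*} f(\theta)\, d\theta
\]
```

Under this reading, a respondent is flagged when LR(u*) exceeds a cutoff chosen to hold the false positive rate fixed. The form of P_aberrant depends on the particular aberrance model and is left unspecified here.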

3.
In this ITEMS module, we frame the topic of scale reliability within a confirmatory factor analysis and structural equation modeling (SEM) context and address some of the limitations of Cronbach's α. This modeling approach has two major advantages: (1) it allows researchers to make explicit the relation between their items and the latent variables representing the constructs those items intend to measure, and (2) it facilitates a more principled and formal practice of scale reliability evaluation. Specifically, we begin the module by discussing key conceptual and statistical foundations of the classical test theory model and then framing it within an SEM context; we do so first with a single item and then expand this approach to a multi‐item scale. This allows us to set the stage for presenting different measurement structures that might underlie a scale and, more importantly, for assessing and comparing those structures formally within the SEM context. We then make explicit the connection between measurement model parameters and different measures of reliability, emphasizing the challenges and benefits of key measures while ultimately endorsing the flexible McDonald's ω over Cronbach's α. We then demonstrate how to estimate key measures in both a commercial software program (Mplus) and three packages within an open‐source environment (R). In closing, we make recommendations for practitioners about best practices in reliability estimation based on the ideas presented in the module.
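
As a rough illustration of the two coefficients contrasted in this abstract, the sketch below computes Cronbach's α from raw item scores and McDonald's ω from the loadings and unique variances of a one-factor model. The module itself works in Mplus and R; the Python function names, the score matrix, and the loadings here are hypothetical, and the loadings are taken as given rather than estimated.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one row per respondent, one column per item."""
    k = len(scores[0])
    item_vars = [variance(col) for col in zip(*scores)]   # sample variance of each item
    total_var = variance([sum(row) for row in scores])    # sample variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def mcdonald_omega(loadings, uniquenesses):
    """McDonald's omega for a single-factor model with uncorrelated errors."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


# Hypothetical data: five respondents answering three items.
scores = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 5], [3, 3, 4]]
alpha = cronbach_alpha(scores)

# Hypothetical standardized loadings and their unique variances (1 - loading**2).
omega = mcdonald_omega([0.8, 0.6, 0.4], [0.36, 0.64, 0.84])
```

Unlike α, ω does not require the loadings to be equal, which is one way to read the flexibility the abstract attributes to it.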

4.
Assessments of student learning outcomes (SLO) have been widely used in higher education for accreditation, accountability, and strategic planning purposes. Although important to institutions, the assessment results typically bear no consequence for individual students. It is important to clarify the relationship between motivation and test performance and identify practical strategies to boost students' motivation in test taking. This study designed an experiment to examine the effectiveness of a motivational instruction. The instruction increased examinees' self-reported test-taking motivation by .89 standard deviations (SDs) and test scores by .63 SDs. Students receiving the instruction spent an average of 14 more seconds on an item than students in the control group. Score difference between experimental and control groups narrowed to .23 SDs after unmotivated students identified by low response time were removed from the analyses. The findings provide important implications for higher education institutions which administer SLO assessments in a low-stakes setting.

5.
Much interest has been expressed in the construct metacognition, the individual's knowledge and control of his own cognitive processes. Recent educational proposals have suggested the training of general metacognitive principles in schools. The exact nature of the construct has, however, remained vague. The aim of the present study was to provide some clarity. In a study of the metacognitive responses of 144 primary school children (aged 7‐11 years) four measures commonly used to assess metacognitive function were examined. First, the content of each measure was examined. Secondly, in an attempt to identify a metacognitive factor, commonality among the measures, both of developmental patterns and statistical relationship, was sought. Whilst a common pattern of development in the children's responses to the four measures was identified, factor analysis failed to provide evidence for a common metacognitive factor and unified construct.

6.
We developed a criterion-referenced student rating of instruction (SRI) to facilitate formative assessment of teaching. It involves four dimensions of teaching quality that are grounded in current instructional design principles: Organization and structure, Assessment and feedback, Personal interactions, and Academic rigor. Using item response theory and Wright mapping methods, we describe teaching characteristics at various points along the latent continuum for each scale. These maps enable criterion-referenced score interpretation by making an explicit connection between test performance and the theoretical framework. We explain the way our Wright maps can be used to enhance an instructor’s ability to interpret scores and identify ways to refine teaching. Although our work is aimed at improving score interpretation, a criterion-referenced test is not immune to factors that may bias test scores. The literature on SRIs is filled with research on factors unrelated to teaching that may bias scores. Therefore, we also used multilevel models to evaluate the extent to which student and course characteristics may affect scores and compromise score interpretation. Results indicated that student anger and the interaction between student gender and instructor gender are significant effects that account for a small amount of variance in SRI scores. All things considered, our criterion-referenced approach to SRIs is a viable way to describe teaching quality and help instructors refine pedagogy and facilitate course development.

7.
Noting the desirability of the current shift toward mastery testing and criterion-referenced test procedures, an evaluation model is presented which should be useful and practical for such purposes. This model is based on the assumptions that the learning of fundamental skills can be considered all or none, that each item response on a single skill test represents an unbiased sample of the examinee's true mastery status, that measurement error occurring on the test (as estimated from the average interitem correlation) can be of only one type (α or β) for each examinee, and that through practical and theoretical considerations of evaluation error costs and item error characteristics, an optimal mastery criterion can be calculated. Each of these assumptions is discussed and the resultant mastery criteria algorithm is described along with an example from the IPI math program.

8.
Orlando and Thissen's S‐X² item fit index has performed better than traditional item fit statistics such as Yen's Q₁ and McKinley and Mills' G² for dichotomous item response theory (IRT) models. This study extends the utility of S‐X² to polytomous IRT models, including the generalized partial credit model, partial credit model, and rating scale model. The performance of the generalized S‐X² in assessing item model fit was studied in terms of empirical Type I error rates and power and compared to G². The results suggest that the generalized S‐X² is promising for polytomous items in educational and psychological testing programs.

9.
Literature relating to the well‐being of older adults was reviewed to identify indicators relevant to the construct of self‐responsibility for wellness. The wellness model proposed by Travis (1981) has produced a variety of concepts which can be useful in improving the quality of life for older adults. The purpose of this study was to develop an instrument which would assess an individual's self‐responsibility for wellness. A 47‐item instrument developed for this purpose was evaluated by experts in gerontology and psychology. After revision and reevaluation it was field‐tested on a sample of 180 older adults (60 years of age and over). In order to take preliminary steps in establishing the validity and reliability of this instrument, the data were evaluated and an item analysis conducted to identify poor items. Cronbach's coefficient alpha was also computed (α = .90). A test‐retest correlation coefficient was computed, and an analysis of variance was performed to test for the relationship between self‐responsibility for wellness and demographic variables obtained during the field test.

The field testing of the instrument served as an educational needs assessment study. Evidence has been provided that there is a significant need for education programs which can provide training in the wellness skills as assessed by the instrument.

10.
The aim of this study was to apply Rasch modeling to an examination of the psychometric properties of the Pearson Test of English Academic (PTE Academic). Analyzed were 140 test-takers' scores derived from the PTE Academic database. The mean age of the participants was 26.45 (SD = 5.82), ranging from 17 to 46. Conformity of the participants' performance on the 86 items of PTE Academic Form 1 of the field test was evaluated using the partial credit model. The person reliability coefficient was .96, and item reliability was .99. The results showed that no significant differential item functioning was found across subgroups of gender and spoken-language context, indicating that the item data approximated the Rasch model. The findings of this study validated the test stability of PTE Academic as a useful measurement tool for English language learners' academic English assessment.

11.
This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback‐Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson‐Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.
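
A minimal sketch of the posterior weighted KL index that the two methods build on, for a single dichotomous item. It assumes the commonly used form in which the KL divergence between the response distribution at the current attribute-profile estimate and at each candidate profile is weighted by that profile's posterior probability. The stochastic components of the restrictive progressive and restrictive threshold methods are not reproduced, and all names and numbers below are hypothetical.

```python
import math


def pwkl_index(p_hat, p_by_class, posterior):
    """Posterior weighted KL index for one dichotomous item.

    p_hat      -- P(correct | current estimated attribute profile)
    p_by_class -- P(correct | profile c) for every candidate profile c
    posterior  -- current posterior probability of each profile c
    All probabilities are assumed to lie strictly between 0 and 1.
    """
    index = 0.0
    for p_c, weight in zip(p_by_class, posterior):
        kl = 0.0
        for x in (0, 1):  # sum over the two possible responses
            p_x_hat = p_hat if x == 1 else 1.0 - p_hat
            p_x_c = p_c if x == 1 else 1.0 - p_c
            kl += p_x_hat * math.log(p_x_hat / p_x_c)
        index += weight * kl
    return index


# Hypothetical pool: each item lists P(correct) for four attribute profiles,
# and the current estimate is assumed to be the last profile.
posterior = [0.25, 0.25, 0.25, 0.25]
pool = {
    "item_a": [0.2, 0.2, 0.2, 0.9],   # separates the last profile from the rest
    "item_b": [0.5, 0.5, 0.5, 0.5],   # carries no diagnostic information
}
indices = {name: pwkl_index(probs[-1], probs, posterior) for name, probs in pool.items()}
best_item = max(indices, key=indices.get)
```

Always picking the item with the largest index is what tends to overexpose a few items, which is the problem the exposure-control components described in the abstract are meant to address.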

12.
Background: Before the 1990s, an individual or medical model dominated educational research methodology with respect to younger children: the subjects of the research were usually considered untrustworthy sources of information. A subsequent shift towards an ecological model has focused on the child's perspective: however, Lewis and Lindsay have described the development of methods for conducting research with children as slow.

Purpose: This paper examines how storytelling can be used as a method of collecting authentic and revealing research data from children. The method is suggested as a valuable way in which to gain insights into children's discourse, and is used in this paper in relation to children's discourse about reading.

Sample, design and methods: The storytelling method was initially trialled in one school with 36 children aged between 5 and 11 years. The storytelling interview was then used in case studies over a period of a year in three schools, with a total of 88 7- and 8-year-old children. During the interviews, children were asked to tell a story entitled ‘The child who didn't like reading’. Systematic content analysis was undertaken to identify emergent cultural norms and models in the stories. Information on the children's reading practices, and their observations on reading, was also collected for the purposes of triangulation.

Results: The children's storytelling gave access to their cultural models of reading. It was found that the stories demonstrated sufficient triangulation with the other data about the children's reading practices to support a sociocultural production of the children's discourse.

Conclusions: Storytelling can provide a useful and credible method of collecting research data from children. It may be especially useful with poor readers as there are no literacy demands, and in this respect, affords socially inclusive research.

13.
For over 50 years, seven plus or minus two has been a commonly used guideline for gauging how many chunks of new information should be presented at one time in learning and performance situations. Often cited as the limit of working memory, this guideline was created as a result of misinterpreting an article by Miller (1956). More recent studies suggest that the limit for working memory is more like three, and sometimes four, with various factors influencing the capacity of an individual's working memory. Given too much novel information at one time, learners and performers can be derailed by cognitive overload. Instructional designers and performance consultants can adjust the presentation of new information to manage intrinsic, extraneous, and germane cognitive load. This column provides suggestions about how to reduce cognitive overload to improve learning and performance.

14.
15.
A look at real data shows that Reckase's psychometric theory for standard setting is not applicable to bookmark and that his simulations cannot explain actual differences between methods. It is suggested that exclusively test-centered, criterion-referenced approaches are too idealized and that a psychophysics paradigm and a theory of group behavior could be more useful in thinking about the standard setting process. In this view, item mapping methods such as bookmark are reasonable adaptations to fundamental limitations in human judgments of item difficulty. They make item ratings unnecessary and have unique potential for integrating external validity data and student performance data more fully into the standard setting process.

16.
The achievement motive concept refers to a relatively stable personality characteristic in terms of a capacity to anticipate affects in achievement situations. The motive to achieve success (M_s) refers to the individual's capacity to anticipate positive affects, and the motive to avoid failure (M_f) refers to a capacity to anticipate negative affects in achievement situations. Based, among other things, on the conceptualizations of motives, a measure was constructed to tap the two aspects (M_s and M_f) of motivation. Over the years the scale has been translated into several languages and used in a number of studies. The Czech version of the scale is an adapted translation of the English one, and was administered to 179 pupils in the sixth grade in 1989. Further, the subjects were retested after an interval of 12 weeks. The analyses indicate that the psychometric properties of the Czech version of AMS are promising.

17.
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity estimation consistent with this philosophy are suggested. Also, to improve the decision-making accuracy of criterion-referenced tests, a Bayesian procedure for estimating true mastery scores has been proposed. This Bayesian procedure uses information about other members of a student's group (collateral information), but the resulting estimation is still criterion referenced rather than norm referenced in that the student is compared to a standard rather than to other students. In theory, the Bayesian procedure increases the “effective length” of the test by improving the reliability, the validity, and more importantly, the decision-making accuracy of the criterion-referenced test scores.

18.
This study examines changes in black adolescents’ perceptions of the elderly following participation in an eight‐week intergenerational project. The project matched 19 teenagers with 19 elderly subjects from a large senior citizens center. Using an experimental design, students were matched by age with a control group (n = 20). A 20‐item semantic differential scale and the Children's Perceptions of Aging and Elderly (CPAE) inventory were used to measure attitude change. Posttest results from a matched pair t‐test found significant attitude change in the experimental group: semantic differential (t = 2.8, p < .01); CPAE (t = 4.2, p < .01). Qualitative comments from the youth and elderly participants further indicate positive qualities of the partners program.
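
For readers unfamiliar with the matched pair t‐test named in this abstract, here is a minimal sketch of the statistic. The scores are invented for illustration and are not the study's data; the function and variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Matched pair t statistic and degrees of freedom for two paired score lists."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)   # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Invented attitude scores for five matched pairs.
control = [10, 12, 9, 11, 13]
experimental = [12, 13, 11, 12, 16]
t_value, df = paired_t(control, experimental)
```

The resulting t value is then compared against the t distribution with n - 1 degrees of freedom to obtain a p value.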

19.
Arguments favoring free- over forced-distribution Q sorts have assumed that forcing leads to loss of important statistical information and interferes with interval properties, rendering Pearson's r inappropriate for analysis. Q sorts with identical item orderings but with varied distributions are shown to provide essentially the same correlations and factor structures when coefficients are computed using Spearman's r_s, Kendall's τ, and Pearson's r, leading to the conclusion that the same results are obtained, despite distribution and whether interval or ordinal statistics are used.
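
The kind of comparison this abstract reports can be sketched as follows: two sorters' item orderings are scored once with a forced quasi-normal distribution and once with a flat distribution, and the three coefficients are computed on each. The orderings, the two distribution shapes, and all function names are hypothetical, and Kendall's τ is implemented here in its tie-corrected τ-b form, which is an assumption about the variant used.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                      # pairs tied on both are dropped
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + only_x_tied) * (pairs + only_y_tied))


# Hypothetical sort: the position (0 = least agree) each sorter gave to nine items.
positions_a = [0, 1, 2, 3, 4, 5, 6, 7, 8]
positions_b = [1, 0, 2, 4, 3, 5, 7, 6, 8]
FORCED = [-2, -1, -1, 0, 0, 0, 1, 1, 2]   # quasi-normal forced distribution
FLAT = list(range(-4, 5))                 # one item per score value


def scores(positions, shape):
    return [shape[p] for p in positions]


r_forced = pearson(scores(positions_a, FORCED), scores(positions_b, FORCED))
r_flat = pearson(scores(positions_a, FLAT), scores(positions_b, FLAT))
rs_forced = spearman(scores(positions_a, FORCED), scores(positions_b, FORCED))
tau_forced = kendall_tau_b(scores(positions_a, FORCED), scores(positions_b, FORCED))
```

In this toy case the Pearson coefficients under the two distributions differ only slightly, which is in the spirit of the abstract's conclusion but is not evidence for it.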

20.
Nambury S. Raju (1937–2005) developed two model‐based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures (Raju, 1988) and Raju's DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach provides an intuitive and flexible methodology for assessing DIF. The purpose of this tutorial is to explain DFIT and show how this methodology can be utilized in a variety of DIF applications.
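
The idea of quantifying the gap between ICFs can be illustrated with the area measure only (DFIT is not sketched here). The code below assumes a two-parameter logistic model with scaling constant D = 1.7 and approximates the signed and unsigned areas by numerical integration rather than by closed-form expressions. The item parameters and function names are hypothetical.

```python
import math


def icf_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic item characteristic function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def area_between_icfs(reference, focal, lo=-20.0, hi=20.0, steps=40000):
    """Signed and unsigned area between two ICFs, by the midpoint rule.

    reference, focal -- (a, b) parameter pairs for the two groups.
    """
    width = (hi - lo) / steps
    signed = unsigned = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * width
        gap = icf_2pl(theta, *reference) - icf_2pl(theta, *focal)
        signed += gap * width
        unsigned += abs(gap) * width
    return signed, unsigned


# Hypothetical item: harder for the focal group by half a logit unit of b.
signed_equal_a, unsigned_equal_a = area_between_icfs((1.0, 0.0), (1.0, 0.5))

# Hypothetical item whose discrimination also differs between groups.
signed_unequal_a, unsigned_unequal_a = area_between_icfs((1.2, 0.0), (0.8, 0.5))
```

With no guessing parameter, the signed area equals the difference in difficulty (focal b minus reference b) whatever the discriminations are, whereas the unsigned area grows when the curves cross, so it also reflects non-uniform DIF.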
