Similar Articles
20 similar articles found (search time: 31 ms)
1.
Item analysis is an integral part of operational test development and is typically conducted within two popular statistical frameworks: classical test theory (CTT) and item response theory (IRT). In this digital ITEMS module, Hanwook Yoo and Ronald K. Hambleton provide an accessible overview of operational item analysis approaches within these frameworks. They review the different stages of test development and associated item analyses to identify poorly performing items and support effective item selection. Moreover, they walk through the computational and interpretational steps for CTT‐ and IRT‐based evaluation statistics using simulated data examples and review various graphical displays such as distractor response curves, item characteristic curves, and item information curves. The digital module contains sample data, Excel sheets with various templates and examples, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
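As an illustration of the CTT side of such item analyses, the Python sketch below (illustrative data; the module itself works through Excel templates) computes two standard statistics: proportion-correct item difficulty and a corrected point-biserial discrimination.

```python
import numpy as np

def ctt_item_stats(responses):
    """Classical item analysis for a 0/1-scored response matrix.

    responses: (n_examinees, n_items) array of 0/1 scores.
    Returns per-item difficulty (proportion correct) and corrected
    point-biserial discrimination (item vs. rest-of-test score).
    """
    responses = np.asarray(responses, dtype=float)
    p = responses.mean(axis=0)                 # item difficulty (p-values)
    total = responses.sum(axis=1)
    disc = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]         # corrected total score
        disc[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return p, disc

# Made-up data: 6 examinees, 3 items
data = [[1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 0],
        [1, 1, 1],
        [1, 0, 1]]
p, disc = ctt_item_stats(data)
print(p)     # per-item proportion correct
print(disc)  # per-item corrected point-biserial
```

Poorly performing items typically show very extreme difficulty or near-zero (or negative) discrimination.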

2.
3.
In this ITEMS module, we introduce the generalized deterministic inputs, noisy “and” gate (G‐DINA) model, which is a general framework for specifying, estimating, and evaluating a wide variety of cognitive diagnosis models. The module contains a nontechnical introduction to diagnostic measurement, an introductory overview of the G‐DINA model, as well as common special cases, and a review of model‐data fit evaluation practices within this framework. We use the flexible GDINA R package, which is available for free within the R environment and provides a user‐friendly graphical interface in addition to the code‐driven layer. The digital module also contains videos of worked examples, solutions to data activity questions, curated resources, a glossary, and quizzes with diagnostic feedback.
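The module's worked examples use the GDINA R package; as a language-neutral illustration, the following Python sketch computes the item response probability for the DINA model, one of the common special cases of G-DINA (parameter values are invented for illustration).

```python
import numpy as np

def dina_prob(alpha, q, guess, slip):
    """P(correct) under the DINA model, a special case of G-DINA.

    alpha: (K,) 0/1 attribute-mastery profile of the respondent
    q:     (K,) 0/1 Q-matrix row listing the item's required attributes
    guess, slip: item guessing and slipping parameters
    """
    alpha, q = np.asarray(alpha), np.asarray(q)
    # eta = 1 only if ALL required attributes are mastered ("and" gate)
    eta = int(np.all(alpha[q == 1] == 1))
    return (1 - slip) if eta else guess

# Hypothetical item requiring two attributes, guess=.2, slip=.1
print(dina_prob([1, 1], [1, 1], guess=0.2, slip=0.1))  # 0.9 (master)
print(dina_prob([0, 1], [1, 1], guess=0.2, slip=0.1))  # 0.2 (non-master)
```

The "noisy and gate" is visible in the code: missing any required attribute collapses the success probability to the guessing parameter.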

4.
In this digital ITEMS module, Dr. Jue Wang and Dr. George Engelhard Jr. describe the Rasch measurement framework for the construction and evaluation of new measures and scales. From a theoretical perspective, they discuss the historical and philosophical perspectives on measurement with a focus on Rasch's concept of specific objectivity and invariant measurement. Specifically, they introduce the origins of Rasch measurement theory, the development of model‐data fit indices, as well as commonly used Rasch measurement models. From an applied perspective, they discuss best practices in constructing, estimating, evaluating, and interpreting a Rasch scale using empirical examples. They provide an overview of a specialized Rasch software program (Winsteps) and an R program embedded within Shiny (Shiny_ERMA) for conducting Rasch model analyses. The module is designed to be relevant for students, researchers, and data scientists in various disciplines such as psychology, sociology, education, business, health, and other social sciences. It contains audio‐narrated slides, sample data, syntax files, access to the Shiny_ERMA program, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
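The dichotomous Rasch model at the heart of these analyses can be stated compactly; the Python sketch below (not part of the Winsteps or Shiny_ERMA materials) computes the probability of a correct response.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model:
    P(X=1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is person ability and b is item difficulty (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person whose ability equals the item difficulty answers correctly
# with probability .5 -- the defining point of the item characteristic curve.
print(rasch_prob(0.0, 0.0))              # 0.5
print(round(rasch_prob(1.0, 0.0), 3))    # 0.731
```

Because the model has a single item parameter, item characteristic curves never cross, which underlies Rasch's notion of invariant measurement.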

5.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation is becoming more common, understanding Bayesian approaches for evaluating model‐data fit is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model‐data fit approach. Specifically, they review the conceptual foundation of Bayesian inference as well as PPMC and walk through the computational steps of PPMC using real‐life data examples from simple linear regression and item response theory analysis. They provide guidance for how to interpret PPMC results and discuss how to implement PPMC for other models and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
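The module's examples use SAS; the Python sketch below outlines the PPMC logic for a simple linear regression, substituting a crude normal approximation for real MCMC posterior draws (data, spread of the draws, and discrepancy measure are all purely illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from y = 2 + 1.5x + noise
n = 50
x = rng.uniform(0, 10, n)
y = 2 + 1.5 * x + rng.normal(0, 1, n)

# Crude stand-in for a posterior: normal draws centered at the OLS
# solution (a real analysis would use MCMC draws of the parameters).
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_hat = (y - X @ beta_hat).std(ddof=2)
n_draws = 2000
betas = beta_hat + rng.normal(0, 0.05, (n_draws, 2))

def discrepancy(d):
    """Discrepancy measure T(y): maximum absolute deviation from the mean."""
    return np.max(np.abs(d - d.mean()))

# PPMC: for each posterior draw, simulate replicated data and
# compare T(y_rep) with T(y) for the observed data.
exceed = 0
for beta in betas:
    y_rep = X @ beta + rng.normal(0, sigma_hat, n)
    if discrepancy(y_rep) >= discrepancy(y):
        exceed += 1
ppp = exceed / n_draws
print(ppp)  # posterior predictive p-value; values near 0 or 1 flag misfit
```

The observed statistic sitting in the middle of the replicated distribution (a moderate posterior predictive p-value) indicates adequate fit for that aspect of the data.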

6.
In this ITEMS module, we frame the topic of scale reliability within a confirmatory factor analysis and structural equation modeling (SEM) context and address some of the limitations of Cronbach's α. This modeling approach has two major advantages: (1) it allows researchers to make explicit the relation between their items and the latent variables representing the constructs those items intend to measure, and (2) it facilitates a more principled and formal practice of scale reliability evaluation. Specifically, we begin the module by discussing key conceptual and statistical foundations of the classical test theory model and then framing it within an SEM context; we do so first with a single item and then expand this approach to a multi‐item scale. This allows us to set the stage for presenting different measurement structures that might underlie a scale and, more importantly, for assessing and comparing those structures formally within the SEM context. We then make explicit the connection between measurement model parameters and different measures of reliability, emphasizing the challenges and benefits of key measures while ultimately endorsing the flexible McDonald's ω over Cronbach's α. We then demonstrate how to estimate key measures in both a commercial software program (Mplus) and three packages within an open‐source environment (R). In closing, we make recommendations for practitioners about best practices in reliability estimation based on the ideas presented in the module.
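As a small illustration of the connection between measurement model parameters and reliability, the Python sketch below (the module itself uses Mplus and R packages) computes McDonald's ω for a unidimensional factor model from hypothetical standardized loadings.

```python
import numpy as np

def mcdonald_omega(loadings, error_variances):
    """McDonald's omega for a unidimensional factor model:
    omega = (sum lambda)^2 / ((sum lambda)^2 + sum theta),
    where lambda are factor loadings and theta are error variances."""
    true_var = np.sum(loadings) ** 2
    return true_var / (true_var + np.sum(error_variances))

# Hypothetical standardized loadings for a 4-item scale
loadings = [0.7, 0.8, 0.6, 0.75]
errors = [1 - l**2 for l in loadings]   # standardized: theta = 1 - lambda^2
print(round(mcdonald_omega(loadings, errors), 3))  # 0.807
```

Unlike Cronbach's α, ω does not require equal loadings (tau-equivalence), which is one reason the module endorses it as the more flexible default.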

7.
In this ITEMS module, we provide a didactic overview of the specification, estimation, evaluation, and interpretation steps for diagnostic measurement/classification models (DCMs), a promising psychometric modeling approach. These models can provide detailed skill‐ or attribute‐specific feedback to respondents along multiple latent dimensions and hold theoretical and practical appeal for a variety of fields. We use a current unified modeling framework—the log‐linear cognitive diagnosis model (LCDM)—as well as a series of quality‐control checklists for data analysts and scientific users to review the foundational concepts, practical steps, and interpretational principles for these models. We demonstrate how the models and checklists can be applied in real‐life data‐analysis contexts. A library of macros and supporting files for Excel, SAS, and Mplus is provided along with video tutorials for key practices.
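To make the LCDM's item response function concrete, here is a Python sketch for a hypothetical two-attribute item (the module's own materials use Excel, SAS, and Mplus; the parameter values below are invented for illustration).

```python
import math

def lcdm_prob(alpha, lambdas):
    """LCDM item response probability for a two-attribute item:
    logit P(X=1|alpha) = l0 + l1*a1 + l2*a2 + l12*a1*a2,
    with an intercept, two main effects, and an interaction term."""
    a1, a2 = alpha
    l0, l1, l2, l12 = lambdas
    logit = l0 + l1 * a1 + l2 * a2 + l12 * a1 * a2
    return 1 / (1 + math.exp(-logit))

params = (-2.0, 1.5, 1.0, 2.0)   # hypothetical item parameters
for profile in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    # probability of success rises with each mastered attribute
    print(profile, round(lcdm_prob(profile, params), 3))
```

Setting the interaction term to the negative sum of the main effects would recover a DINA-like item, which is how the LCDM unifies many special-case DCMs.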

8.
In the current No Child Left Behind era, K‐12 teachers and principals are expected to have a sophisticated understanding of standardized test results, use them to improve instruction, and communicate them to others. The goal of our project, funded by the National Science Foundation, was to develop and evaluate three Web‐based instructional modules in educational measurement and statistics to help school personnel acquire the “assessment literacy” required for these roles. Our first module, “What's the Score?” was administered in 2005 to 113 educators who also completed an assessment literacy quiz. Viewing the module had a small but statistically significant positive effect on quiz scores. Our second module, “What Test Scores Do and Don't Tell Us,” administered in 2006 to 104 educators, was even more effective, primarily among teacher education students. In evaluating our third module, “What's the Difference?” we were able to recruit only 33 participants. Although those who saw the module before taking the quiz outperformed those who did not, results were not statistically significant. Now that the research phase is complete, all ITEMS instructional materials are freely available on our Website.

9.
In this digital ITEMS module, Dr. Brian Leventhal and Dr. Allison Ames provide an overview of Monte Carlo simulation studies (MCSS) in item response theory (IRT). MCSS are utilized for a variety of reasons, one of the most compelling being that they can be used when analytic solutions are impractical or nonexistent because they allow researchers to specify and manipulate an array of parameter values and experimental conditions (e.g., sample size, test length, and test characteristics). Dr. Leventhal and Dr. Ames review the conceptual foundation of MCSS in IRT and walk through the processes of simulating total scores as well as item responses using the two-parameter logistic, graded response, and bifactor models. They provide guidance for how to implement MCSS using other item response models and best practices for efficient syntax and executing an MCSS. The digital module contains sample SAS code, diagnostic quiz questions, activities, curated resources, and a glossary.
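A minimal version of the first step of such a study — simulating item responses under the two-parameter logistic (2PL) model — can be sketched in Python (the module itself provides SAS code; sample sizes and parameter ranges below are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_2pl(n_persons, n_items, rng):
    """Simulate dichotomous responses under the 2PL model:
    P(X=1) = 1 / (1 + exp(-a_j * (theta_i - b_j)))."""
    theta = rng.normal(0, 1, n_persons)          # person abilities
    a = rng.uniform(0.5, 2.0, n_items)           # item discriminations
    b = rng.normal(0, 1, n_items)                # item difficulties
    p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
    responses = (rng.uniform(size=p.shape) < p).astype(int)
    return responses, theta, a, b

responses, theta, a, b = simulate_2pl(1000, 20, rng)
print(responses.shape)   # (1000, 20)
```

A full MCSS would wrap this in a loop over replications and conditions (sample size, test length), then estimate parameters from each replicate and summarize recovery.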

10.
Quality control (QC) in testing is paramount. QC procedures for tests can be divided into two types. The first type, one that has been well researched, is QC for tests administered to large population groups on few administration dates using a small set of test forms (e.g., large‐scale assessment). The second type is QC for tests, usually computerized, that are administered to small population groups on many administration dates using a wide array of test forms (CMT—continuous mode tests). Since the world of testing is headed in this direction, developing QC for CMT is crucial. In the current ITEMS module we discuss errors that might occur at the different stages of the CMT process, as well as the recommended QC procedure to reduce the incidence of each error. An illustration from a recent study is provided, and a computerized system that applies these procedures is presented. Instructions on how to develop one's own QC procedure are also included.
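As a toy illustration of one kind of QC check in this spirit — a simple drift rule of my own, not the module's specific procedure — the Python sketch below flags items whose proportion-correct shifts between administrations beyond a chosen threshold.

```python
import numpy as np

def flag_drifting_items(p_baseline, p_current, threshold=0.10):
    """Flag items whose proportion-correct shifted by more than `threshold`
    between a baseline calibration and the current administration.
    (An illustrative rule; operational QC would use statistical tests.)"""
    p_baseline = np.asarray(p_baseline)
    p_current = np.asarray(p_current)
    drift = np.abs(p_current - p_baseline)
    return np.flatnonzero(drift > threshold)

base = [0.72, 0.55, 0.80, 0.40]
curr = [0.70, 0.38, 0.81, 0.55]
print(flag_drifting_items(base, curr))  # items 1 and 3 drifted
```

In a CMT setting such checks would run after every administration window, since errors must be caught quickly across many small administrations.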

11.
In this paper we discuss the background to this study in the development of the international MSc e‐Learning Multimedia and Consultancy. The aims of the study focus on the conditions for achieving communication, interaction and collaboration in open and flexible e‐learning environments. We present our theoretical framework that has informed the design of the programme as a whole, which is based on a socio‐constructivist perspective on learning. Our research is placed within an action research framework and we outline our position within the critical or emancipatory tradition and also our standpoint on the use of ICT in education. We discuss the design of the programme and also our pedagogical approach and describe in detail the particular context for this study. We report on the student experience of being learners on this module, their perceptions of what they have gained most from learning from and with each other and their responses to the various ways in which ‘scaffolding’ has been designed and implemented by the tutors. Finally, we offer some reflections on the conditions for achieving well‐orchestrated interdependence in open and flexible e‐learning environments.

12.
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.
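As a concrete example of how category probabilities are generated from a model's underlying step functions, the Python sketch below computes them for Samejima's graded response model (parameter values are illustrative).

```python
import numpy as np

def grm_category_probs(theta, a, b_thresholds):
    """Category response probabilities under Samejima's graded response model.

    P*(k) = 1/(1+exp(-a(theta - b_k))) is the cumulative probability of
    responding in category k or higher; category probabilities are the
    differences between adjacent cumulative probabilities.
    """
    b = np.asarray(b_thresholds)
    p_star = 1 / (1 + np.exp(-a * (theta - b)))        # cumulative step functions
    p_star = np.concatenate([[1.0], p_star, [0.0]])    # P*(0)=1, P*(m+1)=0
    return p_star[:-1] - p_star[1:]

# Hypothetical 4-category item: thresholds -1, 0, 1 and discrimination 1.5
probs = grm_category_probs(theta=0.5, a=1.5, b_thresholds=[-1.0, 0.0, 1.0])
print(probs.round(3))  # the four category probabilities sum to 1
```

Other polytomous models (e.g., the partial credit or generalized partial credit model) differ precisely in how these step functions are defined and combined, which is the contrast the module develops.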

13.
The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient ρXX as a function of item response theory (IRT) parameters and present the lower and upper bounds of the coefficient. Another purpose is to examine relative performances of the IRT reliability statistics and two classical test theory (CTT) reliability statistics (Cronbach's alpha and Feldt–Gilmer congeneric coefficients) under various testing conditions that result from manipulating large-scale real data. For the first purpose, two alternative ways of exactly quantifying ρXX are compared in terms of computational efficiency and statistical usefulness. In addition, the lower and upper bounds for ρXX are presented in line with the assumptions of essential tau-equivalence and congeneric similarity, respectively. Empirical studies conducted for the second purpose showed across all testing conditions that (1) the IRT reliability coefficient was higher than the CTT reliability statistics; (2) the IRT reliability coefficient was closer to the Feldt–Gilmer coefficient than to Cronbach's alpha; and (3) the alpha coefficient was close to the lower bound of IRT reliability. Finally, some advantages of the IRT approach to estimating test-score reliability over the CTT approaches are discussed.
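For reference, the CTT comparison statistic above, Cronbach's alpha, can be computed directly from a response matrix; here is a Python sketch with made-up Likert-type data.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum(item variances) / var(total)),
    where k is the number of items and total is the sum score."""
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Made-up responses: 5 respondents x 4 Likert items
data = np.array([[3, 4, 3, 4],
                 [2, 2, 3, 2],
                 [4, 5, 5, 4],
                 [1, 2, 1, 2],
                 [3, 3, 4, 3]])
print(round(cronbach_alpha(data), 3))  # 0.949
```

Alpha equals true reliability only under essential tau-equivalence; under the weaker congeneric assumption it is a lower bound, consistent with finding (3) above.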

14.
15.
Views on testing—its purpose and uses and how its data are analyzed—are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as contestants in some detail. Test takers who are contestants in high‐stakes settings want reliable outcomes obtained via acceptable scoring of tests administered under clear rules. In addition, it is essential to empirically verify interpretations attached to scores. At the very least, item and test scores should exhibit certain invariance properties. I note that the “do no harm” dictum borrowed from the field of medicine is particularly relevant to the perspective of test takers as contestants.

16.
In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows scores to be returned faster at lower cost. In the module, they discuss automated scoring from a number of perspectives. First, they discuss benefits and weaknesses of automated scoring, and what psychometricians should know about automated scoring. Next, they describe the overall process of automated scoring, moving from data collection to engine training to operational scoring. Then, they describe how automated scoring systems work, including the basic functions around score prediction as well as other flagging methods. Finally, they conclude with a discussion of the specific validity demands around automated scoring and how they align with the larger validity demands around test scores. Two data activities are provided. The first is an interactive activity that allows the user to train and evaluate a simple automated scoring engine. The second is a worked example that examines the impact of rater error on test scores. The digital module contains a link to an interactive web application as well as its R-Shiny code, diagnostic quiz questions, activities, curated resources, and a glossary.
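Automated-scoring engines are typically evaluated by their agreement with human raters; one widely used statistic is quadratic weighted kappa, sketched below in Python (illustrative scores, not the module's data or its specific evaluation procedure).

```python
import numpy as np

def quadratic_weighted_kappa(human, machine, n_categories):
    """Quadratic weighted kappa: agreement between two raters on an
    ordinal scale, penalizing disagreements by squared score distance."""
    human, machine = np.asarray(human), np.asarray(machine)
    O = np.zeros((n_categories, n_categories))   # observed agreement matrix
    for h, m in zip(human, machine):
        O[h, m] += 1
    O /= O.sum()
    # expected matrix under independence of the two raters' marginals
    E = np.outer(O.sum(axis=1), O.sum(axis=0))
    i, j = np.indices((n_categories, n_categories))
    W = (i - j) ** 2 / (n_categories - 1) ** 2   # quadratic disagreement weights
    return 1 - (W * O).sum() / (W * E).sum()

# Made-up essay scores on a 0-3 rubric
human = [0, 1, 2, 2, 3, 1, 0, 3]
machine = [0, 1, 2, 1, 3, 1, 1, 3]
print(round(quadratic_weighted_kappa(human, machine, 4), 3))  # 0.889
```

In practice, engine-human kappa is compared against human-human kappa on the same responses; an engine that approaches human-level agreement is a candidate for operational use.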

17.
The R2D2 method—read, reflect, display, and do—is a new model for designing and delivering distance education, and in particular, online learning. Such a model is especially important to address the diverse preferences of online learners of varied generations and varied Internet familiarity. Four quadrants can be utilized separately or as part of a problem‐solving process: the first component primarily relates to methods to help learners acquire knowledge through online readings, virtual explorations, and listening to online lectures and podcasts. As such, it addresses verbal and auditory learners. The second component of the model focuses on reflective activities such as online blogs, reflective writing tasks, self‐check examinations, and electronic portfolios. In the third quadrant, visual representations of the content are highlighted with techniques such as virtual tours, timelines, animations, and concept maps. Fourth, the model emphasizes what learners can do with the content in hands‐on activities including simulations, scenarios, and real‐time cases. In effect, the R2D2 model is one means to organize and make sense of the diverse array of instructional possibilities currently available in distance education. It provides new ways of learning for diverse online students, and demonstrates easy‐to‐apply learning activities for instructors to integrate various technologies in online learning. When thoughtfully designed, content delivered from this perspective should be more enriching for learners. The R2D2 model provides a framework for more engaging, dynamic, and responsive teaching and learning in online environments.

18.
In this digital ITEMS module, Nikole Gregg and Dr. Brian Leventhal discuss strategies to ensure data visualizations achieve graphical excellence. Data visualizations are commonly used by measurement professionals to communicate results to examinees, the public, educators, and other stakeholders. These visualizations achieve graphical excellence when they display data effectively, efficiently, and accurately. Unfortunately, the default graphics of measurement and statistical software typically fail to uphold these standards and are therefore not suitable for publication or presentation to the public. To illustrate best practices, the instructors provide an introduction to the graphical template language in SAS and show how elementary components can be used to make efficient, effective, and accurate graphics for a variety of audiences. The module contains audio-narrated slides, embedded illustrative videos, quiz questions with diagnostic feedback, a glossary, sample SAS code, and other learning resources.

19.
In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes. Learners are introduced to historical, theoretical, and procedural differences between these methods and how to use and analyze distinct types of verbal reports in the collection of evidence of test-taker response processes. The module includes details on (a) the different types of cognition that are tapped by different interviewer probes, (b) traditional interviewing methods and new automated tools for collecting verbal reports, and (c) options for analyses of verbal reports. This includes a discussion of reliability and validity issues such as potential bias in the collection of verbal reports, ways to mitigate bias, and inter-rater agreement to enhance credibility of analysis. A novel digital tool for data collection called the ABC tool is presented via illustrative videos. As always, the module contains audio-narrated slides, quiz questions with feedback, a glossary, and curated resources.

20.
Modern software practices call for the active involvement of business people in the software process. Therefore, programming has become an indispensable part of the information systems component of the core curriculum at business schools. In this paper, we present a model‐based approach to teaching introduction to programming to general business students. The theoretical underpinnings of the new approach are metaphor, abstraction, modeling, Bloom's classification of cognitive skills, and active learning. We employ models to introduce the basic programming constructs and their semantics. To this end, we use statecharts to model an object's state and the environment model of evaluation as a virtual machine interpreting the programs written in JavaScript. The adoption of this approach helps learners build a sound mental model of the notion of computation process. Scholastic performance, student evaluations, our experiential observations, and a multiple regression analysis show that the proposed ideas improve the course significantly.
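The statechart idea — modeling an object's state and transitions explicitly — can be illustrated outside JavaScript as well; here is a minimal Python sketch of a hypothetical two-state machine (a turnstile), not an example from the course itself.

```python
# Explicit state-transition table: (current state, event) -> next state.
# Modeling state this way is the core idea behind statecharts.
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}

def step(state, event):
    """Return the next state; events with no defined transition are ignored."""
    return TRANSITIONS.get((state, event), state)

state = "locked"
for event in ["push", "coin", "push"]:   # push is ignored while locked
    state = step(state, event)
print(state)  # locked
```

Making the transition table a visible data structure gives novices a concrete mental model of "where the program is" at each step of a computation.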


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号