Similar Documents
20 similar documents found (search time: 812 ms)
1.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation becomes more common, understanding Bayesian approaches for evaluating model-data fit is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model-data fit approach. Specifically, they review the conceptual foundations of Bayesian inference and of PPMC, and walk through the computational steps of PPMC using real-life data examples from simple linear regression and item response theory analysis. They provide guidance on how to interpret PPMC results and discuss how to implement PPMC for other models and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data-based activities, curated resources, and a glossary.
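As a concrete illustration of the PPMC loop described above (the module itself supplies SAS code), here is a minimal base-R sketch for simple linear regression under the standard noninformative prior; the simulated data and the choice of discrepancy measure (the sample maximum) are hypothetical.

```r
## Minimal PPMC sketch for simple linear regression (hypothetical data)
set.seed(1)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1.5)   # "observed" data

X <- cbind(1, x)
fit <- lm(y ~ x)
beta_hat <- coef(fit)
s2 <- summary(fit)$sigma^2
XtX_inv <- solve(t(X) %*% X)

n_draws <- 2000
disc_obs <- max(y)              # observed discrepancy measure
disc_rep <- numeric(n_draws)    # discrepancies of replicated data

for (d in 1:n_draws) {
  # Draw sigma^2 from its scaled inverse-chi-square posterior
  sigma2 <- (n - 2) * s2 / rchisq(1, df = n - 2)
  # Draw beta | sigma^2 from its multivariate normal posterior
  beta <- beta_hat + t(chol(sigma2 * XtX_inv)) %*% rnorm(2)
  # Generate a replicated data set and record its discrepancy
  y_rep <- X %*% beta + rnorm(n, sd = sqrt(sigma2))
  disc_rep[d] <- max(y_rep)
}

# Posterior predictive p-value; values near 0 or 1 signal misfit
mean(disc_rep >= disc_obs)
```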

2.
In this ITEMS module, we provide a didactic overview of the specification, estimation, evaluation, and interpretation steps for diagnostic measurement/classification models (DCMs), which are a promising psychometric modeling approach. These models can provide detailed skill- or attribute-specific feedback to respondents along multiple latent dimensions and hold theoretical and practical appeal for a variety of fields. We use a current unified modeling framework—the log-linear cognitive diagnosis model (LCDM)—as well as a series of quality-control checklists for data analysts and scientific users to review the foundational concepts, practical steps, and interpretational principles for these models. We demonstrate how the models and checklists can be applied in real-life data-analysis contexts. A library of macros and supporting files for Excel, SAS, and Mplus is provided along with video tutorials for key practices.
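To make the LCDM concrete, the following base-R sketch evaluates its item response function for a single item measuring two attributes; the lambda (intercept, main-effect, and interaction) values are hypothetical, and operational calibration would use software such as the SAS and Mplus tools the module provides.

```r
## LCDM item response function for one item and two attributes
lcdm_prob <- function(alpha, lambda0, lambda1, lambda2, lambda12) {
  # alpha: binary attribute profile c(a1, a2)
  logit <- lambda0 +
    lambda1  * alpha[1] +
    lambda2  * alpha[2] +
    lambda12 * alpha[1] * alpha[2]   # attribute interaction term
  1 / (1 + exp(-logit))              # inverse logit
}

# Correct-response probability for each of the four attribute profiles
profiles <- expand.grid(a1 = 0:1, a2 = 0:1)
round(apply(profiles, 1, lcdm_prob,
            lambda0 = -2, lambda1 = 1.5, lambda2 = 1.2, lambda12 = 0.8), 3)
```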

3.
In this digital ITEMS module, Dr. Jue Wang and Dr. George Engelhard Jr. describe the Rasch measurement framework for the construction and evaluation of new measures and scales. From a theoretical perspective, they discuss historical and philosophical perspectives on measurement with a focus on Rasch's concept of specific objectivity and invariant measurement. Specifically, they introduce the origins of Rasch measurement theory, the development of model-data fit indices, and commonly used Rasch measurement models. From an applied perspective, they discuss best practices in constructing, estimating, evaluating, and interpreting a Rasch scale using empirical examples. They provide an overview of a specialized Rasch software program (Winsteps) and an R program embedded within Shiny (Shiny_ERMA) for conducting Rasch model analyses. The module is designed to be relevant for students, researchers, and data scientists in various disciplines such as psychology, sociology, education, business, health, and other social sciences. It contains audio-narrated slides, sample data, syntax files, access to the Shiny_ERMA program, diagnostic quiz questions, data-based activities, curated resources, and a glossary.
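Before turning to specialized software such as Winsteps or Shiny_ERMA, it can help to see the model itself; this minimal base-R sketch evaluates dichotomous Rasch response probabilities for hypothetical person and item parameters.

```r
## Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b))
rasch_prob <- function(theta, b) plogis(theta - b)

theta <- seq(-3, 3, by = 1)   # person abilities (logits)
b <- c(-1, 0, 1.5)            # item difficulties (logits)

# Probability matrix: persons in rows, items in columns
round(outer(theta, b, rasch_prob), 2)
```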

4.
In this digital ITEMS module, Dr. Jeffrey Harring and Ms. Tessa Johnson introduce the linear mixed effects (LME) model as a flexible general framework for simultaneously modeling continuous repeated measures data with a scientifically defensible function that adequately summarizes both individual change as well as the average response. The module begins with a nontechnical overview of longitudinal data analyses drawing distinctions with cross-sectional analyses in terms of research questions to be addressed. Nuances of longitudinal designs, timing of measurements, and the real possibility of missing data are then discussed. The three interconnected components of the LME model—(1) a model for individual and mean response profiles, (2) a model to characterize the covariation among the time-specific residuals, and (3) a set of models that summarize the extent that individual coefficients vary—are discussed in the context of the set of activities comprising an analysis. Finally, they demonstrate how to estimate the linear mixed effects model within an open-source environment (R). The digital module contains sample R code, diagnostic quiz questions, hands-on activities in R, curated resources, and a glossary.
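A minimal R sketch of the kind of LME specification the module demonstrates, using the widely used lme4 package; the simulated data and variable names are hypothetical, and the module's own R code may differ.

```r
## Hypothetical long-format growth data: 50 persons, 4 occasions
set.seed(2)
id   <- rep(1:50, each = 4)
time <- rep(0:3, times = 50)
u0 <- rnorm(50, sd = 1.0)[id]   # person-specific intercept deviations
u1 <- rnorm(50, sd = 0.3)[id]   # person-specific slope deviations
y  <- (10 + u0) + (1.5 + u1) * time + rnorm(200, sd = 1)
growth_dat <- data.frame(id = factor(id), time = time, y = y)

library(lme4)
# Fixed part: mean response profile; random part: individual variation
# in intercepts and slopes (time-specific residuals are iid by default)
fit <- lmer(y ~ 1 + time + (1 + time | id), data = growth_dat)
summary(fit)    # average trajectory and variance components
```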

5.
6.
In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows scores to be returned faster at lower cost. In the module, they discuss automated scoring from a number of perspectives. First, they discuss the benefits and weaknesses of automated scoring and what psychometricians should know about it. Next, they describe the overall process of automated scoring, moving from data collection to engine training to operational scoring. Then, they describe how automated scoring systems work, including the basic functions around score prediction as well as other flagging methods. Finally, they conclude with a discussion of the specific validity demands around automated scoring and how they align with the larger validity demands around test scores. Two data activities are provided. The first is an interactive activity that allows the user to train and evaluate a simple automated scoring engine. The second is a worked example that examines the impact of rater error on test scores. The digital module contains a link to an interactive web application as well as its R-Shiny code, diagnostic quiz questions, activities, curated resources, and a glossary.
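The module's first activity has users train and evaluate a simple engine; in that spirit, here is a toy base-R sketch of the core idea, predicting human scores from a text feature (all responses, scores, and keywords below are hypothetical).

```r
## Toy "automated scoring engine": keyword coverage + linear regression
responses <- c("the cell divides by mitosis",
               "mitosis splits one cell into two new cells",
               "plants use sunlight to grow",
               "cells split in two during mitosis",
               "i do not know",
               "division of the nucleus happens in mitosis")
human <- c(2, 3, 0, 3, 0, 2)   # hypothetical human ratings

keywords <- c("mitosis", "cell", "divid", "split", "nucleus")
# Feature: how many scoring-relevant keywords each response contains
coverage <- sapply(responses, function(r)
  sum(sapply(keywords, function(k) grepl(k, r, fixed = TRUE))))

engine <- lm(human ~ coverage)   # "train" the engine on human ratings
round(fitted(engine), 1)         # engine-predicted scores
cor(fitted(engine), human)       # human-machine agreement
```

A production engine would use far richer features and be evaluated against held-out human ratings, as the module discusses.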

7.
In this digital ITEMS module, Dr. Brian Leventhal and Dr. Allison Ames provide an overview of Monte Carlo simulation studies (MCSS) in item response theory (IRT). MCSS are utilized for a variety of reasons, one of the most compelling being that they can be used when analytic solutions are impractical or nonexistent because they allow researchers to specify and manipulate an array of parameter values and experimental conditions (e.g., sample size, test length, and test characteristics). Dr. Leventhal and Dr. Ames review the conceptual foundation of MCSS in IRT and walk through the processes of simulating total scores as well as item responses using the two-parameter logistic, graded response, and bifactor models. They provide guidance on how to implement MCSS using other item response models, along with best practices for writing efficient syntax and executing an MCSS. The digital module contains sample SAS code, diagnostic quiz questions, activities, curated resources, and a glossary.
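As a flavor of the response-generation step, here is a minimal base-R sketch of one MCSS replication under the two-parameter logistic model; the sample size, test length, and parameter distributions are hypothetical conditions, and the module itself provides SAS code.

```r
## One Monte Carlo replication: simulate 2PL item responses
set.seed(123)
n_persons <- 1000   # manipulated condition: sample size
n_items   <- 20     # manipulated condition: test length

theta <- rnorm(n_persons)       # latent abilities
a <- runif(n_items, 0.8, 2.0)   # item discriminations
b <- rnorm(n_items)             # item difficulties

# 2PL: P(X = 1) = logistic(a_j * (theta_i - b_j))
p <- plogis(sweep(outer(theta, b, "-"), 2, a, "*"))
resp <- matrix(rbinom(n_persons * n_items, 1, p),
               nrow = n_persons, ncol = n_items)

colMeans(resp)   # item proportion-correct for this replication
```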

8.
Game-based assessment (GBA), a specific application of games for learning, has been recognized as an alternative form of assessment. While a substantial body of literature supports the educational benefits of GBA, limited work has investigated the validity and generalizability of such systems. In this paper, we describe applications of learning analytics methods to provide evidence for the psychometric qualities of a digital GBA called Shadowspect, particularly the extent to which Shadowspect is a robust assessment tool for middle school students' spatial reasoning skills. Our findings indicate that Shadowspect is a valid assessment for spatial reasoning skills, and it has comparable precision for both male and female students. In addition, students' enjoyment of the game is positively related to their overall competency as measured by the game, regardless of the level of their existing spatial reasoning skills.

Practitioner notes

What is already known about this topic:
  • Digital games can be a powerful context to support and assess student learning.
  • Games as assessments need to meet certain psychometric qualities such as validity and generalizability.
  • Learning analytics provide useful ways to establish assessment models for educational games, as well as to investigate their psychometric qualities.
What this paper adds:
  • How a digital game can be coupled with learning analytics practices to assess spatial reasoning skills.
  • How to evaluate psychometric qualities of game-based assessment using learning analytics techniques.
  • Investigation of validity and generalizability of game-based assessment for spatial reasoning skills and the interplay of the game-based assessment with enjoyment.
Implications for practice and/or policy:
  • Game-based assessments that incorporate learning analytics can be used as an alternative to pencil-and-paper tests to measure cognitive skills such as spatial reasoning.
  • More training and assessment of spatial reasoning embedded in games can motivate students who might not be on the STEM tracks, thus broadening participation in STEM.
  • Game-based learning and assessment researchers should consider possible factors that affect how certain populations of students enjoy educational games, so it does not further marginalize specific student populations.

9.
Beginning Bayes     
Understanding a Bayesian perspective demands comfort with conditional probability and with probabilities that appear to change as we acquire additional information. This paper suggests a simple context in conditional probability that helps students develop the understanding they would need for a successful introduction to Bayesian reasoning.
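A minimal base-R sketch of the kind of conditional-probability context the paper advocates, using a classic diagnostic-test example (all rates are hypothetical).

```r
## Bayes' rule: updating a probability with new information
prior <- 0.02   # P(condition)
sens  <- 0.95   # P(positive test | condition)
fpr   <- 0.10   # P(positive test | no condition)

# P(positive) by the law of total probability
p_pos <- sens * prior + fpr * (1 - prior)

# Posterior: P(condition | positive)
sens * prior / p_pos   # about 0.16
```

The contrast between the 95% sensitivity and the roughly 16% posterior is exactly the intuition about "changing" probabilities that such examples are meant to build.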

10.
Multilevel modeling is a statistical approach for analyzing hierarchical data that consist of individual observations nested within clusters. Bayesian methods are a well-known, and sometimes better, alternative to maximum likelihood methods for fitting multilevel models. A lack of user-friendly, computationally efficient software packages was long a major obstacle to applying Bayesian multilevel modeling. In recent years, software packages for multilevel modeling with improved Bayesian algorithms and faster speed have proliferated. This article aims to update the knowledge of software packages for Bayesian multilevel modeling and thereby promote their use. Three categories of software packages capable of Bayesian multilevel modeling, including brms, MCMCglmm, glmmBUGS, Bambi, R2BayesX, BayesReg, R2MLwiN, and others, are introduced and compared in terms of computational efficiency, modeling capability and flexibility, and user-friendliness. Recommendations for practical users and suggestions for future development are also discussed.
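For example, a minimal random-intercept specification in brms, one of the packages the article reviews, might look as follows; the simulated data and variable names are hypothetical.

```r
library(brms)

## Hypothetical clustered data: 30 clusters of 20 observations each
set.seed(3)
cluster <- rep(1:30, each = 20)
x <- rnorm(600)
y <- 1 + 0.5 * x + rnorm(30)[cluster] + rnorm(600)
my_data <- data.frame(y = y, x = x, cluster = factor(cluster))

# Random-intercept model with brms's default weakly informative priors
fit <- brm(y ~ 1 + x + (1 | cluster), data = my_data,
           family = gaussian(), chains = 4, iter = 2000)
summary(fit)   # posterior summaries for fixed effects and variance terms
```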

11.
12.
In this ITEMS module, we introduce the generalized deterministic inputs, noisy “and” gate (G‐DINA) model, which is a general framework for specifying, estimating, and evaluating a wide variety of cognitive diagnosis models. The module contains a nontechnical introduction to diagnostic measurement, an introductory overview of the G‐DINA model, as well as common special cases, and a review of model‐data fit evaluation practices within this framework. We use the flexible GDINA R package, which is available for free within the R environment and provides a user‐friendly graphical interface in addition to the code‐driven layer. The digital module also contains videos of worked examples, solutions to data activity questions, curated resources, a glossary, and quizzes with diagnostic feedback.
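A minimal calibration sketch with the GDINA package follows, using the package's bundled simulated data; argument names and outputs may differ across package versions, so consult the package documentation.

```r
library(GDINA)

dat <- sim10GDINA$simdat   # bundled simulated responses (10 items)
Q   <- sim10GDINA$simQ     # bundled Q-matrix

# Fit the saturated G-DINA model
fit <- GDINA(dat = dat, Q = Q, model = "GDINA")
summary(fit)      # model overview
itemfit(fit)      # item-level model-data fit
personparm(fit)   # estimated attribute profiles
```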

13.
As the popularity of rich assessment scenarios increases, so must the availability of psychometric models capable of handling the resulting data. Dynamic Bayesian networks (DBNs) offer a fast, flexible option for characterizing student ability across time under psychometrically complex conditions. In this article, a brief introduction to DBNs is offered, followed by a review of the existing literature on the use of DBNs in educational and psychological measurement, with a focus on methodological investigations and novel applications that may provide guidance for practitioners wishing to deploy these models. The article concludes with a discussion of future directions for research in the field.

14.
In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes. Learners are introduced to historical, theoretical, and procedural differences between these methods and how to use and analyze distinct types of verbal reports in the collection of evidence of test-taker response processes. The module includes details on (a) the different types of cognition that are tapped by different interviewer probes, (b) traditional interviewing methods and new automated tools for collecting verbal reports, and (c) options for analyses of verbal reports. This includes a discussion of reliability and validity issues such as potential bias in the collection of verbal reports, ways to mitigate bias, and inter-rater agreement to enhance credibility of analysis. A novel digital tool for data collection called the ABC tool is presented via illustrative videos. As always, the module contains audio-narrated slides, quiz questions with feedback, a glossary, and curated resources.

15.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.
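To give one concrete example of a traditional fit check, this base-R sketch compares observed and model-expected proportions correct across ability groups for a single 2PL item; all values are simulated and hypothetical, and operational checks would use the software reviewed in the module.

```r
## Traditional item-fit sketch for one 2PL item
set.seed(42)
n <- 2000
theta <- rnorm(n)
a <- 1.2; b <- 0.3                          # "estimated" item parameters
x <- rbinom(n, 1, plogis(a * (theta - b)))  # simulated item responses

# Ten ability groups of roughly equal size
groups <- cut(theta, quantile(theta, 0:10 / 10), include.lowest = TRUE)
obs <- tapply(x, groups, mean)                          # observed p-correct
exp_p <- tapply(plogis(a * (theta - b)), groups, mean)  # model-expected
n_g <- as.numeric(table(groups))

# Pearson-type fit statistic; large values suggest misfit
sum(n_g * (obs - exp_p)^2 / (exp_p * (1 - exp_p)))
```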

16.
The capacity of Bayesian methods for estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, efficient incorporation of prior information into empirical data analysis, model averaging, and model selection. As a simplified demonstration, we illustrate (1) how Bayesians test and compare two non-nested growth curve models using Bayesian estimation with noninformative priors; (2) how Bayesians model and handle missing outcome data; and (3) how Bayesians incorporate data-based evidence from a previous data set, construct informative priors, and treat them as extra information when conducting an updated analysis of analogous data.
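As one concrete possibility for step (1), two non-nested growth curves can be compared in R with brms and leave-one-out (LOO) cross-validation; this is only a sketch under hypothetical data and variable names, and the paper's own estimation and comparison choices may differ.

```r
library(brms)

## Hypothetical long-format growth data (id, time, y)
set.seed(4)
id <- rep(1:40, each = 5)
time <- rep(0:4, times = 40)
y <- 5 + 2 * log(time + 1) + rnorm(40)[id] + rnorm(200, sd = 0.8)
gd <- data.frame(id = factor(id), time = time, y = y)

# Two non-nested mean structures for the same data
fit_lin <- brm(y ~ time + (1 | id), data = gd)
fit_log <- brm(y ~ log(time + 1) + (1 | id), data = gd)

# Compare expected out-of-sample predictive accuracy
loo(fit_lin, fit_log)
```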

17.
A cognitive item response theory model called the attribute hierarchy method (AHM) is introduced and illustrated. This method represents a variation of Tatsuoka's rule-space approach. The AHM is designed explicitly to link cognitive theory and psychometric practice to facilitate the development and analysis of educational and psychological tests. The following are described: the cognitive properties of the AHM; the psychometric properties of the AHM, including a demonstration of how the AHM differs from Tatsuoka's rule-space approach; and an application of the AHM to the domain of syllogistic reasoning, illustrating how the approach can be used to evaluate the cognitive competencies required in a higher-level thinking task. Future directions for research are also outlined.
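The core AHM idea, namely that an attribute hierarchy restricts the permissible attribute patterns and, through the Q-matrix, the expected response patterns, can be sketched in a few lines of base R; the linear hierarchy A1 -> A2 -> A3 and the three-item test below are hypothetical.

```r
## Permissible attribute profiles under the linear hierarchy A1 -> A2 -> A3
profiles <- rbind(c(0,0,0), c(1,0,0), c(1,1,0), c(1,1,1))

## Q-matrix: which attributes each item requires
Q <- rbind(c(1,0,0),   # item 1 requires A1
           c(1,1,0),   # item 2 requires A1 and A2
           c(1,1,1))   # item 3 requires A1, A2, and A3

# Expected (error-free) response: 1 iff the profile has every required attribute
expected <- t(apply(profiles, 1, function(alpha)
  apply(Q, 1, function(q) as.integer(all(alpha[q == 1] == 1)))))
rownames(expected) <- apply(profiles, 1, paste, collapse = "")
expected   # rows: attribute profiles; columns: items 1-3
```

Observed response patterns are then compared against these expected patterns to classify examinees, with departures treated as slips or other errors.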

18.
Bayesian methods are becoming very popular despite some practical difficulties in implementation. To assist in the practical application of Bayesian methods, we show how to implement Bayesian analysis with WinBUGS as part of a standard set of SAS routines. This implementation procedure is first illustrated by fitting a multiple regression model and then a linear growth curve model. A third example demonstrates how to run WinBUGS iteratively inside SAS for Monte Carlo simulation studies. The SAS code used in this study is easily extended to accommodate many other models with only slight modification. This interface can be of practical benefit in many aspects of Bayesian methods because it allows SAS users to benefit from the implementation of Bayesian estimation and allows WinBUGS users to benefit from the data processing routines available in SAS.

19.
In this digital ITEMS module, Nikole Gregg and Dr. Brian Leventhal discuss strategies for achieving graphical excellence in data visualizations. Data visualizations are commonly used by measurement professionals to communicate results to examinees, the public, educators, and other stakeholders. Such visualizations achieve graphical excellence when they simultaneously display data effectively, efficiently, and accurately. Unfortunately, the default graphics of measurement and statistical software typically fail to uphold these standards and are therefore not suitable for publication or presentation to the public. To illustrate best practices, the instructors provide an introduction to the Graph Template Language in SAS and show how elementary components can be used to make efficient, effective, and accurate graphics for a variety of audiences. The module contains audio-narrated slides, embedded illustrative videos, quiz questions with diagnostic feedback, a glossary, sample SAS code, and other learning resources.

20.
This article presents relevant research on Bayesian methods and their major applications to modeling in an effort to lay out differences between the frequentist and Bayesian paradigms and to look at the practical implications of these differences. Before research is reviewed, basic tenets and methods of the Bayesian approach to modeling are presented and contrasted with basic estimation results from a frequentist perspective. It is argued that Bayesian methods have become a viable alternative to traditional maximum likelihood-based estimation techniques and may be the only solution for more complex psychometric data structures. Hence, neither the applied nor the theoretical measurement community can afford to neglect the exciting new possibilities that have opened up on the psychometric horizon.
