Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
The present study compared the performance of six cognitive diagnostic models (CDMs) to explore inter-skill relationships in a reading comprehension test. To this end, the item responses of 21,642 test-takers to a high-stakes reading comprehension test were analyzed. The models were compared in terms of model fit at both the test and item levels, classification consistency and accuracy, and proportion of skill mastery profiles. The results showed that the G-DINA performed best and that the C-RUM, NC-RUM, and ACDM showed the closest affinity to the G-DINA. On some criteria, the DINA performed comparably to the G-DINA. The test-level results were corroborated by the item-level model comparison, where the DINA, DINO, and ACDM variously fit some of the items. The results suggested that relationships among the subskills of reading comprehension might be a combination of compensatory and non-compensatory. Therefore, it is suggested that the choice of CDM be made at the item level rather than the test level.
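The contrast between non-compensatory and compensatory skill relationships in this abstract can be sketched with the standard DINA and DINO item response functions. This is a minimal illustration, not the study's code: the function names, the example attribute pattern, and the guessing and slip values are assumptions chosen for the example.

```python
from math import prod


def dina_prob(alpha, q, guess, slip):
    """P(correct) under DINA (conjunctive, non-compensatory).

    alpha: examinee's 0/1 attribute mastery pattern
    q:     item's 0/1 Q-matrix row (attributes the item requires)
    """
    # eta is 1 only if every required attribute is mastered
    eta = prod(a for a, required in zip(alpha, q) if required)
    return 1 - slip if eta == 1 else guess


def dino_prob(alpha, q, guess, slip):
    """P(correct) under DINO (disjunctive, compensatory)."""
    # omega is 1 if at least one required attribute is mastered
    omega = 1 - prod(1 - a for a, required in zip(alpha, q) if required)
    return 1 - slip if omega == 1 else guess


# Hypothetical examinee who masters only the first of two required skills
alpha, q = (1, 0), (1, 1)
p_dina = dina_prob(alpha, q, guess=0.2, slip=0.1)  # falls back to guessing
p_dino = dino_prob(alpha, q, guess=0.2, slip=0.1)  # one skill compensates
```

Under these assumed parameters the same examinee is at the guessing level for a DINA item but near the non-slip level for a DINO item, which is why item-level model choice can matter.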

2.
The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT critical reading items and illustrate how the method can be used to promote cognitive diagnostic inferences. The AHM is a psychometric procedure for classifying examinees’ test item responses into a set of attribute mastery patterns associated with different components from a cognitive model. The study was conducted in two steps. In step 1, three cognitive models were developed by reviewing selected literature in reading comprehension as well as research related to SAT Critical Reading. Then, the cognitive models were validated by having a sample of students think aloud as they solved each item. In step 2, psychometric analyses were conducted on the SAT critical reading cognitive models by evaluating the model-data fit between the expected and observed response patterns produced from two random samples of 2,000 examinees who wrote the items. The model that provided the best model-data fit was then used to calculate attribute probabilities for 15 examinees to illustrate our diagnostic testing procedure.

3.
The purpose of this study was to evaluate the adequacy of three cognitive models, one developed by content experts and two generated from student verbal reports, for explaining examinee performance on a grade 3 diagnostic mathematics test. For this study, the items were developed to directly measure the attributes in the cognitive model. The performance of each cognitive model was evaluated by examining its fit to different data samples (verbal report, total, and high-, moderate-, and low-ability) using the Hierarchy Consistency Index (Cui & Leighton, 2009), a model-data fit index. This study utilized cognitive diagnostic assessments developed under the framework of construct-centered test design and analyzed using the Attribute Hierarchy Method (Gierl, Wang, & Zhou, 2008; Leighton, Gierl, & Hunka, 2004). Both the expert-based and the student-based cognitive models provided excellent fit to the verbal report and high-ability samples, but moderate to poor fit to the total, moderate-ability, and low-ability samples. Implications for cognitive model development for cognitive diagnostic assessment are discussed.

4.
The cognitive diagnostic assessment (CDA) approach has been increasingly applied to non-diagnostic large-scale assessments to extract fine-grained diagnostic feedback about students’ ability in a given domain and to meet accountability demands for student achievement. This study aimed to diagnose the reading abilities of 4324 students from 19 European Union (EU) member countries that participated in the 2016 Progress in International Reading Literacy Study (PIRLS), one of the most comprehensive international studies of students’ reading achievement. The PIRLS data were analyzed using the Log-linear Cognitive Diagnosis Model (LCDM), a type of diagnostic classification model (DCM). Students’ weaknesses and strengths were identified based on a four-skill reading ability model. The results revealed that the methodology could provide more fine-grained diagnostic information about students’ reading skills than traditional aggregated test scoring could. Such information could be used by teachers, school administrators, decision-makers, and students to maximize the learning outcomes of reading programs and instruction.

5.
Compared to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM-based data with a linear attribute structure. The study uses a procedure to make the IRM and CDM frameworks comparable and investigates how estimation accuracy is affected by test diagnosticity and the match between the true and fitted models. The study shows that (a) comparable results can be obtained when highly diagnostic IRM data are retrofitted with a CDM, and vice versa; (b) retrofitting CDMs to IRM-based data can, in some conditions, result in considerable examinee misclassification; and (c) model fit indices provide limited indication of the accuracy of item parameter estimation and attribute classification.

6.
In this ITEMS module, we provide a didactic overview of the specification, estimation, evaluation, and interpretation steps for diagnostic measurement/classification models (DCMs), which are a promising psychometric modeling approach. These models can provide detailed skill- or attribute-specific feedback to respondents along multiple latent dimensions and hold theoretical and practical appeal for a variety of fields. We use a current unified modeling framework, the log-linear cognitive diagnosis model (LCDM), as well as a series of quality-control checklists for data analysts and scientific users to review the foundational concepts, practical steps, and interpretational principles for these models. We demonstrate how the models and checklists can be applied in real-life data-analysis contexts. A library of macros and supporting files for Excel, SAS, and Mplus is provided along with video tutorials for key practices.

7.
8.
The purpose of this article is to develop a statistical model that best explains variability in the number of school days suspended. Number of school days suspended is a count variable that may be zero-inflated and overdispersed relative to a Poisson model. Four models were examined: Poisson, negative binomial, Poisson hurdle, and negative binomial hurdle. Additionally, the probability of a student being suspended for at least 1 day was modeled using a binomial logistic regression model. Of the count models considered, the negative binomial hurdle model had the best fit. The binomial logistic regression model with interactions, used to model the probability of a student being suspended for at least 1 day, fit both the training and test data adequately. The findings suggest that both the negative binomial hurdle and the binomial logistic regression models should be considered when modeling school suspensions.

9.
As with any psychometric model, the validity of inferences from cognitive diagnosis models (CDMs) determines the extent to which these models can be useful. For inferences from CDMs to be valid, it is crucial that the fit of the model to the data is ascertained. Based on a simulation study, this study investigated the sensitivity of various fit statistics for absolute or relative fit under different CDM settings. The investigation covered various types of model-data misfit that can occur with misspecification of the Q-matrix, the CDM, or both. Six fit statistics were considered: -2 log likelihood (-2LL), Akaike's information criterion (AIC), Bayesian information criterion (BIC), and residuals based on the proportion correct of individual items (p), the correlations (r), and the log-odds ratio of item pairs (l). An empirical example involving real data was used to illustrate how the different fit statistics can be employed in conjunction with each other to identify different types of misspecification. With these statistics and the saturated model serving as the basis, relative and absolute fit evaluation can be integrated to detect misspecification efficiently.
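The three relative fit statistics named in this abstract follow directly from a fitted model's maximized log-likelihood. The sketch below uses the textbook definitions; the function name and the log-likelihood values, parameter counts, and sample size are hypothetical and are not taken from the study.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return -2LL, AIC, and BIC for one fitted model.

    log_likelihood: maximized log-likelihood of the model
    n_params:       number of freely estimated parameters
    n_obs:          number of examinees (sample size)
    """
    deviance = -2.0 * log_likelihood
    return {
        "-2LL": deviance,
        "AIC": deviance + 2 * n_params,                # penalty of 2 per parameter
        "BIC": deviance + n_params * math.log(n_obs),  # penalty grows with sample size
    }


# Hypothetical comparison of a saturated model and a reduced model
saturated = information_criteria(log_likelihood=-9800.0, n_params=60, n_obs=1000)
reduced = information_criteria(log_likelihood=-9830.0, n_params=30, n_obs=1000)
preferred_by_bic = "reduced" if reduced["BIC"] < saturated["BIC"] else "saturated"
```

Smaller values indicate better relative fit. Because BIC penalizes each parameter by ln(N) rather than 2, it tends to favor the reduced model more strongly than AIC as the sample grows.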

10.
This article is concerned with the difference in noncentrality parameters of nested structural equation models and their utility in evaluating statistical power associated with the pertinent restriction test. Based on the seminal work by Browne and Du Toit (1992), Steiger (1989, 1990), Steiger and Lind (1980), and Steiger, Shapiro, and Browne (1985), asymptotic confidence intervals for that difference are discussed. The intervals represent a useful adjunct to widely employed goodness-of-fit indexes and test statistics when assessing plausibility of constraints in nested models. The approach also permits estimating power of the test of validity of the nesting restrictions (cf. MacCallum, Browne, & Sugawara, 1996). It is illustrated on data from a 2-group cognitive training study.

11.
In the past, several models have been developed for estimating the reliability and validity of measurement instruments from multitrait-multimethod (MTMM) experiments. Additive, multiplicative, and correlated uniqueness models have been proposed, and Coenders and Saris (2000) recently suggested a procedure to test these models against one another. In this article, the different models suggested for the analysis of MTMM matrices are compared for their fit to 87 data sets collected in the United States (Andrews, 1984; Rodgers, Andrews, & Herzog, 1992), Austria (Koltringer, 1995), and the Netherlands (Scherpenzeel & Saris, 1997). As most variables are categorical, the analysis was carried out on the basis of both polychoric-polyserial correlation coefficients and Pearson correlations. The fit of the models based on polychoric correlations is much worse than the fit of models based on product-moment correlations, but in both cases a model that assumes additive method effects fits most data sets better than the other models, including the so-called multiplicative models.

12.
International Journal for Educational and Vocational Guidance - The log-linear cognitive diagnosis model (LCDM) is a modern technique that dichotomously classifies individuals (e.g., possession and...

13.
This study tested four theoretical models in terms of their fit with demands placed on our cognitive system by traditional tests of cognitive ability. We did so by administering seven tests of cognitive ability known to require varying types of processing demands to a large group of college undergraduates (N = 193). We compared the models using confirmatory factor analyses, including those based upon a unitary factor, speed and capacity, crystallized and fluid intelligence, and verbal and spatial ability. The crystallized/fluid model provided the best fit with the data. This finding is consistent with previous research. Implications for education and future research are discussed.

14.
The cognitive model is a prominent element for supporting teachers' instructional needs regarding the confusing geometry topic ‘Parallel and Perpendicular Lines’. Nonetheless, a mismatch between the derived cognitive models and the students’ cognition would threaten the validity of the diagnostic inferences made. While cognitive models can be developed using different approaches, this study evaluated the model-data fit of expert-based cognitive models, which were developed by conducting expert task analysis, and theory-based cognitive models, which were developed by reviewing the available theory. The study adopted a cross-sectional research design. A total of 1,069 Grade Four students were selected using two-stage cluster sampling. The findings indicated that the average model-data fit increased with student ability for both the theory-based and the expert-based cognitive models of ‘Parallel and Perpendicular Lines’. Nonetheless, the theory-based cognitive models had a better model-data fit than the expert-based cognitive models. The findings imply the significance of the available theory of cognition in guiding the development of cognitive models. With satisfactory model-data fit, the theory-based cognitive models could serve as a guide for making diagnostic inferences about students' skill acquisition in the topic of ‘Parallel and Perpendicular Lines’.

15.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation is becoming more common, understanding the Bayesian approaches for evaluating model-data fit is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model-data fit approach. Specifically, they review the conceptual foundation of Bayesian inference as well as PPMC and walk through the computational steps of PPMC using real-life data examples from simple linear regression and item response theory analysis. They provide guidance on how to interpret PPMC results and discuss how to implement PPMC for other models and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data-based activities, curated resources, and a glossary.

16.
The goal of this study was to investigate the usefulness of person-fit analysis in validating student score inferences in a cognitive diagnostic assessment. In this study, a two-stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, the person-fit statistic, the hierarchy consistency index (HCI; Cui, 2007; Cui & Leighton, 2009), was used to identify the misfitting student item-score vectors. In the second stage, students’ verbal reports were collected to provide additional information about students’ response processes so as to reveal the actual causes of misfit. This two-stage procedure helped to identify the misfit of item-score vectors to the cognitive model used in the design and analysis of the diagnostic test, and to discover the reasons for the misfit, so that students’ problem-solving strategies were better understood and their performances were interpreted in a more meaningful way.

17.
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a larger number of attributes are required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance levels, while the Type I error rate of the A-CDM is closer to the nominal significance levels. However, with larger sample sizes, the Type I error rates for the three models are closer to the nominal significance levels. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings.

18.
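This article is the first to explore the structural similarity between the original Stanford Achievement Test reading test (10th edition) and its customized version. The analyses were conducted across grades on several types of observed variables (individual items, testlets, and item parcels). The methods mainly comprised linear and nonlinear exploratory and confirmatory factor analysis. The results show that the items within every passage exhibit testlet effects to varying degrees. Among all the models, those with individual items as observed variables fit worst, those with testlets as observed variables fit better, and those with item parcels as observed variables fit best. Of the three levels of structural equivalence (congeneric, tau-equivalent, and parallel), the structures of the original Stanford Achievement Test reading test and its customized version are congeneric.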

19.
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.
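One reading of the RMSEA result in this abstract is that the index scales misfit by degrees of freedom, so a model that spends more parameters gains less than its raw test statistic suggests. The sketch below uses a common point-estimate formula for RMSEA; it is an assumption that this matches the variant used in the article (some software divides by N rather than N - 1), and the input values are invented for illustration.

```python
import math


def rmsea(chi2, df, n_obs):
    """Point estimate of RMSEA from a chi-square-type test statistic.

    Uses sqrt(max(chi2 - df, 0) / (df * (n_obs - 1))); some programs
    use n_obs instead of n_obs - 1 in the denominator.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


# Hypothetical models fitted to the same data: the second uses more
# parameters (fewer df) and has a smaller test statistic.
fewer_params = rmsea(chi2=100.0, df=50, n_obs=201)
more_params = rmsea(chi2=80.0, df=40, n_obs=201)
```

With these made-up numbers the second model has the smaller test statistic, yet both models have the same ratio of excess misfit to degrees of freedom and therefore the same RMSEA.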

20.
Many mechanistic rules of thumb for evaluating the goodness of fit of structural equation models (SEM) emphasize model parsimony; all other things being equal, a simpler, more parsimonious model with fewer estimated parameters is better than a more complex model. Although this is usually good advice, in the present article a heuristic counterexample is demonstrated in which parsimony as typically operationalized in indices of fit may be undesirable. Specifically, in simplex models of longitudinal data, the failure to include correlated uniquenesses relating the same indicators administered on different occasions will typically lead to systematically inflated estimates of stability. Although simplex models with correlated uniquenesses are substantially less parsimonious and may be unacceptable according to mechanistic decision rules that penalize model complexity, it can be argued a priori that these additional parameter estimates should be included. Simulated data are used to support this claim and to evaluate the behavior of a variety of fit indices and decision rules. The results demonstrate the validity of Bollen and Long’s (1993) conclusion that “test statistics and fit indices are very beneficial, but they are no replacement for sound judgment and substantive expertise” (p. 8).


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号