1.

What is computer-based testing washback, how can it be evaluated and how can this support practitioner research?

《Journal of Further & Higher Education》2012,36(9):1255-1270

ABSTRACT

With the introduction of a new initiative in a teaching and learning environment there is an ethical responsibility to consider whether the impact of the introduction has met its intended goals, and whether it has harmed those who are influenced by it. Technology and infrastructure developments have encouraged a continued growth in the development and introduction of computer-based tests (CBTs) in educational environments. In the educational assessment literature, enquiry into the impact of testing (of all types) is known as ‘washback’. This is a reference to the way in which a test might have a range of influences on learners and teachers prior to the test-taking event. This article reviews the literature on CBT washback and outlines a framework for studying its effects as it is introduced into educational contexts. We then outline a research framework that we have developed (based on the literature) that can be used to evaluate CBT washback. We go on to argue that, to fulfil its potential in supporting the development of change, the research framework needs to act as a mediating device that brings together teaching-practitioner and researcher perspectives. The framework that we propose conceptualises the nature of washback in CBT contexts, as well as the research process and the methods required to understand it. This framework provides an element of common ground between practitioners (i.e. teachers who are involved in a CBT development process) and external researchers, and supports collaboration at three distinct levels.

2.

When Are Multidimensional Data Unidimensional Enough for Structural Equation Modeling? An Evaluation of the DETECT Multidimensionality Index

Wes E. Bonifay Richard Scheines Rob R. Meijer 《Structural equation modeling》2013,20(4):504-516

In structural equation modeling (SEM), researchers need to evaluate whether item response data, which are often multidimensional, can be modeled with a unidimensional measurement model without seriously biasing the parameter estimates. This issue is commonly addressed through testing the fit of a unidimensional model specification, a strategy previously determined to be problematic. As an alternative to the use of fit indexes, we considered the utility of a statistical tool that was expressly designed to assess the degree of departure from unidimensionality in a data set. Specifically, we evaluated the ability of the DETECT “essential unidimensionality” index to predict the bias in parameter estimates that results from misspecifying a unidimensional model when the data are multidimensional. We generated multidimensional data from bifactor structures that varied in general factor strength, number of group factors, and items per group factor; a unidimensional measurement model was then fit and parameter bias recorded. Although DETECT index values were generally predictive of parameter bias, in many cases, the degree of bias was small even though DETECT indicated significant multidimensionality. Thus we do not recommend the stand-alone use of DETECT benchmark values to either accept or reject a unidimensional measurement model. However, when DETECT was used in combination with additional indexes of general factor strength and group factor structure, parameter bias was highly predictable. Recommendations for judging the severity of potential model misspecifications in practice are provided.

3.

Junjun Chen Gavin T.L. Brown 《Teachers and Teaching》2016,22(3):350-367

This study surveyed 1064 Chinese school teachers’ approaches to teaching and conceptions of assessment, and examined their inter-relationship using confirmatory factor analysis and structural equation modeling. Three approaches to teaching (i.e. Knowledge Transmission, Student-Focused, and Examination Preparation) and six conceptions of assessment (i.e. Student Development, Teaching Improvement, Examination, Control, School Accountability, and Irrelevance) were identified. Teachers indicated they used Student-Focused most frequently and this positively predicted the assessment purposes of Student Development and Teaching Improvement, while loading negatively on Control, School Accountability, and Irrelevance. The Knowledge Transmission teaching approach, in contrast, positively predicted the assessment purposes of Examination, School Accountability, Control, Student Development, and Teaching Improvement. Thus, despite a predominantly student-focused approach to teaching, knowledge transmission was seen as a teaching approach that contributed positively to student learning. Possible explanations for this anomalous result are discussed.

4.

Cross-Validation of a Model of Intrinsic Motivation With Students Enrolled in High School Elective Courses

Emilio Ferrer-Caja Maureen R. Weiss 《Journal of Experimental Education》2013,81(1):41-65

The purpose of this study was to cross-validate a model of relationships among social-contextual factors, individual differences, and intrinsic motivation in adolescent students enrolled in required courses (E. Ferrer-Caja & M. R. Weiss, 2000) with an independent sample of students taking elective courses. Female and male high school students (N = 219) completed measures of motivational climate, teaching style, perceived competence, self-determination, goal orientation, and intrinsic motivation. Motivated behavior was assessed by teachers who rated the students on effort and persistence in class activities. First, the authors used structural equation modeling to examine model invariance between the original and the new samples, which yielded a lack of equivalence. Next, the authors examined several alternative theory-based models using the elective sample. The results indicated that the data were best represented by a model that separated social-contextual factors, individual factors, intrinsic motivation, and motivated behaviors. The strongest predictors of intrinsic motivation were task-goal orientation and perceived competence. These results are discussed from both theoretical and methodological perspectives.

5.

Carl F. Falk Jeremy C. Biesanz 《Structural equation modeling》2013,20(1):24-38

Although much is known about the performance of recent methods for inference and interval estimation for indirect or mediated effects with observed variables, little is known about their performance in latent variable models. This article presents an extensive Monte Carlo study of 11 different leading or popular methods adapted to structural equation models with latent variables. Manipulated variables included sample size, number of indicators per latent variable, internal consistency per set of indicators, and 16 different path combinations between latent variables. Results indicate that some popular or previously recommended methods, such as the bias-corrected bootstrap and asymptotic standard errors, had poorly calibrated Type I error and coverage rates in some conditions. Likelihood-based confidence intervals, the distribution of the product method, and the percentile bootstrap emerged as leading methods for both interval estimation and inference, whereas joint significance tests and the partial posterior method performed well for inference.
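
As a rough illustration of one method named in this abstract, the percentile bootstrap for an indirect effect can be sketched as follows. This is a simplified sketch, not the study's procedure: it assumes observed (not latent) variables and a single mediator, and the function names, the default of 1000 resamples, and the simple index-based percentile rule are all assumptions made for this example.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the chain X -> M -> Y.

    a is the slope of M regressed on X; b is the coefficient of M when
    Y is regressed on both M and X (ordinary least squares).
    """
    var_x, var_m = _cov(x, x), _cov(m, m)
    cov_xm, cov_xy, cov_my = _cov(x, m), _cov(x, y), _cov(m, y)
    a = cov_xm / var_x
    b = (cov_my * var_x - cov_xy * cov_xm) / (var_m * var_x - cov_xm ** 2)
    return a * b


def percentile_bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect.

    Cases are resampled with replacement, the estimate is recomputed on
    each resample, and the interval endpoints are read off the sorted
    bootstrap estimates (hypothetical, simple percentile rule).
    """
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

An interval that excludes zero would be read as evidence of mediation under this method. Adapting it to latent variables, as the study does, would require refitting a structural equation model on each resample.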

6.

Taking the Time to Improve the Validity of Low-Stakes Tests: The Effort-Monitoring CBT

Steven L. Wise Dennison S. Bhola Sheng-Ta Yang 《Educational Measurement》2006,25(2):21-30

The attractiveness of computer-based tests (CBTs) is due largely to their capability to expand the ways we conduct testing. A relatively unexplored application, however, is actively using the computer to reduce construct-irrelevant variance while a test is being administered. This investigation introduces the effort-monitoring CBT, in which the computer monitors examinee effort (based on item response time) in a low-stakes test and displays warning messages to those exhibiting rapid-guessing behavior. The results of an experimental study are presented, which showed that an effort-monitoring CBT increased examinee effort and yielded more valid test scores than a conventional CBT. Thus, unlike previous research that has focused on identifying rapid-guessing behavior after it has occurred, the effort-monitoring CBT proactively attempts to suppress rapid-guessing behavior. This innovative testing procedure extends the capabilities of measurement practitioners to manage the psychometric challenges posed by unmotivated examinees.
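
The monitoring idea described in this abstract (flag rapid guessing from item response time and warn the examinee during the test) might be sketched as below. The abstract does not give the actual rule, so the 3-second threshold, the "three rapid responses in a row" trigger, and all names are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class EffortMonitor:
    """Tracks item response times and decides when to show a warning."""

    threshold_seconds: float = 3.0  # hypothetical rapid-guess cutoff
    max_consecutive: int = 3        # hypothetical trigger for a warning
    _streak: int = field(default=0, init=False)
    warnings_shown: int = field(default=0, init=False)

    def record(self, response_time: float) -> bool:
        """Record one response time; return True if a warning is due."""
        if response_time < self.threshold_seconds:
            self._streak += 1       # looks like a rapid guess
        else:
            self._streak = 0        # effortful response resets the streak
        if self._streak >= self.max_consecutive:
            self._streak = 0        # start counting afresh after warning
            self.warnings_shown += 1
            return True
        return False


def demo():
    """Feed a short sequence of response times through the monitor."""
    monitor = EffortMonitor()
    times = [12.0, 1.0, 0.8, 1.2, 15.0, 2.0]
    return [monitor.record(t) for t in times]
```

In this sketch `demo()` flags only the fourth item, the point at which three consecutive responses have fallen under the cutoff. A real implementation would more plausibly set thresholds per item, since items differ in how long an effortful response takes.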

7.

A Validity Framework for Evaluating the Technical Quality of Alternate Assessments

Scott F. Marion James W. Pellegrino 《Educational Measurement》2006,25(4):47-57

This article presents findings from two projects designed to improve evaluations of technical quality of alternate assessments for students with the most significant cognitive disabilities. We argue that assessment technical documents should allow for the evaluation of the construct validity of the alternate assessments following the traditions of Cronbach (1971), Messick (1989, 1995), Linn, Baker, and Dunbar (1991), and Shepard (1993). The projects used the work of Knowing What Students Know (Pellegrino, Chudowsky, & Glaser, 2001) to structure and focus the collection and evaluation of assessment information. The heuristic of the assessment triangle (Pellegrino et al., 2001) was particularly useful in emphasizing that the validity evaluation needs to consider the logical connections among the characteristics of the students tested and how they develop domain proficiency (the cognition vertex), the nature of the assessment (the observation vertex), and the ways in which the assessment results are interpreted (the interpretation vertex). This project has shown that in addition to designing more valid assessments, the growing body of knowledge about the psychology of achievement testing can be useful for structuring evaluations of technical quality.

8.

Sarah Depaoli James P. Clifton 《Structural equation modeling》2013,20(3):327-351

Multilevel structural equation models are most often estimated from a frequentist framework via maximum likelihood. However, as shown in this article, frequentist results are not always accurate. Alternatively, one can apply a Bayesian approach using Markov chain Monte Carlo estimation methods. This simulation study compared estimation quality using Bayesian and frequentist approaches in the context of a multilevel latent covariate model. Continuous and dichotomous variables were examined because it is not yet known how different types of outcomes, most notably categorical, affect parameter recovery in this modeling context. Within the Bayesian estimation framework, the impacts of diffuse, weakly informative, and informative prior distributions were compared. Findings indicated that Bayesian estimation may be used to overcome convergence problems and improve parameter estimate bias. Results highlight the differences in estimation quality between dichotomous and continuous variable models and the importance of prior distribution choice for cluster-level random effects.

9.

Robert B. Powell Marc J. Stern Brandon Troy Frensley DeWayne Moore 《Environmental Education Research》2019,25(9):1281-1299

Abstract

While multiple valid measures exist for assessing outcomes of environmental education (EE) programs, the field lacks a comprehensive and logistically feasible common instrument that can apply across diverse programs. We describe a participatory effort for identifying and developing crosscutting outcomes for Environmental Education in the twenty-first Century (EE21). Following extensive input and debate from a wide range of EE providers and researchers, we developed, tested and statistically validated crosscutting scales for measuring consensus-based outcomes for individual participants in youth EE programs using confirmatory factor analysis across six unique sites, including two single-day field trip locations, four multiday residential programs and one science museum in the United States. The results suggest that the scales are valid and reliable for measuring outcomes that many EE programs in the United States can aspire to influence in adolescent participants, ages 10–14.

10.

Michael R. Vitale Nancy R. Romance 《Journal of Science Education and Technology》1995,4(1):65-74

The theme of this article is that the development of informed teacher advocacy for new advancements in technology-based assessment is an essential requirement if such advancements are to contribute toward the systemic improvement of the quality of school science instruction. The potential for advocacy involvement by teachers is considered a natural reaction toward the increasing tendency for classroom practices to be affected by local, state, or national assessment policy initiatives. In support of such an advocacy process, this article provides an awareness of the principles of good measurement practices in conjunction with the qualitative characteristics of technology-based assessment that together are sufficient to serve as a foundation for teachers whose concerns may motivate them to raise relevant questions regarding assessment policy. Based upon such implied standards of testing practice, the article suggests key evaluative questions for teachers to ask about any forms of science assessment that would have the effect of amplifying the potential value of new technology-based forms of assessment applications to enhance ongoing classroom processes of science teaching.

11.

Bengt Muthén Tihomir Asparouhov 《Structural equation modeling》2013,20(1):12-23

Causal inference in mediation analysis offers counterfactually based causal definitions of direct and indirect effects, drawing on research by Robins, Greenland, Pearl, VanderWeele, Vansteelandt, Imai, and others. This type of mediation effect estimation is little known and seldom used among analysts using structural equation modeling (SEM). The aim of this article is to describe the new analysis opportunities in a way that is accessible to SEM analysts and show examples of how to perform the analyses. An application is presented with an extension to a latent mediator measured with multiple indicators.

12.

Yan Wang Eun Sook Kim 《Structural equation modeling》2017,24(5):699-713

Appropriate model specification is fundamental to unbiased parameter estimates and accurate model interpretations in structural equation modeling. Thus detecting potential model misspecification has drawn the attention of many researchers. This simulation study evaluates the efficacy of the Bayesian approach (the posterior predictive checking, or PPC procedure) under multilevel bifactor model misspecification (i.e., ignoring a specific factor at the within level). The impact of model misspecification on structural coefficients was also examined in terms of bias and power. Results showed that the PPC procedure performed better in detecting multilevel bifactor model misspecification when the misspecification became more severe and sample size was larger. Structural coefficients were increasingly negatively biased at the within level as model misspecification became more severe. Model misspecification at the within level affected the between-level structural coefficient estimates more when data dependency was lower and the number of clusters was smaller. Implications for researchers are discussed.

13.

The Model Size Effect in SEM: Inflated Goodness-of-Fit Statistics Are Due to the Size of the Covariance Matrix

Morten Moshagen 《Structural equation modeling》2013,20(1):86-98

The size of a model has been shown to critically affect the goodness of approximation of the model fit statistic T to the asymptotic chi-square distribution in finite samples. It is not clear, however, whether this “model size effect” is a function of the number of manifest variables, the number of free parameters, or both. It is demonstrated by means of 2 Monte Carlo computer simulation studies that neither the number of free parameters to be estimated nor the model degrees of freedom systematically affect the T statistic when the number of manifest variables is held constant. Increasing the number of manifest variables, however, is associated with a severe bias. These results imply that model fit drastically depends on the size of the covariance matrix and that future studies involving goodness-of-fit statistics should always consider the number of manifest variables, but can safely neglect the influence of particular model specifications.

14.

Alternative Methods for Assessing Mediation in Multilevel Data: The Advantages of Multilevel SEM

Kristopher J. Preacher Zhen Zhang Michael J. Zyphur 《Structural equation modeling》2013,20(2):161-182

Multilevel modeling (MLM) is a popular way of assessing mediation effects with clustered data. Two important limitations of this approach have been identified in prior research and a theoretical rationale has been provided for why multilevel structural equation modeling (MSEM) should be preferred. However, to date, no empirical evidence of MSEM's advantages relative to MLM approaches for multilevel mediation analysis has been provided. Nor has it been demonstrated that MSEM performs adequately for mediation analysis in an absolute sense. This study addresses these gaps and finds that the MSEM method outperforms 2 MLM-based techniques in 2-level models in terms of bias and confidence interval coverage while displaying adequate efficiency, convergence rates, and power under a variety of conditions. Simulation results support prior theoretical work regarding the advantages of MSEM over MLM for mediation in clustered data.

15.

Goran Pavlov Alberto Maydeu-Olivares Dexin Shi 《Educational and psychological measurement》2021,81(1):110

We examine the accuracy of p values obtained using the asymptotic mean and variance (MV) correction to the distribution of the sample standardized root mean squared residual (SRMR) proposed by Maydeu-Olivares to assess the exact fit of SEM models. In a simulation study, we found that under normality, the MV-corrected SRMR statistic provides reasonably accurate Type I errors even in small samples and for large models, clearly outperforming the current standard, that is, the likelihood ratio (LR) test. When data show excess kurtosis, MV-corrected SRMR p values are only accurate in small models (p = 10), or in medium-sized models (p = 30) if no skewness is present and sample sizes are at least 500. Overall, when data are not normal, the MV-corrected LR test seems to outperform the MV-corrected SRMR. We elaborate on these findings by showing that the asymptotic approximation to the mean of the SRMR sampling distribution is quite accurate, while the asymptotic approximation to the standard deviation is not.
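
For readers unfamiliar with the statistic, the sample SRMR itself (before any mean and variance correction) can be computed from a sample covariance matrix and a model-implied covariance matrix. The sketch below uses one common definition, which standardizes each residual by the sample standard deviations and averages over the p(p + 1)/2 unique elements. Software packages differ in the exact variant they report, so this is an assumption for illustration and does not reproduce the MV-corrected p values studied in the article.

```python
import math


def srmr(sample_cov, implied_cov):
    """Standardized root mean squared residual for two p x p matrices.

    Each residual s_ij - sigma_ij is divided by sqrt(s_ii * s_jj); the
    squared standardized residuals over the lower triangle (diagonal
    included) are averaged and the square root is returned.
    """
    p = len(sample_cov)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            scale = math.sqrt(sample_cov[i][i] * sample_cov[j][j])
            resid = (sample_cov[i][j] - implied_cov[i][j]) / scale
            total += resid ** 2
    return math.sqrt(total / (p * (p + 1) / 2))


# Toy example with two variables: the only misfit is in the covariance.
S = [[1.0, 0.5], [0.5, 1.0]]        # sample covariance (made-up values)
Sigma = [[1.0, 0.2], [0.2, 1.0]]    # model-implied covariance (made-up values)
value = srmr(S, Sigma)
```

Here the single off-diagonal residual is 0.3 and the two diagonal residuals are zero, so the statistic equals the square root of 0.09 / 3.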

16.

Stephanie T. Lane Kathleen M. Gates 《Structural equation modeling》2017,24(5):768-782

In order to analyze intensive longitudinal data collected across multiple individuals, researchers frequently have to decide between aggregating all individuals or analyzing each individual separately. This paper presents an R package, gimme, which allows for the automatic specification of individual-level structural equation models that combine group-, subgroup-, and individual-level information. This R package is a complement of the GIMME program currently available via a combination of MATLAB and LISREL. By capitalizing on the flexibility of R and the capabilities of the existing structural equation modeling package lavaan, gimme allows for the automated specification and estimation of group-, subgroup-, and individual-level relations in time series data from within a structural equation modeling framework. Applications include daily diary data as well as functional magnetic resonance imaging data.

17.

Computer-aided authoring of assessment instruments: An activity-theoretical approach

《Africa Education Review》2013,10(2):245-258

Abstract

Much research has been conducted on the role and importance of assessment in education as well as its impact on the learner and the overall learning process. In fact, the way assessment is formulated in a particular subject shapes the way students learn. They focus their learning to comply with the assessment requirements that they anticipate. This article focuses on the written examination papers (teacher-made tests) that are normally prepared at the end of a semester or an academic year to assess students at secondary and tertiary levels. The study also investigates how well papers are set and balanced according to the cognitive levels defined by Bloom (1956) and the learning outcomes/objectives as defined for the subjects. A collaborative process model is proposed as a framework for the design of such tests that can enhance the evaluation process. A brief case is made for a computer-supported collaborative environment, based on activity theory, to implement such a framework. The framework is implemented in the form of MYSTIC, a collaborative authoring software for assessment instruments. The software allows stand-alone as well as collaborative authoring of examination papers and also helps academics' decision-making concerning the examination paper balancing and moderating process by graphically displaying and comparing marks allocated per question paper against the learning objectives.

18.

A Comparison of CFA, ESEM, and BSEM in Test Structure Analysis

Yue Xiao Kit-Tai Hau 《Structural equation modeling》2013,20(5):665-677

Minor cross-loadings on non-targeted factors are often found in psychological or other instruments. Forcing them to zero in confirmatory factor analyses (CFA) leads to biased estimates and distorted structures. Alternatively, exploratory structural equation modeling (ESEM) and Bayesian structural equation modeling (BSEM) have been proposed. In this research, we compared the performance of the traditional independent-clusters-confirmatory-factor-analysis (ICM-CFA), the nonstandard CFA, ESEM with the Geomin- or Target-rotations, and BSEMs with different cross-loading priors (correct; small- or large-variance priors with zero mean) using simulated data with cross-loadings. Four factors were considered: the number of factors, the size of factor correlations, the cross-loading mean, and the loading variance. Results indicated that ICM-CFA performed the worst. ESEMs were generally superior to CFAs but inferior to BSEM with correct priors that provided the precise estimation. BSEM with large- or small-variance priors performed similarly while the prior mean for cross-loadings was more important than the prior variance.

19.

Ehri Ryu Paras Mehta 《Structural equation modeling》2017,24(6):936-959

We present a multigroup multilevel confirmatory factor analysis (CFA) model and a procedure for testing multilevel factorial invariance in n-level structural equation modeling (nSEM). Multigroup multilevel CFA introduces a complexity when the group membership at the lower level intersects the clustered structure, because the observations in different groups but in the same cluster are not independent of one another. nSEM provides a framework in which the multigroup multilevel data structure is represented with the dependency between groups at the lower level properly taken into account. The procedure for testing multilevel factorial invariance is illustrated with an empirical example using an R package xxm2.

20.

The Reliability of Observations of Talkativeness and Social Contact among Nursery School Children by the “Short Time Sample” Technique

Esther W. Robinson Herbert S. Conrad 《Journal of Experimental Education》2013,81(2):161-165

In 1958, Page conducted a large multiple experiment: 74 teachers gave one class its normal quiz, scored and graded it in the usual way, assigned three comment treatments to students in stratified-random blocks, and then reported scores from the next objective quiz. There was a highly significant effect of comments. Others have borrowed some study features, with results that have appeared mixed. Here, a critical overall analysis shows much agreement with the ordered hypothesis of comments and with specified comments over no comments (p < .01). Despite great variety of designs and subtlety of effect, results broadly support teachers who comment. A typical effect size is demonstrated for ranks, and lessons are taken about the proper strategies for designs and the future of such research.