Similar Documents
20 similar documents found.
1.
Conventional approaches for selecting a reference indicator (RI) can lead to misleading results in testing for measurement invariance (MI). Several newer quantitative methods are available for more rigorous RI selection. However, it is still unknown how well these methods perform in terms of correctly identifying a truly invariant item to serve as an RI. Thus, Study 1 was designed to address this issue under various conditions using simulated data. As a follow-up, Study 2 further investigated the advantages and disadvantages of using RI-based approaches for MI testing in comparison with non-RI-based approaches. Altogether, the two studies provided a solid examination of how the choice of RI matters in MI tests. In addition, a large sample of real-world data was used to empirically compare the RI selection methods as well as the RI-based and non-RI-based approaches to MI testing. In the end, we offer a discussion of all these methods, followed by suggestions and recommendations for applied researchers.

2.
Many statistics used in the assessment of differential item functioning (DIF) in polytomous items yield a single item-level index of measurement invariance that collapses information across all response options of the polytomous item. Utilizing a single item-level index of DIF can, however, be misleading if the magnitude or direction of the DIF changes across the steps underlying the polytomous response process. A more comprehensive approach to examining measurement invariance in polytomous item formats is to examine invariance at the level of each step of the polytomous item, a framework described in this article as differential step functioning (DSF). This article proposes a nonparametric DSF estimator based on the Mantel-Haenszel common odds ratio estimator (Mantel & Haenszel, 1959), which is frequently used to detect DIF in dichotomous items. A simulation study demonstrated that when the level of DSF varied in magnitude or sign across the steps underlying the polytomous response options, the DSF-based approach typically provided a more powerful and accurate test of measurement invariance than did the corresponding item-level DIF estimators.
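For context, the Mantel-Haenszel common odds ratio that this DSF estimator builds on can be computed from the 2 x 2 tables formed at each level of the matching (total-score) variable. The sketch below is a generic illustration with invented counts, not the article's implementation; what the DSF framing adds is applying it separately at each step of the polytomous item (one dichotomization per step) rather than once per item.

```python
import numpy as np

def mh_common_odds_ratio(ref_pass, ref_fail, foc_pass, foc_fail):
    """Mantel-Haenszel common odds ratio pooled over matching-score strata.

    Each argument is a per-stratum array of counts: reference/focal examinees
    who do or do not 'pass' one step of the item (score at or above the step).
    """
    ref_pass = np.asarray(ref_pass, dtype=float)
    ref_fail = np.asarray(ref_fail, dtype=float)
    foc_pass = np.asarray(foc_pass, dtype=float)
    foc_fail = np.asarray(foc_fail, dtype=float)
    n = ref_pass + ref_fail + foc_pass + foc_fail   # total examinees per stratum
    return np.sum(ref_pass * foc_fail / n) / np.sum(ref_fail * foc_pass / n)

# Toy example: three total-score strata for a single step of a polytomous item.
alpha = mh_common_odds_ratio(ref_pass=[40, 55, 70], ref_fail=[60, 45, 30],
                             foc_pass=[35, 50, 68], foc_fail=[65, 50, 32])
print(alpha, np.log(alpha))   # log scale is symmetric around 0, the no-DSF point
```

A value near 1.0 (log value near 0) at a given step suggests no DSF at that step; step-level values that differ in sign or size across steps are exactly the pattern that a single item-level index can mask.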

3.
Several structural equation modeling (SEM) strategies have been developed for assessing measurement invariance (MI) across groups, relaxing the assumption of strict MI to partial, approximate, and partial approximate MI. Nonetheless, applied researchers still do not know whether, and under what conditions, these strategies provide results that allow for valid comparisons across groups in large-scale comparative surveys. We perform a comprehensive Monte Carlo simulation study to assess the conditions under which various SEM methods are appropriate for estimating latent means and path coefficients and their differences across groups. We find that while SEM path coefficients are relatively robust to violations of full MI and can be recovered rather effectively, recovering latent means and their group rankings can be difficult. Our results suggest that, contrary to some previous recommendations, partial invariance may recover both path coefficients and latent means rather effectively even when the majority of items are noninvariant. Although it is more difficult to recover latent means using approximate and partial approximate MI methods, it is possible under specific conditions and with appropriate models. These models also have the advantage of providing accurate standard errors. Alignment is recommended for recovering latent means when there are only a few noninvariant parameters across groups.
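As a shorthand for the four levels of invariance the abstract refers to, with λ_{pg} and ν_{pg} denoting the loading and intercept of item p in group g (generic notation, not taken from the article):

```latex
\begin{aligned}
\text{full MI:} \quad & \lambda_{pg}=\lambda_p,\ \nu_{pg}=\nu_p \ \text{for all items } p \text{ and groups } g,\\
\text{partial MI:} \quad & \text{equality imposed only for a subset of items; the rest freely estimated,}\\
\text{approximate MI:} \quad & \lambda_{pg}\sim N(\lambda_p,\sigma^2_\lambda),\ \nu_{pg}\sim N(\nu_p,\sigma^2_\nu)\ \text{with small prior variances,}\\
\text{partial approximate MI:} \quad & \text{small-variance priors for a subset of items; the rest freely estimated.}
\end{aligned}
```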

4.
Studies investigating invariance have often been limited to measurement or prediction invariance. Selection invariance, wherein the use of test scores for classification results in equivalent classification accuracy between groups, has received comparatively little attention in the psychometric literature. Previous research suggests that some form of selection bias (a lack of selection invariance) will exist in most testing contexts in which classification decisions are made, even when the conditions of measurement invariance are met. We define this conflict between measurement and selection invariance as the invariance paradox. Previous research has identified test reliability as an important factor in minimizing selection bias. This study demonstrates that the location of maximum test information may be a more important factor than overall test reliability in minimizing decision errors between groups.
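The "location of maximum test information" can be read against the test information function of item response theory; under a two-parameter logistic model (an assumption for illustration, since the abstract does not name a model), it is

```latex
I(\theta)=\sum_i a_i^2\,P_i(\theta)\bigl[1-P_i(\theta)\bigr],
\qquad
P_i(\theta)=\frac{1}{1+\exp\!\bigl[-a_i(\theta-b_i)\bigr]},
```

so the argument is that classification errors near a cut score depend on how much information the test delivers at that cut, whereas reliability summarizes precision averaged over the whole proficiency range.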

5.
Two experiments were carried out in which rats first were given four forced choices on an eight-arm radial maze, then were given interpolated maze experiences, and finally were given a free-choice retention test on the first maze. In Experiment 1, interpolated experiences consisted of forced choices made on one, two, or three other mazes, each placed in a different room. Retroactive inhibition (RI) was not found with one or two interpolated mazes but was found with three interpolated mazes. In Experiments 2a and 2b, an attempt was made to produce RI within a single context by using two mazes placed side by side or on top of one another and by using interpolated forced choices that were different, random, or the same with respect to the forced choices on Maze 1. These conditions failed to yield any evidence of RI. In Experiment 2c, forced choices were followed by interpolated direct placements on the same maze on different, random, or the same maze arms, and retention tests revealed RI under these conditions. It was concluded that rats encode memories of specific places visited in space and that RI will arise only if (1) memory is greatly overloaded with interpolated information or (2) an interpolated visit is made to exactly that position in space to which an animal must travel in order to achieve a correct choice on the retention test.

6.
Investigations of differential distractor functioning (DDF) can provide valuable information concerning the location and possible causes of measurement noninvariance within a multiple-choice item. In this article, I propose an odds ratio estimator of the DDF effect as modeled under the nominal response model. In addition, I propose a simultaneous distractor-level (SDL) test of invariance based on the results of the distractor-level tests of DDF. The results of a simulation study indicated that the DDF effect estimator maintained good statistical properties under a variety of conditions, and the SDL test displayed substantially higher power than the traditional Mantel-Haenszel test of no DIF when the DDF effect varied in magnitude and/or sign across the distractors.
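Under the nominal response model referenced here, the probability that examinee j selects option k of an item with options m = 1, ..., K is (standard notation, not taken from the article):

```latex
P(X_{ij}=k \mid \theta_j)=\frac{\exp(a_{ik}\theta_j+c_{ik})}{\sum_{m=1}^{K}\exp(a_{im}\theta_j+c_{im})}.
```

A distractor-level DDF effect can then be thought of as a group difference in the log-odds of choosing distractor k rather than the keyed response at matched proficiency; an odds ratio estimator of that kind of quantity is what the article proposes, although the exact estimator is not reproduced in this abstract.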

7.
To infer longitudinal relationships among latent factors, traditional analyses assume that the measurement model is invariant across measurement occasions. As an alternative to placing cross-occasion equality constraints on parameters, approximate measurement invariance (MI) can be analyzed by specifying informative priors on parameter differences between occasions. This study evaluated the estimation of structural coefficients in multiple-indicator autoregressive cross-lagged models under various conditions of approximate MI using Bayesian structural equation modeling. Design factors included factor structures, conditions of non-invariance, sizes of structural coefficients, and sample sizes. Models were analyzed using two sets of small-variance priors on select model parameters. Results showed that autoregressive coefficient estimates were more accurate for the mixed pattern than for the decreasing pattern of non-invariance. When a model included cross-loadings, an interaction was found between the cross-lagged estimates and the non-invariance conditions. Implications of the findings and future research directions are discussed.
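The small-variance-prior idea can be written compactly: rather than forcing a loading (or intercept) to be exactly equal at occasions t and t', the difference is given a normal prior centered at zero with a small variance. The notation below is generic; the two prior variances actually used in the study are not stated in the abstract, and values such as 0.01 or 0.05 are simply typical choices in this literature.

```latex
\lambda_{p,t}-\lambda_{p,t'} \sim N(0,\sigma^2_\lambda),
\qquad
\nu_{p,t}-\nu_{p,t'} \sim N(0,\sigma^2_\nu),
\qquad
\sigma^2_\lambda,\ \sigma^2_\nu \ \text{small (e.g., } 0.01\text{)}.
```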

8.
9.
To date, no effective empirical method has been available to identify a truly invariant reference variable (RV) when testing measurement invariance under multiple-group confirmatory factor analysis. This study proposes a method that selects as the RV the variable with the smallest modification index (min-mod). The method's performance is evaluated using 2 models: (a) a full invariance model and (b) a partial invariance model. Results indicate that for both models min-mod successfully identifies a truly invariant RV (Study 1). In Study 2, we use the RV found in Study 1 to further evaluate the performance of item-by-item Wald tests at locating a noninvariant variable. The results indicate that the Wald tests overall performed better with an RV selected in a partial invariance model than with an RV selected in a full invariance model, although in certain conditions their performances were rather similar. Implications and limitations of the study are also discussed.

10.
When factorial invariance is violated, a possible first step in locating the source of the violation(s) is to pursue partial factorial invariance (PFI). Two commonly used methods for establishing PFI are sequential use of the modification index (the backward MI method) and the factor-ratio test. In this study, we propose a simple forward method based on confidence intervals (the forward CI method). We compare the performance of these 3 methods under various simulated PFI conditions. Results indicate that the forward CI method using 99% CIs has the highest perfect recovery rates and the lowest Type I error rates. The backward MI method with the more conservative criterion (MI = 6.635) performs competitively. The factor-ratio test consistently delivers the poorest performance, regardless of the confidence level chosen. The work's contribution, implications, and limitations are also discussed.

11.
Although the root mean square deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessments, this paper shows that its sensitivity for detecting misfit may depend strongly on the proficiency distribution of the countries considered. Specifically, items for which most respondents in a country have a very low (or high) probability of providing a correct answer will rarely be flagged by the RMSD as showing misfit, even if very strong DIF is present. With many international large-scale assessment initiatives moving toward covering a more heterogeneous group of countries, this raises concerns about the ability of the RMSD to detect item-level misfit, especially in low-performing countries that are not well aligned with the overall difficulty level of the test. This may put one at risk of incorrectly assuming that measurement invariance holds, and may also inflate estimated between-country differences in proficiency. The degree to which the RMSD is able to detect DIF in low-performing countries is studied using both an empirical example from PISA 2015 and a simulation study.
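The mechanism is easy to see in a small numerical sketch. A common form of the RMSD integrates the squared gap between a country-specific item characteristic curve and the international one, weighted by that country's proficiency density. The 2PL curves, parameter values, and country means below are invented purely to illustrate why the same strong DIF yields a much smaller RMSD in a low-performing country.

```python
import numpy as np
from scipy.stats import norm

theta = np.linspace(-6, 6, 121)                      # proficiency quadrature grid

def icc(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

p_int = icc(theta, a=1.2, b=2.0)   # international (model) curve: a hard item
p_obs = icc(theta, a=1.2, b=2.8)   # country-specific curve: strong DIF, even harder

for country_mean in (0.0, -1.5):   # a mid-performing vs. a low-performing country
    w = norm.pdf(theta, loc=country_mean, scale=1.0)
    w = w / w.sum()                # discretized country proficiency density
    rmsd = np.sqrt(np.sum(w * (p_obs - p_int) ** 2))
    print(f"country mean {country_mean:+.1f}: RMSD = {rmsd:.3f}")
```

With these made-up numbers, the RMSD for the country centered at -1.5 comes out markedly smaller than for the country centered at 0, even though the DIF built into the curves is identical: hardly any of the low-performing country's density lies where the two curves diverge.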

12.
The alignment method (Asparouhov & Muthén, 2014) is an alternative to multiple-group factor analysis for estimating measurement models and testing for measurement invariance across groups. Simulation studies evaluating the performance of the alignment method for estimating measurement models across groups show promising results for continuous indicators. This simulation study builds on previous research by investigating the performance of the alignment method's measurement model estimates with polytomous indicators under conditions of systematically increasing partial measurement invariance. We also present an evaluation of the testing procedure, which has not been the focus of previous simulation studies. Results indicate that the alignment method adequately recovers parameter estimates under small and moderate amounts of noninvariance, with issues arising only in extreme conditions. In addition, the statistical tests of invariance were fairly conservative and had less power for items with more extreme skew. We include recommendations for using the alignment method based on these results.
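For orientation, the alignment approach starts from the configural model and then searches for group factor means and variances that make the pattern of loading and intercept differences as "simple" as possible, by minimizing a total loss of roughly the following form (as described by Asparouhov & Muthén, 2014; the component loss and weights shown are the commonly cited defaults, not details given in this abstract):

```latex
F=\sum_{p}\sum_{g_1<g_2} w_{g_1,g_2}\, f\!\bigl(\lambda_{pg_1}-\lambda_{pg_2}\bigr)
 +\sum_{p}\sum_{g_1<g_2} w_{g_1,g_2}\, f\!\bigl(\nu_{pg_1}-\nu_{pg_2}\bigr),
\qquad
f(x)=\sqrt{\sqrt{x^{2}+\epsilon}},\quad
w_{g_1,g_2}=\sqrt{N_{g_1}N_{g_2}},
```

with ε a small positive constant. The loss favors solutions in which most pairwise parameter differences are near zero and a few are large, mirroring the partial-invariance patterns simulated in this study.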

13.
The objective was to offer guidelines for applied researchers on how to weigh the consequences of errors made in evaluating measurement invariance (MI) for the assessment of factor mean differences. We conducted a simulation study to supplement the MI literature, focusing on choosing among analysis models with different numbers of between-group constraints imposed on the loadings and intercepts of indicators. Data were generated with varying proportions, patterns, and magnitudes of differences in loadings and intercepts, as well as varying factor mean differences and sample sizes. Based on the findings, we concluded that researchers who conduct MI analyses should recognize that relaxing as well as imposing constraints can affect the Type I error rate, power, and bias of estimates of factor mean differences. In addition, fit indexes can be misleading when making decisions about constraints on loadings and intercepts. We offer suggestions for making MI decisions under uncertainty when assessing factor mean differences.

14.
With the increasing use of international survey data, especially in cross-cultural and multinational studies, establishing measurement invariance (MI) across a large number of groups in a study is essential. Testing MI over many groups is methodologically challenging, however. We identified 5 methods for MI testing across many groups (multiple-group confirmatory factor analysis, multilevel confirmatory factor analysis, multilevel factor mixture modeling, Bayesian approximate MI testing, and alignment optimization) and explicated the similarities and differences of these approaches in terms of their conceptual models and statistical procedures. A Monte Carlo study was conducted to investigate the efficacy of the 5 methods in detecting measurement noninvariance across many groups using various fit criteria. Generally, the 5 methods showed reasonable performance in identifying the level of invariance if an appropriate fit criterion was used (e.g., the Bayesian information criterion with multilevel factor mixture modeling). Finally, general guidelines for selecting an appropriate method are provided.

15.
Studies suggest that people who cheat on a test overestimate their performance on future tests. Given that erroneous monitoring of one's own cognitive processes impairs learning and memory, this study investigated whether cheating on a test would harm monitoring accuracy on future tests. Participants had the incentive and opportunity to cheat on one (Experiments 1, 2, and 3, with N = 90, 88, and 102, respectively) or two (Experiment 4, N = 214) of four general-knowledge tests. Cheating produced overconfidence in global-level performance predictions in Experiment 2 (Cohen's d ≥ 0.35) but not in Experiments 1 or 4. Also, cheating did not affect the absolute or relative accuracy of item-level performance predictions in Experiments 3 or 4. A Bayesian meta-analysis of all experiments provided evidence against cheating-induced overconfidence in global- and item-level predictions. Overall, our results demonstrate that people who cheat on tests accurately predict their performance on future tests.

16.
In testing the factorial invariance of a measure across groups, the groups are often of different sizes. Large imbalances in group size might affect the results of factorial invariance studies and lead to incorrect conclusions of invariance, because the fit function in multiple-group factor analysis includes a weighting by group sample size. The implication is that violations of invariance might not be detected if the sample sizes of the 2 groups are severely unbalanced. In this study, we examined the effects of group size differences on the results of factorial invariance tests, proposed a subsampling method to address the unbalanced sample size issue in factorial invariance studies, and evaluated the proposed approach under various simulation conditions. Our findings confirm that violations of invariance can be masked under severely unbalanced group size conditions and support the use of the proposed subsampling method to obtain accurate results in invariance studies.
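The weighting the abstract refers to is visible in the standard multiple-group ML discrepancy function, which combines the group-specific discrepancies in proportion to group size (standard notation, up to the exact sample-size weights a given program uses; not reproduced from the article):

```latex
F_{\mathrm{ML}}=\sum_{g=1}^{G}\frac{N_g}{N}\,F_g\bigl(S_g,\Sigma_g(\hat{\theta})\bigr),
\qquad N=\sum_{g=1}^{G} N_g,
```

so when one group dwarfs the other, misfit contributed by the smaller group's noninvariant parameters carries little weight in the overall fit, which is why imbalance can mask violations; subsampling the larger group toward the size of the smaller one rebalances these weights.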

17.
Multistage tests are those in which sets of items are administered adaptively and scored as a unit. These tests have all of the advantages of adaptive testing, including more efficient and precise measurement across the proficiency scale as well as time savings, without many of the disadvantages of an item-level adaptive test. As a seemingly balanced compromise between linear paper-and-pencil tests and item-level adaptive tests, multistage tests are seeing increasing development and use. This module describes multistage tests, including two-stage and testlet-based tests, and discusses the relative advantages and disadvantages of multistage testing as well as considerations and steps in creating such tests.

18.
The conditions necessary for producing retroactive interference (RI) were examined in a 12-arm radial maze. Rats were first given either three or nine forced choices in a to-be-remembered maze. During a 2-h delay, they received one or two trials in a second 12-arm maze, located either in a different room or the same room as the to-be-remembered maze. During the postdelay memory test, RI from the interference trials was produced only when nine choices had been made in the to-be-remembered maze and two interference trials had been conducted during the delay interval. RI was not found when only three forced choices had to be retained or after a single interference trial. The similarity between the interpolated and to-be-remembered mazes had no effect on choice accuracy. It was concluded that two conditions are required for the production of RI in the radial maze. First, a “large amount” of information should be resident in working memory. Second, a substantial number of interpolated trials or choices must be made during the delay.

19.
A paucity of research has compared estimation methods within a measurement invariance (MI) framework and determined whether research conclusions based on normal-theory maximum likelihood (ML) generalize to the robust ML (MLR) and weighted least squares means and variance adjusted (WLSMV) estimators. Using ordered categorical data, this simulation study aimed to address these questions by investigating 342 conditions. When testing for metric and scalar invariance, Δχ2 results revealed that Type I error rates varied across estimators (ML, MLR, and WLSMV) with symmetric and asymmetric data. The power of the Δχ2 test varied substantially based on the estimator selected, the type of noninvariant indicator, the number of noninvariant indicators, and the sample size. Although some of the changes in approximate fit indexes (ΔAFI) are relatively independent of sample size, researchers who use ΔAFI with WLSMV should use caution, as these statistics do not perform well with misspecified models. As a supplemental analysis, we evaluate and suggest cutoff values based on previous research.
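For readers unfamiliar with the Δχ2 comparison used here: nested invariance models (e.g., metric versus scalar) are compared with a likelihood-ratio-type difference test, which in its simplest ML form is

```latex
\Delta\chi^{2}=\chi^{2}_{\text{constrained}}-\chi^{2}_{\text{unconstrained}},
\qquad
\Delta df=df_{\text{constrained}}-df_{\text{unconstrained}},
```

with the constrained model rejected when Δχ2 exceeds the critical value at Δdf degrees of freedom. With MLR and WLSMV this difference cannot be taken at face value and scaled or adjusted versions of the test are required; those versions, not reproduced here, are what the abstract's estimator comparison rests on.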

20.
This application study investigates whether the multiple-choice to composite linking functions that determine Advanced Placement Program exam grades remain invariant over subgroups defined by region. Three years of test data from an AP exam are used to study invariance across regions. The study focuses on two questions: (a) How invariant are grade thresholds across regions? and (b) Do the small sample sizes for some regional groups present particular problems for assessing threshold invariance? The equatability index proposed by Dorans and Holland (2000) is employed to evaluate the invariance of the linking functions, and cross-classification is used to evaluate the invariance of the composite cut scores. Overall, the linkings across regions seem to hold up reasonably well. Nevertheless, more exams need to be examined.
