Similar Documents
1.
Mixture Rasch models have been used to study a number of psychometric issues such as goodness of fit, response strategy differences, strategy shifts, and multidimensionality. Although these models offer the potential for improving understanding of the latent variables being measured, under some conditions overextraction of latent classes may occur, potentially leading to misinterpretation of results. In this study, a mixture Rasch model was applied to data from a statewide test that was initially calibrated to conform to a 3‐parameter logistic (3PL) model. Results suggested how latent classes could be explained and also suggested that these latent classes might be due to applying a mixture Rasch model to 3PL data. To support this latter conjecture, a simulation study was presented to demonstrate how data generated to fit a one‐class 2‐parameter logistic (2PL) model required more than one class when fit with a mixture Rasch model.
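
As a concrete illustration of the data-generation step described above, the sketch below simulates responses from a one-class 2PL model; the sample size and the discrimination and difficulty distributions are illustrative assumptions rather than the study's actual design. The simulated matrix would then be calibrated with mixture Rasch models of increasing class counts to check whether spurious classes are extracted.

```python
import numpy as np

rng = np.random.default_rng(42)

n_persons, n_items = 2000, 30                    # illustrative sizes, not the study's
theta = rng.normal(0.0, 1.0, n_persons)          # one homogeneous latent class
a = rng.lognormal(mean=0.0, sigma=0.3, size=n_items)   # varying discriminations
b = rng.normal(0.0, 1.0, n_items)                # difficulties

# 2PL response probabilities: P(X = 1 | theta) = logistic(a * (theta - b))
logits = a[None, :] * (theta[:, None] - b[None, :])
prob = 1.0 / (1.0 + np.exp(-logits))
responses = (rng.random((n_persons, n_items)) < prob).astype(int)

# 'responses' would then be fit with 1-, 2-, ... class mixture Rasch models and
# compared with information criteria (e.g., AIC/BIC) to check for overextraction.
```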

2.
Using item response theory, this study explores whether student survey and classroom observation items can be calibrated onto a common metric of teaching quality. The data comprise 269 lessons taught by 141 teachers that were scored on the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument and the My Teacher student survey. Using Rasch model concurrent calibration, items from both instruments were calibrated onto a common one‐dimensional metric of teaching quality. Most items were found to fit the model. Challenges pertain mainly to items measuring the teaching of learning strategies and differentiation. Explanations for these difficulties are discussed.
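
A minimal sketch of how a concurrent calibration data set can be assembled in principle: items from both instruments are placed in a single response matrix so that one Rasch calibration puts them on one metric. The column names, sizes, and random placeholder scores below are assumptions for illustration only, not the ICALT or My Teacher data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy dichotomized scores: rows are lessons, columns are items.
icalt = pd.DataFrame(rng.integers(0, 2, size=(50, 8)),
                     columns=[f"icalt_{i}" for i in range(8)], dtype=float)
survey = pd.DataFrame(rng.integers(0, 2, size=(50, 10)),
                      columns=[f"myteacher_{i}" for i in range(10)], dtype=float)
survey.iloc[:10] = np.nan   # e.g., lessons without student survey data

# Concurrent calibration: one matrix containing the items of both instruments;
# unobserved cells stay NaN and are ignored by the Rasch likelihood, so all
# items end up on a single teaching-quality metric.
combined = pd.concat([icalt, survey], axis=1)
print(combined.shape)        # (50, 18)
```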

3.
Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait (θ) estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to calibrate items using the incomplete data from an MST design. A further complication arises when there are multiple correlated subscales per test and items from different subscales need to be calibrated according to their respective score reporting metrics. The current calibration-per-subscale method produced biased item parameters, and there is no available method for resolving the challenge. Drawing on the missing data principle, we showed that when all items are calibrated together, Rubin's ignorability assumption is satisfied, such that traditional single-group calibration is sufficient. When calibrating items per subscale, we proposed a simple modification to the current calibration-per-subscale method that helps reinstate the missing-at-random assumption and therefore corrects the estimation bias that otherwise exists. Three mainstream calibration methods are discussed in the context of MST: marginal maximum likelihood estimation, the expectation maximization method, and fixed parameter calibration. An extensive simulation study is conducted and a real data example from NAEP is analyzed to provide convincing empirical evidence.
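
A hedged sketch of the core point that missingness induced by MST routing can simply be skipped in a single-group marginal maximum likelihood calibration. It assumes a 2PL model with quadrature over a normal ability prior; the function and toy data layout are illustrative, not the NAEP operational procedure.

```python
import numpy as np
from scipy.stats import norm

def marginal_loglik(resp, a, b):
    """Marginal 2PL log-likelihood over a normal ability prior (fixed quadrature
    grid). NaN entries -- items a student never saw because of MST routing --
    are simply skipped, which is what the ignorability argument licenses."""
    nodes = np.linspace(-4.0, 4.0, 41)
    weights = norm.pdf(nodes)
    weights /= weights.sum()
    p = 1.0 / (1.0 + np.exp(-a * (nodes[:, None] - b)))      # (quad, items)
    mask = ~np.isnan(resp)
    ll = 0.0
    for x, m in zip(resp, mask):
        item_like = np.where(m, np.where(x == 1, p, 1.0 - p), 1.0)
        ll += np.log((item_like.prod(axis=1) * weights).sum())
    return ll

# Toy routed data: person 0 saw items 0-2, person 1 saw items 1-3.
resp = np.array([[1, 0, 1, np.nan],
                 [np.nan, 1, 1, 0]])
a = np.array([1.0, 1.2, 0.8, 1.1])
b = np.array([-0.5, 0.0, 0.3, 0.8])
print(marginal_loglik(resp, a, b))
# In calibration, this quantity would be maximized over (a, b) for all items
# jointly, i.e., single-group calibration across subscales.
```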

4.
In the presence of test speededness, the parameters of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end‐of‐test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures: a two‐parameter logistic (2PL) model, a one‐dimensional mixture model, a two‐step strategy (a combination of the one‐dimensional mixture and the 2PL), a two‐dimensional mixture model, and a hybrid model. The comparison examined how sample size, percentage of speeded examinees, percentage of missing responses, and the scoring of missing responses (incorrect vs. omitted) affect item parameter estimation in speeded tests. For nonspeeded items, all five procedures showed similar results in recovering item parameters. For speeded items, the one‐dimensional mixture model, the two‐step strategy, and the two‐dimensional mixture model provided largely similar results and performed better than the 2PL model and the hybrid model in calibrating slope parameters. However, those three procedures performed similarly to the hybrid model in estimating intercept parameters. As expected, the 2PL model did not appear to be as accurate as the other models in recovering item parameters, especially when there were large numbers of examinees showing speededness and a high percentage of missing responses scored as incorrect. A real data analysis further described the similarities and differences among the five procedures.

5.
Examined in this study were the effects of reducing anchor test length on student proficiency rates for 12 multiple‐choice tests administered in an annual, large‐scale, high‐stakes assessment. The anchor tests contained 15 items, 10 items, or five items. Five content representative samples of items were drawn at each anchor test length from a small universe of items in order to investigate the stability of equating results over anchor test samples. The operational tests were calibrated using the one‐parameter model and equated using the mean b‐value method. The findings indicated that student proficiency rates could display important variability over anchor test samples when 15 anchor items were used. Notable increases in this variability were found for some tests when shorter anchor tests were used. For these tests, some of the anchor items had parameters that changed somewhat in relative difficulty from one year to the next. It is recommended that anchor sets with more than 15 items be used to mitigate the instability in equating results due to anchor item sampling. Also, the optimal allocation method of stratified sampling should be evaluated as one means of improving the stability and precision of equating results.
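
For reference, here is a minimal sketch of the mean b-value equating step described above under the one-parameter (Rasch) model; the anchor and new-form difficulties shown are made-up values, not the operational data.

```python
import numpy as np

def mean_b_equating(b_new_anchor, b_old_anchor, b_new_form):
    """Mean b-value (mean shift) equating under the Rasch/1PL model: shift the
    new-form difficulties so the anchor items have the same mean difficulty on
    both scales."""
    shift = np.mean(b_old_anchor) - np.mean(b_new_anchor)
    return np.asarray(b_new_form) + shift

# Toy anchor sets (illustrative values only):
b_old_anchor = np.array([-0.4, 0.1, 0.6, 1.0, -1.2])
b_new_anchor = np.array([-0.2, 0.3, 0.8, 1.1, -1.0])
b_new_form = np.array([-1.5, -0.3, 0.4, 1.2])
print(mean_b_equating(b_new_anchor, b_old_anchor, b_new_form))
```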

6.
The proposed development of extended schools in England is part of an international movement towards community‐oriented schooling, particularly in areas of disadvantage. Although on the face of it this movement seems like a common‐sense approach to self‐evident needs, the evaluation evidence on such schools is inconclusive. In order to assess the likelihood that community‐oriented schooling will have a significant impact on disadvantage, therefore, this paper analyses the rationale on which this approach to schooling appears to be based. It argues that community‐oriented schools as currently conceptualised have a focus on ‘proximal’ rather than ‘distal’ factors in disadvantage, underpinned by a model of social in/exclusion which draws attention away from underlying causes. They are, therefore, likely to have only small‐scale, local impacts. The paper suggests that a more wide‐ranging strategy is needed in which educational reform is linked to other forms of social and economic reform and considers the conditions which would be necessary for the emergence of such a strategy.

7.
In equating, when common items are internal and scoring is conducted in terms of the number of correct items, some pairs of total scores (X) and common‐item scores (V) can never be observed in a bivariate distribution of X and V; these pairs are called structural zeros. This simulation study examines how equating results compare for different approaches to handling structural zeros. The study considers four approaches: the no‐smoothing, unique‐common, total‐common, and adjusted total‐common approaches. This study led to four main findings: (1) the total‐common approach generally had the worst results; (2) for relatively small effect sizes, the unique‐common approach generally had the smallest overall error; (3) for relatively large effect sizes, the adjusted total‐common approach generally had the smallest overall error; and, (4) if sole interest focuses on reducing bias only, the adjusted total‐common approach was generally preferable. These results suggest that, when common items are internal and log‐linear bivariate presmoothing is performed, structural zeros should be maintained, even if there is some loss in the moment preservation property.
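
The structural zeros follow directly from the internal-anchor scoring arithmetic: because the common-item score V is part of the total score X, only pairs with V ≤ X ≤ V + (number of unique items) can occur. The sketch below marks the impossible (X, V) cells; the test lengths are illustrative assumptions.

```python
import numpy as np

def structural_zero_mask(n_total, n_common):
    """Boolean matrix over (X, V) pairs marking structural zeros when the
    common items are internal: V cannot exceed X, and X cannot exceed V plus
    the number of unique (non-common) items."""
    n_unique = n_total - n_common
    X = np.arange(n_total + 1)[:, None]    # total scores 0..n_total
    V = np.arange(n_common + 1)[None, :]   # common-item scores 0..n_common
    return (V > X) | (X - V > n_unique)

mask = structural_zero_mask(n_total=10, n_common=4)
print(mask.sum(), "of", mask.size, "(X, V) cells are structural zeros")
```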

8.
This study compared diagonal weighted least squares robust estimation techniques available in two popular statistical programs: diagonal weighted least squares (DWLS; LISREL version 8.80), and weighted least squares-mean adjusted (WLSM) and weighted least squares-mean and variance adjusted (WLSMV; Mplus version 6.11). A 20-item confirmatory factor analysis model was estimated using item-level ordered categorical data. Three different nonnormality conditions were applied to 2- to 7-category data with sample sizes of 200, 400, and 800. Convergence problems were seen with nonnormal data when DWLS was used with few categories. Both DWLS and WLSMV produced accurate parameter estimates; however, bias in the standard errors of parameter estimates was extreme for select conditions when nonnormal data were present. The robust estimators generally reported acceptable model-data fit, unless few categories were used with nonnormal data at smaller sample sizes; WLSMV yielded better fit than WLSM for most indices.

9.
In this article, two major problems with using the three‐wave quasi-simplex model to obtain reliability estimates are illustrated. The first problem is that the sampling variance of the reliability estimates can be very large, especially if the stability through time is low. The second problem is that, for the reliability parameter to be identified, the model assumes a particular change process, namely a Markov process. We show that minor violations of this assumption can lead to large bias in the reliability estimates. The problems are evaluated using both real and Monte Carlo data. A model with repeated measurements in one of the waves is also discussed.
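
For the three-wave quasi-simplex model, the wave-2 reliability is identified from the observed covariances as cov(y1, y2)·cov(y2, y3) / [cov(y1, y3)·var(y2)], which is exactly where the Markov (lag-1) assumption enters. A small sketch follows; the covariance values are illustrative, not the article's data.

```python
import numpy as np

def wave2_reliability(S):
    """Reliability of the wave-2 measure in a three-wave quasi-simplex model,
    computed from the 3x3 observed covariance matrix S. Identification rests
    on the Markov change process; if that assumption fails, the estimate is
    biased, as the article warns."""
    S = np.asarray(S, dtype=float)
    true_var_2 = S[0, 1] * S[1, 2] / S[0, 2]
    return true_var_2 / S[1, 1]

# Illustrative covariance matrix (made-up numbers):
S = [[1.00, 0.55, 0.40],
     [0.55, 1.10, 0.60],
     [0.40, 0.60, 1.05]]
print(wave2_reliability(S))
```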

10.
This article draws upon the literature showing the benefits of high‐quality preschools on child well‐being to explore the role of household income on preschool attendance for a cohort of 3‐ to 6‐year‐olds in China using data from the China Health and Nutrition Survey, 1991–2006. Analyses are conducted separately for rural (N = 1,791) and urban (N = 633) settings. Estimates from a probit model with rich controls suggest a positive association between household income per capita and preschool attendance in both settings. A household fixed‐effects model, conducted only on the rural sample, finds results similar to, although smaller than, those from the probit estimates. Policy recommendations are discussed.

11.
This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item‐level multidimensionality, and (2) whether a Projection IRT model can provide a useful remedy. A real‐data example is used to illustrate the problem and also is used as a base model for a simulation study. The results suggest that ignoring item‐level multidimensionality might lead to inflated item discrimination parameter estimates when the ratio of multidimensional to unidimensional test items is as low as 1:5. The Projection IRT model appears to be a useful tool for updating unidimensional item parameter estimates of multidimensional test items for a purified unidimensional interpretation.

12.
Ridge generalized least squares (RGLS) is a recently proposed estimation procedure for structural equation modeling. A key element in the formulation of RGLS is the ridge tuning parameter, whose value determines the efficiency of the parameter estimates. This article aims to optimize RGLS by developing formulas for the ridge tuning parameter that yield the most efficient parameter estimates in practice. To give the formulas a wide scope of applicability, they are calibrated against empirical efficiency across many conditions on population distribution, sample size, number of variables, and model structure. Results show that RGLS with the tuning parameter determined by the formulas can substantially improve the efficiency of parameter estimates over commonly used procedures, given that real data are typically nonnormally distributed.

13.
Applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet‐based assessment, both local item dependence and local person dependence are likely to be induced. This study proposed a four‐level IRT model to simultaneously account for dual local dependence due to item clustering and person clustering. Model parameter estimation was explored using the Markov chain Monte Carlo method. Model parameter recovery was evaluated in a simulation study in comparison with three other related models: the Rasch model, the Rasch testlet model, and the three‐level Rasch model for person clustering. In general, the proposed model recovered the item difficulty and person ability parameters with the least total error. The bias in both item and person parameter estimation was not affected, but the standard error (SE) was. In some simulation conditions, the difference in classification accuracy between models was as large as 11%. An illustration using real data generally supported the model performance observed in the simulation study.
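
One plausible way to write such a four-level model, assuming a Rasch kernel with a testlet (item-cluster) random effect and a person-cluster random intercept; this is a sketch of the general structure, not necessarily the authors' exact parameterization:

```latex
% Sketch of a four-level Rasch-type model for dual local dependence.
\begin{aligned}
\operatorname{logit}\, P(Y_{pi}=1) &= \theta_p + \gamma_{p,d(i)} - b_i,\\
\theta_p &= \mu_{s(p)} + \varepsilon_p,\\
\gamma_{p,d(i)} \sim N\!\bigl(0,\sigma^2_{d(i)}\bigr),\qquad
\mu_{s} &\sim N\!\bigl(0,\tau^2\bigr),\qquad
\varepsilon_p \sim N\!\bigl(0,\sigma^2_{\theta}\bigr),
\end{aligned}
```

where d(i) indexes the testlet of item i and s(p) the cluster (e.g., school) of person p.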

14.
As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations of the model's assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided item parameter estimates that were robust to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than those of the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this bias occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that the EM-IRT model provides superior item parameter estimates and comparable mean ability estimates in the presence of model violations under realistic conditions when compared with the 2PL model.
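
A minimal sketch of the preprocessing step this approach implies: responses flagged as noneffortful (e.g., response time below an item-specific threshold) are recoded as missing before calibration. The threshold values and array shapes below are assumptions for illustration, not validated cutoffs.

```python
import numpy as np

def flag_noneffortful(resp, resp_time, thresholds):
    """Recode presumed rapid guesses as missing before IRT calibration: a
    response whose time falls below its item's threshold is treated as
    noneffortful, in the spirit of the EM-IRT filtering described above."""
    resp = np.asarray(resp, dtype=float).copy()
    resp[np.asarray(resp_time) < np.asarray(thresholds)] = np.nan
    return resp

resp = np.array([[1, 0, 1], [0, 1, 1]])
resp_time = np.array([[12.0, 2.1, 15.4], [9.8, 1.4, 0.9]])   # seconds
thresholds = np.array([3.0, 3.0, 3.0])                       # per-item cutoffs
print(flag_noneffortful(resp, resp_time, thresholds))
# The NaN-coded matrix is then calibrated with a model (e.g., 2PL) whose
# likelihood skips missing entries.
```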

15.
Trend estimation in international comparative large‐scale assessments relies on measurement invariance between countries. However, cross‐national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study to compare, across several conditions, trend estimation based on national item parameters, which requires trends to be computed separately for each country, with two linking methods employing international item parameters. The trend estimates based on national item parameters were more accurate than those based on international item parameters when cross‐national DIF was present. Moreover, the use of fixed common item parameter calibrations led to biased trend estimates. The detection and elimination of DIF can reduce this bias but is also likely to increase the total error.

16.
Respondent attrition is a common problem in national longitudinal panel surveys. To make full use of the data, weights are provided to account for attrition. Weight adjustments are based on sampling design information and data from the base year; information from subsequent waves is typically not utilized. Alternative methods for addressing nonresponse bias are full information maximum likelihood (FIML) and multiple imputation (MI). The effects of these methods on the bias of growth parameter estimates are compared via a simulation study. The results indicate that caution is needed when utilizing panel weights in the presence of missing data and that methods such as FIML and MI, which are less susceptible to the omission of important auxiliary variables, should be considered.

17.
Local equating (LE) is based on Lord's criterion of equity. It defines a family of true transformations that aim at the ideal of equitable equating. van der Linden (this issue) offers a detailed discussion of common issues in observed‐score equating relative to this local approach. Assuming an underlying item response theory model, one of the main features of LE is that it adjusts the equated raw scores using conditional distributions of raw scores given an estimate of the ability of interest. In this article, we argue that this feature disappears when a Rasch model is used to estimate the true transformation, whereas the one‐parameter logistic model and the two‐parameter logistic model do provide a local adjustment of the equated score.
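
For reference, local equating targets the family of conditional equipercentile transformations, a standard formulation sketched here with raw scores X and Y and ability θ:

```latex
% Family of true (local) transformations: an equipercentile transformation
% of X to Y computed conditionally on the ability of interest.
\varphi(x;\theta) \;=\; F_{Y\mid\theta}^{-1}\!\left(F_{X\mid\theta}(x)\right),
\qquad \theta \in \Theta .
```

Under the Rasch model the number-correct score is a sufficient statistic for θ, so conditioning on an ability estimate derived from x presumably adds no adjustment beyond x itself, which is consistent with the claim that the local adjustment disappears in that case.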

18.
To better understand the statistical properties of the deterministic inputs, noisy “and” gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the prior distribution matches the latent class structure. However, when the latent classes are of indefinite structure, the empirical Bayes method in conjunction with an unstructured prior distribution provides much better estimates and classification accuracy. Moreover, using empirical Bayes with an unstructured prior does not lead to the extremely poor results that other prior-estimation method combinations produce. The simulation results also show that increasing the sample size reduces the variability, and to some extent the bias, of item parameter estimates, whereas lower levels of the guessing and slip parameters are associated with higher quality item parameter estimation and classification accuracy.
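
A small sketch of the DINA item response function referred to above: the probability of a correct response is 1 − slip when an examinee holds every attribute the Q-matrix requires for the item, and equals the guessing parameter otherwise. The Q-matrix, attribute profiles, and parameter values below are toy examples.

```python
import numpy as np

def dina_prob(alpha, q, guess, slip):
    """P(correct) under the DINA model: eta = 1 only if the examinee possesses
    every attribute the item requires according to the Q-matrix q."""
    eta = np.all(alpha[:, None, :] >= q[None, :, :], axis=2)   # (persons, items)
    return np.where(eta, 1.0 - slip, guess)

q = np.array([[1, 0], [1, 1]])               # 2 items, 2 attributes
alpha = np.array([[1, 0], [1, 1], [0, 0]])   # 3 attribute profiles
print(dina_prob(alpha, q,
                guess=np.array([0.2, 0.1]),
                slip=np.array([0.1, 0.2])))
```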

19.
Six procedures for combining sets of IRT item parameter estimates obtained from different samples were evaluated using real and simulated response data. In the simulated data analyses, true item and person parameters were used to generate response data for three different-sized samples. Each sample was calibrated separately to obtain three sets of item parameter estimates for each item. The six procedures for combining multiple estimates were each applied, and the results were evaluated by comparing the true and estimated item characteristic curves. For the real data, the two best methods from the simulation data analyses were applied to three different-sized samples and the resulting estimated item characteristic curves were compared to the curves obtained when the three samples were combined and calibrated simultaneously. The results support the use of covariance matrix-weighted averaging and a procedure that involves sample-size-weighted averaging of estimated item characteristic curves at the center of the ability distribution.
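
A hedged sketch in the spirit of the second procedure mentioned above: averaging estimated item characteristic curves for one item, weighted by sample size. The 2PL form, parameter estimates, and sample sizes are illustrative assumptions; the published procedure's details (such as its focus on the center of the ability distribution) are not reproduced here.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def weighted_average_icc(theta, estimates, sample_sizes):
    """Sample-size-weighted average of estimated ICCs for a single item that
    was calibrated separately in several samples."""
    n = np.asarray(sample_sizes, dtype=float)
    curves = np.array([icc_2pl(theta, a, b) for a, b in estimates])
    return (n[:, None] * curves).sum(axis=0) / n.sum()

theta = np.linspace(-3, 3, 61)
avg = weighted_average_icc(theta,
                           estimates=[(1.1, 0.2), (0.9, 0.1), (1.0, 0.3)],
                           sample_sizes=[500, 1000, 2000])
print(avg[:5])
```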

20.
Because parameter estimates from different calibration runs under the IRT model are linearly related, a linear equation can convert IRT parameter estimates onto another scale metric without changing the probability of a correct response (Kolen & Brennan, 1995, 2004). This study was designed to explore a new approach to finding a linear equation by fixing C-parameters for anchor items in IRT equating. A rationale for fixing C-parameters for anchor items in IRT equating can be established from the fact that the C-parameters are not affected by any linear transformation. This new approach can avoid the difficulty of obtaining accurate C-parameters for anchor items embedded in the application of the IRT model. Based upon our findings in this study, we recommend using the new approach of fixing C-parameters for anchor items in IRT equating. This work was supported by a Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research
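
The rationale that C-parameters are unaffected by a linear scale transformation θ* = Aθ + B can be checked directly: only a and b are rescaled, so the response probability is preserved with c untouched. A small sketch with illustrative values follows.

```python
import math

def transform_3pl(a, b, c, A, B):
    """Linear scale transformation theta* = A*theta + B applied to 3PL item
    parameters: a* = a / A, b* = A*b + B, and c is unchanged -- the basis for
    fixing anchor-item C-parameters described above."""
    return a / A, A * b + B, c

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

a, b, c = 1.2, -0.5, 0.2      # illustrative item parameters
A, B = 1.1, 0.3               # illustrative linking constants
a2, b2, c2 = transform_3pl(a, b, c, A, B)
theta = 0.7
# The probability is invariant under the transformation:
assert abs(p_3pl(theta, a, b, c) - p_3pl(A * theta + B, a2, b2, c2)) < 1e-12
```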
