Similar literature
1.
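The Mantel-Haenszel method (M-H method for short) is an important and widely used class of methods for detecting whether test items exhibit DIF. Choosing the sample size is a key step in applying the M-H method. Taking as its population the multiple-choice response data for the English subject from a sample of one city's college entrance examination in a given year, this paper examines how strongly different sample sizes affect the sensitivity of the test. The results show that, for the population used in this study, the test results are fairly consistent within a certain range of sample sizes.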
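As an illustration of the test discussed in this abstract, the following is a minimal sketch of the Mantel-Haenszel chi-square statistic with the usual 0.5 continuity correction. The function name `mh_chi_square`, the tuple layout of `strata`, and the toy counts are assumptions made for this example; they do not come from the paper.

```python
def mh_chi_square(strata):
    """Mantel-Haenszel chi-square (1 df) with continuity correction.

    strata: iterable of (a, b, c, d) counts, one tuple per matched score level:
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    (This layout is an assumption of the sketch.)
    """
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 examinees carries no information
        n_ref, n_foc = a + b, c + d      # group sizes in this stratum
        m_right, m_wrong = a + c, b + d  # correct / incorrect totals
        sum_a += a
        sum_expected += n_ref * m_right / n
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    if sum_var == 0:
        raise ValueError("no stratum contributes variance")
    return (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var


# Toy data: two score levels, 50 examinees per group at each level.
no_dif = [(40, 10, 40, 10), (20, 30, 20, 30)]
with_dif = [(45, 5, 30, 20), (30, 20, 15, 35)]
print(mh_chi_square(no_dif), mh_chi_square(with_dif))
```

The statistic is compared with the chi-square critical value for 1 degree of freedom (about 3.84 at the 0.05 level). Because the variance term grows with the number of examinees, the same underlying difference yields a larger statistic in a larger sample, which is why sample size matters for the sensitivity of the test.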

2.
Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no published study manipulated proficiency variance ratio (VR). Data were generated with the three-parameter logistic (3PL) IRT model. Proficiency VRs were 1, 2, 3, and 4. The present study suggests inflation may be greater, and may affect all highly discriminating items (low, moderate, and high difficulty), when IRT proficiency distributions of reference and focal groups also differ in variances. Inflation was greatest on the 21-item test (vs. 41) and 2,000 total sample size (vs. 1,000). Previous studies had not systematically examined sample size ratio. Sample size ratio of 1:1 produced greater TIE inflation than 3:1, but primarily for total sample size of 2,000.

3.
Applied Measurement in Education, 2013, 26(1): 11-22
Previous research has provided conflicting findings about whether allowing the use of calculators changes the difficulty of mathematics tests or the time needed to complete the tests. Because the interpretation of results from standardized tests via norm tables depends on standardized conditions, the impact of allowing or not allowing examinees to use calculators while taking such tests would need to be specified as part of the standardizing condition. This article examines four item types that may perform differently under different conditions of calculator use. This article also examines the effect of testing under calculator and noncalculator conditions on testing time, reliability, item difficulty, and item discrimination.

4.
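Rater effects and rating scales are the central issues in research on scoring error for subjective items. Taking Question 38 (an essay question) of the 2006 college entrance examination in politics (Shanghai paper) as an example, this paper applies the Raters Effect model in ACER Conquest. The results show that the question exhibited essentially no scoring errors of ambiguity, central tendency, or restriction of range, and that raters were able to distinguish reasonably well among examinees' different behavioral characteristics. Apart from a few raters whose scoring consistency still needs improvement, differences in rater severity were fairly significant. The author therefore proposes a method for adjusting test scores according to rater severity.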

5.
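The standard error of equating is a very effective index for judging equating error. Using the Delta method, this paper analyzes how the standard errors of several common equating methods change with sample size. It confirms that increasing the sample size reduces the standard error of equivalent scores between two tests and thereby improves equating precision. It also shows the advantage of the Delta method in computing the standard error of equating at each equivalent score point, which makes it easy to compare the standard errors of multiple equating methods.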

6.
7.
Covariance structure analysis provides a useful methodology to test hypotheses about competing structural models. The chi-square goodness of fit test is basically an appropriate test for model evaluation. However, methodologists are particularly concerned about the validity of the test to detect misspecified models in small samples. At the same time, there is the concern of rejecting models with reasonably good fit in large samples. The present Monte Carlo study examined the validity of the chi-square test in different instances of misspecification and sample size. The usefulness of the chi-square difference statistic to compare competing structures and improvement in fit is also addressed.

8.
This study examined the reliability of the Mantel-Haenszel indexes across different samples of test takers as well as across sample sizes and investigated whether these indexes are robust to item context effects. Mathematics data from the Second International Mathematics Study (SIMS; 1985) for U.S. eighth-grade students were analyzed. The results suggest that the MH D-DIF is robust to item context effects. However, larger sample sizes than those used in this investigation (N = 141-167 for the focal group) may be necessary to obtain stable estimates from the Mantel-Haenszel procedure.
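For readers unfamiliar with the MH D-DIF index named in this abstract, here is a minimal sketch of how it is conventionally computed: the Mantel-Haenszel common odds ratio is estimated across matched score levels and rescaled as -2.35 times its natural logarithm. The function name `mh_d_dif`, the input layout, and the toy counts are assumptions for illustration and are not taken from the study.

```python
import math


def mh_d_dif(strata):
    """Return (alpha_MH, MH D-DIF) for one item.

    strata: iterable of (a, b, c, d) counts per matched score level, where
      a = reference correct, b = reference incorrect,
      c = focal correct,     d = focal incorrect.
    (This layout is an assumption of the sketch.)
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    alpha = numerator / denominator
    # Rescale the log odds ratio; negative values indicate an item
    # that is harder for the focal group after matching.
    return alpha, -2.35 * math.log(alpha)


# Toy data: two score levels, 50 examinees per group at each level.
alpha_equal, d_equal = mh_d_dif([(40, 10, 40, 10), (20, 30, 20, 30)])
alpha_dif, d_dif = mh_d_dif([(45, 5, 30, 20), (30, 20, 15, 35)])
print(alpha_equal, d_equal, alpha_dif, d_dif)
```

With focal groups as small as those reported here, individual cells in the stratified tables can be very small or empty, which is one reason the resulting estimates may be unstable.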

9.
The Effects of Score Group Width on the Mantel-Haenszel Procedure (cited 1 time: 0 self-citations, 1 by others)
Previous research examining the effects of reducing the number of score groups used in the matching criterion of the Mantel-Haenszel procedure, when screening for DIF, has produced ambiguous results. The goal of this study was to resolve the ambiguity by examining the problem with a simulated data set. The main results from this study call into question the preliminary recommendations of several other researchers that four or more score groups are sufficient and produce stable results. Although considerable stability and very little Type I error was noted with equal ability distribution comparisons, with unequal ability distributions, the Type I error rate was substantially inflated. These results argue against the appropriateness of implementing the procedure by collapsing score groups. The current data suggest that more than modest reductions in the number of score groups cannot be recommended when the ability distributions of the reference and focal groups differ.

10.
Multilevel models are an increasingly popular method to analyze data that originate from a clustered or hierarchical structure. To effectively utilize multilevel models, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals for this paper are to (1) raise awareness of the problems associated with a small number of clusters, (2) review previous studies on multilevel models with a small number of clusters, (3) provide an illustrative simulation to demonstrate how a simple model becomes adversely affected by small numbers of clusters, (4) provide researchers with remedies if they encounter clustered data with a small number of clusters, and (5) outline methodological topics that have yet to be addressed in the literature.

11.
Educational Psychologist, 2013, 48(1): 99-110
Based on reviews by Glass, Cahen, Smith, and Filby (1982) and the Educational Research Service (1978), Cooper (this issue) concludes that substantial reductions in class size can have important effects on low-achieving students in the early grades. This article critiques these reviews and summarizes the findings of experimental studies that compared the achievement levels of elementary school students in larger classes to classes with no more than 20 students. Even in studies that made such substantial reductions, achievement differences were slight, averaging only 13% of a standard deviation. Not until class size approaches one is there evidence of meaningful effects. Based on this and other evidence, it is suggested that Chapter 1 programs provide one-to-one tutoring in reading rather than providing small-group pullouts or reducing overall class size.

12.
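This paper first proposes theoretical hypotheses about the effects of team heterogeneity, size, stage, and type on the innovation performance of academic discipline teams. It then tests them with a questionnaire survey of 660 faculty members in 86 discipline teams at Chinese universities, in order to provide decision support for building innovative teams. The study finds that team heterogeneity, development stage, and discipline team type are correlated with team innovation performance, whereas team size is not significantly correlated with team innovation performance.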

13.
Studies that have investigated differences in examinee performance on items administered in paper-and-pencil form or on a computer screen have produced equivocal results. Certain item administration procedures were hypothesized to be among the most important variables causing differences in item performance and ultimately in test scores obtained from these different administration media. A study where these item administration procedures were made as identical as possible for each presentation medium is described. In addition, a methodology is presented for studying the difficulty and discrimination of items under each presentation medium as a post hoc procedure.

14.
Applied Measurement in Education, 2013, 26(4): 329-349
The logistic regression (LR) procedure for differential item functioning (DIF) detection is a model-based approach designed to identify both uniform and nonuniform DIF. However, this procedure tends to produce inflated Type I errors. This outcome is problematic because it can result in the inefficient use of testing resources, and it may interfere with the study of the underlying causes of DIF. Recently, an effect size measure was developed for the LR DIF procedure and a classification method was proposed. However, the effect size measure and classification method have not been systematically investigated. In this study, we developed a new classification method based on those established for the Simultaneous Item Bias Test. A simulation study also was conducted to determine if the effect size measure affects the Type I error and power rates for the LR DIF procedure across sample sizes, ability distributions, and percentage of DIF items included on a test. The results indicate that the inclusion of the effect size measure can substantially reduce Type I error rates when large sample sizes are used, although there is also a reduction in power.

15.
Factor mixture models are designed for the analysis of multivariate data obtained from a population consisting of distinct latent classes. A common factor model is assumed to hold within each of the latent classes. Factor mixture modeling involves obtaining estimates of the model parameters, and may also be used to assign subjects to their most likely latent class. This simulation study investigates aspects of model performance such as parameter coverage and correct class membership assignment and focuses on covariate effects, model size, and class-specific versus class-invariant parameters. When fitting true models, parameter coverage is good for most parameters even for the smallest class separation investigated in this study (0.5 SD between 2 classes). The same holds for convergence rates. Correct class assignment is unsatisfactory for the small class separation without covariates, but improves dramatically with increasing separation, covariate effects, or both. Model performance is not influenced by the differences in model size investigated here. Class-specific parameters may improve some aspects of model performance but negatively affect other aspects.

16.
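Building on an analysis of the minimum-error-probability blind equalization algorithm proposed by Li Daoben and Chen Shaoxia, this paper implements a new minimum-error-probability blind equalization algorithm that uses a Newton-gradient variable step size. Simulation results show that, compared with the fixed-step-size blind equalization algorithm, the new algorithm converges faster and has a smaller mean square error, and that it has some practical value in applications of equalization techniques.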

17.
This article describes a simple computer program which graphically demonstrates both Type I and Type II statistical errors.
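The program described in the article is not reproduced in this listing. As a rough stand-in, the sketch below estimates both error rates by simulation for a one-sided z-test of H0: mean = 0 with known standard deviation, printing numbers rather than drawing a graph. The function name `rejection_rate` and every parameter value are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated samples in which H0: mean = 0 is rejected
    by a one-sided z-test (known sigma) at significance level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    std_err = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if fmean(sample) / std_err > z_crit:
            rejections += 1
    return rejections / trials


# Type I error: H0 is true (mean 0) but the test rejects it.
type_i_rate = rejection_rate(true_mean=0.0)
# Type II error: H0 is false (mean 0.5) but the test fails to reject it.
type_ii_rate = 1 - rejection_rate(true_mean=0.5)
print(type_i_rate, type_ii_rate)
```

The Type I rate should land close to the chosen alpha, while the Type II rate depends on the assumed true mean and the sample size; raising `n` or `true_mean` lowers it.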

18.
A Study of Scoring Criteria for Subjective Items (cited 1 time: 0 self-citations, 1 by others)
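Taking the scoring criteria for the essay question in the 2006 Shanghai college entrance examination in politics as an example, this paper studies how to evaluate the quality of scoring criteria for subjective items from three perspectives: whether each scoring dimension is relatively independent; whether examinees' overall argumentation ability can be inferred from the results on the several scoring dimensions; and whether the grade levels within each scoring dimension are reasonably defined. Factor analysis shows that the four scoring dimensions of this item are unidimensional, with a single factor that can be interpreted as examinees' overall argumentation ability. Correlation analysis shows that all four scoring dimensions are relatively independent and that each contributes independently to inferring that ability. Analysis with the Rasch rating scale model shows that the grade levels of each scoring dimension are generally reasonable, but that some levels carry insufficient information. On this basis, several suggestions for improving the scoring criteria are offered.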

19.
Performance on items in intelligence and achievement tests can be represented in terms of child development and information processes. Research is reviewed on item performance that supports developmental and information processing effects, particularly in children. Some suggestions for item development in intelligence and achievement tests are presented.

20.
Applied Measurement in Education, 2013, 26(1): 31-57
This study examined the effects of test length and sample size on the alternate forms reliability and the equating of simulated mathematics tests composed of constructed-response items scaled using the 2-parameter partial credit model. Test length was defined in terms of the number of both items and score points per item. Tests with 2, 4, 8, 12, and 20 items were generated, and these items had 2, 4, and 6 score points. Sample sizes of 200, 500, and 1,000 were considered. Precise item parameter estimates were not found when 200 cases were used to scale the items. To obtain acceptable reliabilities and accurate equated scores, the findings suggested that tests should have at least eight 6-point items or at least 12 items with 4 or more score points per item.
