Similar literature
 20 similar documents found (search time: 15 ms)
1.
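Item response theory (IRT) is among the most influential measurement theories in modern educational and psychological measurement. One of its explicit goals is to extend the family of models so that any form of response data arising in real tests can be handled. Existing models for polytomously scored items consider only item discrimination and difficulty, yet in actual tests such items may also involve guessing. Building on Samejima's graded response model, this study incorporates an item guessing parameter into the polytomous scoring model and proposes the three-parameter graded response model (3PL-GRM). Because ignoring the guessing parameter of a polytomous item spuriously inflates that item's information, the study further applies the information function of the 3PL-GRM to the analysis of test paper quality.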
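As a rough illustration of the kind of model this abstract refers to, the sketch below computes category probabilities under the standard Samejima graded response model. The optional `guess` argument shows one hypothetical way a lower asymptote could enter the cumulative curves; the abstract does not give the 3PL-GRM formula, so that part is an assumption for illustration only and not the authors' specification. The function and parameter names are likewise made up here.

```python
import math


def grm_category_probs(theta, a, thresholds, guess=0.0):
    """Category probabilities for one polytomous item.

    theta      -- examinee ability
    a          -- item discrimination
    thresholds -- increasing category boundaries b_1 < b_2 < ... < b_m
    guess      -- hypothetical lower asymptote (assumption, not from the source);
                  guess=0.0 gives the standard Samejima graded response model
    """
    # Cumulative probability of responding in category k or higher.
    cumulative = [1.0]
    for b in thresholds:
        logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        cumulative.append(guess + (1.0 - guess) * logistic)
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    print(grm_category_probs(0.0, 1.0, [-1.0, 0.0, 1.0]))
    print(grm_category_probs(0.0, 1.0, [-1.0, 0.0, 1.0], guess=0.2))
```

With `guess=0.0` the probabilities are symmetric around the middle threshold for this example item; a positive `guess` moves probability mass away from the lowest category, which is the qualitative effect a guessing parameter would be expected to have.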

2.
The posterior predictive model checking method is a flexible Bayesian model‐checking tool and has recently been used to assess fit of dichotomous IRT models. This paper extended previous research to polytomous IRT models. A simulation study was conducted to explore the performance of posterior predictive model checking in evaluating different aspects of fit for unidimensional graded response models. A variety of discrepancy measures (test‐level, item‐level, and pair‐wise measures) that reflected different threats to applications of graded IRT models to performance assessments were considered. Results showed that posterior predictive model checking exhibited adequate power in detecting different aspects of misfit for graded IRT models when appropriate discrepancy measures were used. Pair‐wise measures were found more powerful in detecting violations of the unidimensionality and local independence assumptions.

3.
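Cui Weizhen. Examinations Research (《考试研究》), 2012, (6): 88-93, 50
Drawing on earlier research, this study applies the unidimensional graded response model (GRM) in an experimental analysis of the speaking test of the Advanced Chinese Proficiency Test (HSK [Advanced]). The hypothesis was that scoring under the graded response model distinguishes examinees' abilities more finely. The experimental results confirmed this hypothesis.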

4.
Person reliability parameters (PRPs) model temporary changes in individuals’ attribute level perceptions when responding to self‐report items (higher levels of PRPs represent less fluctuation). PRPs could be useful in measuring careless responding and traitedness. However, it is unclear how well current procedures for estimating PRPs can recover parameter estimates. This study assesses these procedures in terms of mean error (ME), average absolute difference (AAD), and reliability using simulated data with known values. Several prior distributions for PRPs were compared across a number of conditions. Overall, our results revealed little difference between using the χ or lognormal distributions as priors for estimated PRPs. Both distributions produced estimates with reasonable levels of ME; however, the AAD of the estimates was high. AAD did improve slightly as the number of items increased, suggesting that increasing the number of items would ameliorate this problem. Similarly, a larger number of items was necessary to produce reasonable levels of reliability. Based on our results, several conclusions are drawn and implications for future research are discussed.

5.
The high school grade point average (GPA) is often adjusted to account for nominal indicators of course rigor, such as “honors” or “advanced placement.” Adjusted GPAs, also known as weighted GPAs, are frequently used for computing students’ rank in class and in the college admission process. Despite the high stakes attached to GPA, weighting policies vary considerably across states and high schools. Previous methods of estimating weighting parameters have used regression models with college course performance as the dependent variable. We discuss and demonstrate the suitability of the graded response model for estimating GPA weighting parameters and evaluating traditional weighting schemes. In our sample, which was limited to self‐reported performance in high school mathematics courses, we found that commonly used policies award more than twice the bonus points necessary to create parity for standard and advanced courses.

6.
We examined summary indices of high school performance (coursework, grades, and test scores) based on the graded response model (GRM). The indices varied by inclusion of ACT test scores and whether high school courses were constrained to have the same difficulty and discrimination across groups of schools. The indices were examined with respect to skewness, incremental prediction of college degree attainment, and differences across racial/ethnic and socioeconomic subgroups. The most difficult high school courses to earn an “A” grade included calculus, chemistry, trigonometry, other advanced math, physics, algebra 2, and geometry. The GRM‐based indices were less skewed than simple high school grade point average (HSGPA) and had higher correlations with ACT Composite score. The index that included ACT test scores and allowed item parameters to vary by school group was most predictive of college degree attainment, but had larger subgroup differences. Implications for implementing multiple measure models for college readiness are discussed.

7.
Sχ² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of Sχ² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of Sχ² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using Sχ² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. Sχ² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of Sχ² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that performance of Sχ² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them.

8.
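To compare the roles of structural equation modeling (SEM) and the IRT graded response model (GRM) in item selection for personality scales, this study drew on 7,229 actual responses to the Chinese College Student Personality Scale (《中国大学生人格量表》). For Factor 2, "frankness" (爽直), parameters were estimated and fit was assessed for the structural equation model with Lisrel 8.70 and for the graded response model with Multilog 7.03, and the item selection results of the two methods were compared. Both sets of statistics indicated that Items 5, 6, 7, and 8 fit poorly. Under SEM this appeared as low factor loadings and unsatisfactory overall fit indices; under the GRM it appeared as unsatisfactory discrimination and location parameters, together with poorly shaped item characteristic curves and information curves. However, SEM suggested that Items 6 and 8 were the worse items, whereas the GRM suggested Items 5 and 6. Overall, the statistical inferences that SEM and the IRT graded response model draw about personality scale items are consistent, with slight differences on individual items. Each method has its own strengths, and the two can be used together.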

9.
As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations of the model’s assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided item parameter estimates that were robust to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than those of the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this biasing occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that the EM-IRT model provides superior item and equal mean ability parameter estimates in the presence of model violations under realistic conditions when compared with the 2PL model.

10.
Item response theory “dual” models (DMs) in which both items and individuals are viewed as sources of differential measurement error so far have been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, which is based on the concept of the multidimensional location index, is first proposed and discussed. Then, the models are described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. The simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposals are submitted to be of particular interest for the case of multidimensional questionnaires in which the number of items per scale would not be enough for arriving at stable estimates if the existing unidimensional DMs were fitted on a separate-scale basis.

11.
12.
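In implementing tiered teaching of college public English, a key step is the design of testing and assessment, including the placement test and the testing and assessment that follow tiered instruction. Starting from the theory and requirements of tiered teaching, this paper explores how to design assessment and testing methods for different levels during instruction, so that they reflect differences in learners' cognitive ability and learning needs and better serve the goals of tiered teaching.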

13.
In educational environments, monitoring persons' progress over time may help teachers to evaluate the effectiveness of their teaching procedures. Electronic learning environments are increasingly being used as part of formal education and resulting datasets can be used to understand and to improve the environment. This study presents longitudinal models based on the item response theory (IRT) for measuring persons' ability within and between study sessions in data from web-based learning environments. Two empirical examples are used to illustrate the presented models. Results show that by incorporating time spent within and between study sessions into an IRT model, one is able to track changes in ability of a population of persons or for groups of persons at any time of the learning process.

14.
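Using piecewise linear Lagrange interpolation, this paper gives a computational method for recovering the parameters of the Verhulst equation from experimental observations; numerical examples confirm the effectiveness of the algorithm.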
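The interpolation step named in this abstract can be sketched as follows. This is only a minimal illustration of piecewise linear Lagrange interpolation over observed data points; it does not reproduce the paper's parameter-recovery procedure, which the abstract does not describe, and the function name and sample data are assumptions.

```python
from bisect import bisect_right


def piecewise_linear(xs, ys, x):
    """Evaluate the piecewise linear Lagrange interpolant at x.

    xs must be strictly increasing; ys holds the observed values at xs.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the range of the data")
    # Index of the subinterval [xs[i], xs[i + 1]] that contains x.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    x0, x1 = xs[i], xs[i + 1]
    # First-degree Lagrange basis functions on that subinterval.
    l0 = (x - x1) / (x0 - x1)
    l1 = (x - x0) / (x1 - x0)
    return ys[i] * l0 + ys[i + 1] * l1


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0]        # hypothetical observation times
    values = [0.0, 10.0, 40.0]     # hypothetical observed values
    print(piecewise_linear(times, values, 1.5))
```

On each subinterval the interpolant is the straight line through the two neighbouring observations, so it reproduces the data exactly at the nodes.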

15.
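In underground operations, roadway ventilation is an important safety problem that must be solved, and the key to solving it is determining the ventilation parameters. When the accuracy of a GM(1,1) model does not meet requirements, a GM(1,1) model built on the residual series can be used to correct the original model and improve its accuracy. This paper applies a new method of building a GM(1,1) model on the residual series to correct the original model, further reducing both the mean simulated relative error (Δ̄) and the simulated relative error at point k (Δ(k)); in other words, accuracy is improved more effectively.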
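For readers unfamiliar with GM(1,1), the sketch below fits the basic grey model by least squares and computes a mean simulated relative error. It covers only the standard base model; the residual-series correction that the abstract proposes is not described in enough detail to reproduce and is not implemented here. The function names and the sample series are assumptions.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) development coefficient a and grey input b."""
    # Accumulated generating series x1(k) = x0(0) + ... + x0(k).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated terms.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = list(x0[1:])
    # Ordinary least squares for x0(k) = -a * z(k) + b.
    mz, my = sum(z) / len(z), sum(y) / len(y)
    slope = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / sum(
        (zi - mz) ** 2 for zi in z
    )
    return -slope, my - slope * mz


def gm11_simulate(x0, a, b):
    """Simulated values of the original series (assumes a != 0)."""
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(len(x0))]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, len(x0))]


def mean_relative_error(x0, x0_hat):
    """Mean of |observed - simulated| / |observed| over points k >= 1."""
    errors = [abs(o - s) / abs(o) for o, s in zip(x0[1:], x0_hat[1:])]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    series = [2.0 * 1.1 ** k for k in range(6)]  # hypothetical data
    a, b = gm11_fit(series)
    print(a, b, mean_relative_error(series, gm11_simulate(series, a, b)))
```

A residual correction of the kind the abstract mentions would start from the differences between the observed and simulated series produced here.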

16.
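China's higher education has made the historic transition into the internationally recognized stage of mass education. In this broad educational environment, vigorous development of stratified education by Chinese universities would be a useful exploration that can promote positive interaction between talent cultivation and the needs of society.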

17.
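College English teaching should ensure that students at different levels receive ample training and improvement in their ability to use English. Tiered College English teaching embodies the principle of teaching students according to their aptitude. For students in Level A classes, whose English proficiency is relatively high, teachers adopt a textbook-based model in which the four semesters are effectively linked, progress step by step, and each have their own emphasis, ultimately promoting the development of students' overall ability to use English.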

18.
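College English teaching should ensure that students at different levels receive ample training and improvement in their ability to use English. Tiered College English teaching embodies the principle of teaching students according to their aptitude. For students in Level A classes, whose English proficiency is relatively high, teachers adopt a textbook-based model in which the four semesters are effectively linked, progress step by step, and each have their own emphasis, ultimately promoting the development of students' overall ability to use English.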

19.
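Using a combination of quantitative and qualitative methods, and taking the College English teaching reform at a local university as the object of study, this paper explores how a corpus-based tiered College English teaching model and the traditional teaching model affect students' autonomous learning and overall ability to use English. Students in classes taught by members of the research team were divided into an experimental group and a control group, and a two-semester teaching experiment was conducted using comparative teaching under the two models. After the experiment, the experimental data and questionnaire responses from both groups were analyzed with the statistical package SPSS 17.0. The results show that the experimental group clearly outperformed the control group in English proficiency, English grades, and autonomous learning ability. The corpus-based tiered model not only motivates teachers effectively but also gives fuller play to students' agency, stimulates their interest in learning, and fosters their autonomous learning ability, thereby achieving the goal of improving overall ability to use English.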

20.
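Liang Hongying, Wang Ying, Hu Ye. Overseas English (《海外英语》), 2011, (11): 10-11, 15
This paper discusses the theoretical basis for putting the tiered teaching model into practice, summarizes and analyzes the problems found during implementation, and gives an interim summary of the remedial measures taken, with the aim of offering some reference for deepening College English teaching reform and cultivating high-quality English talent.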
