1.

刘志明《考试研究》2005,(2)

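Equating and vertical scaling serve to establish relationships among scores from different tests. Equating applies to test forms for the same grade level and of the same nature, whereas vertical scaling applies to test forms that are similar in nature but target different grade levels. Vertical scaling places results from different grades on a single growth score scale. A vertical scale is an extended score scale that spans and links grades and is used to assess students' continuous growth in achievement (Nitko, 2004). In teaching, a vertical scale can be used to monitor and evaluate student progress; in educational research, it can be a powerful tool for longitudinal studies. This paper discusses the methodology of vertical scaling, including the definition of growth, data collection methods, test design, and methods based on Item Response Theory, and offers some practical advice on constructing vertical scales.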

2.

Stefanie A. Wind 《Educational Measurement》2017,36(2):50-66

Mokken scale analysis (MSA) is a probabilistic‐nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic‐nonparametric framework in which to explore measurement quality, with an emphasis on its application in the context of educational assessment. The module describes both dichotomous and polytomous formulations of the MSA model. Examples of the application of MSA to educational assessment are provided using data from a multiple‐choice physical science assessment and a rater‐mediated writing assessment.

3.

Gregory Camilli Sunhee Kim 《Educational Measurement》2018,37(3):4-10

The trend in mathematics achievement from preschool to kindergarten is studied with a longitudinal growth item response theory model. The three measurement occasions included the spring of preschool and the spring and fall of kindergarten. The growth trend was nonlinear, with a steep drop between spring of preschool and fall of kindergarten. The modeling results provide validation for the argument that a classroom assessment in mathematics can be used to assess developmental skill levels that are consistent with a theory of early mathematics acquisition. The statistical model employed enables an effective illustration of overall gains and individual variability. Implications of the summer loss are discussed as well as model limitations.

4.

Sarah Lindstrom Johnson Ray E. Reichenberg Kathan Shukla Tracy E. Waasdorp Catherine P. Bradshaw 《Educational Measurement》2019,38(4):99-107

The U.S. government has become increasingly focused on school climate, as recently evidenced by its inclusion as an accountability indicator in the Every Student Succeeds Act. Yet, there remains considerable variability in both conceptualizing and measuring school climate. To better inform the research and practice related to school climate and its measurement, we leveraged item response theory (IRT), a commonly used psychometric approach for the design of achievement assessments, to create a parsimonious measure of school climate that operates across varying individual characteristics. Students (n = 69,513) in 111 secondary schools completed a school climate assessment focused on three domains of climate (i.e., safety, engagement, and environment), as defined by the U.S. Department of Education. Item and test characteristics were estimated using the mirt package in R using unidimensional IRT. Analyses revealed measurement difficulties that resulted in a greater ability to assess less favorable perspectives on school climate. Differential item functioning analyses indicated measurement differences based on student academic success. These findings support the development of a broad measure of school climate but also highlight the importance of work to ensure precision in measuring school climate, particularly when considering use as an accountability measure.

5.

A Comparison of the Measurement Principles of CTT and IRT

沐守宽《上海师范大学学报(哲学社会科学版)》2006,35(4):6-9

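By comparing classical test theory and item response theory in four main areas (basic assumptions, the quantification of test precision, the standard error of the test, and the selection of test items), the paper finds that item response theory has several advantages: ability estimates that are independent of the items selected, item difficulty and ability parameters expressed on a common scale, item parameter estimates that are independent of the sample, and more precise estimation of measurement error. However, some models still face unresolved problems: the unidimensionality assumption is hard to satisfy, the required testing conditions are strict, and the mathematical models lack parsimony.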

6.

General Growth Mixture Analysis of Adolescents' Developmental Trajectories of Anxiety: The Impact of Untested Invariance Assumptions on Substantive Interpretations

Alexandre J. S. Morin Christophe Maïano Benjamin Nagengast Herbert W. Marsh Julien Morizot Michel Janosz 《Structural equation modeling》2013,20(4):613-648

Substantively, this study investigates potential heterogeneity in the developmental trajectories of anxiety in adolescence. Methodologically, this study demonstrates the usefulness of general growth mixture analysis (GGMA) in addressing these issues and illustrates the impact of untested invariance assumptions on substantive interpretations. This study relied on data from the Montreal Adolescent Depression Development Project (MADDP), a 4-year follow-up of more than 1,000 adolescents who completed the Beck Anxiety Inventory each year. GGMA models relying on different invariance assumptions were empirically compared. Each of these models converged on a 5-class solution, but yielded different substantive results. The model with class-varying variance–covariance matrices was retained as providing a better fit to the data. These results showed that although elevated levels of anxiety might fluctuate over time, they clearly do not represent a transient phenomenon. This model was then validated in relation to multiple predictors (mostly related to school violence) and outcomes (grade-point average, school dropout, depression, loneliness, and drug-related problems).

7.

Damazo T. Kadengye Eva Ceulemans Wim Van Den Noortgate 《Journal of Experimental Education》2015,83(2):175-202

In educational environments, monitoring persons' progress over time may help teachers to evaluate the effectiveness of their teaching procedures. Electronic learning environments are increasingly being used as part of formal education and resulting datasets can be used to understand and to improve the environment. This study presents longitudinal models based on the item response theory (IRT) for measuring persons' ability within and between study sessions in data from web-based learning environments. Two empirical examples are used to illustrate the presented models. Results show that by incorporating time spent within and between study sessions into an IRT model, one is able to track changes in ability of a population of persons or for groups of persons at any time of the learning process.

8.

A Study of Test Construction Methods Based on Item Response Theory  Total citations: 3, self-citations: 0, citations by others: 3

戴海琦《考试研究》2006,(4)

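After briefly introducing item response theory, this paper examines in depth, from a psychometric analysis perspective, the general steps for constructing various kinds of tests with item response theory. It also discusses methods for building IRT-based item banks and for constructing tests from an item bank, as well as methods for setting passing scores for criterion-referenced tests.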

9.

The Examination of the Classification of Students into Performance Categories by Two Different Equating Methods

Lisa A. Keller Robert R. Keller Pauline A. Parker 《Journal of Experimental Education》2013,81(1):30-52

This study investigates the comparability of two item response theory based equating methods: true score equating (TSE), and estimated true equating (ETE). Additionally, six scaling methods were implemented within each equating method: mean-sigma, mean-mean, two versions of fixed common item parameter, Stocking and Lord, and Haebara. Empirical test data were examined to investigate the consistency of scores resulting from the two equating methods, as well as the consistency of the scaling methods both within equating methods and across equating methods. Results indicate that although the degree of correlation among the equated scores was quite high, regardless of equating method/scaling method combination, non-trivial differences in equated scores existed in several cases. These differences would likely accumulate across examinees making group-level differences greater. Systematic differences in the classification of examinees into performance categories were observed across the various conditions: ETE tended to place lower ability examinees into higher performance categories than TSE, while the opposite was observed for high ability examinees. Because the study was based on one set of operational data, the generalizability of the findings is limited and further study is warranted.
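As a rough illustration of two of the scaling methods named in this abstract, the sketch below computes mean-sigma and mean-mean linking constants from common-item parameter estimates, using the standard textbook definitions. It is not the study's own code; the function names and the source/target naming are assumptions made for this example.

```python
from statistics import mean, pstdev


def mean_sigma(b_source, b_target):
    """Mean-sigma constants (A, B) from common-item difficulty estimates."""
    A = pstdev(b_target) / pstdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B


def mean_mean(a_source, a_target, b_source, b_target):
    """Mean-mean constants (A, B): slope from mean discriminations."""
    A = mean(a_source) / mean(a_target)
    B = mean(b_target) - A * mean(b_source)
    return A, B


def rescale_item(a, b, A, B):
    """Place a source-scale item (a, b) on the target scale."""
    return a / A, A * b + B
```

Both functions return constants for the linear transformation theta_target = A * theta_source + B; the two methods differ only in how the slope A is estimated.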

10.

Understanding the Impact of Accountability on Preservice Teachers’ Decisions About Where to Teach

Jennifer C. Ng 《The Urban Review》2006,38(5):353-372

Research has shown that individuals who become teachers are uniquely oriented to the psychic rewards of teaching such as connecting with students and making a difference. Yet, in the era of “No Child Left Behind”, emphasis upon test scores as indicators of student learning, competition within and between school districts, and threats of external sanctions seem to promote a different orientation to teachers’ work. This is especially the case in schools with limited human, social, physical, and cultural capital serving disproportionate numbers of low-income, racial/ethnic and linguistic minority students typically located in urban areas. Given the existing problem of teacher shortages in urban schools and the current impact of accountability, this study seeks to explore two questions: How do preservice teachers believe their aspirations to teach will be affected by the accountability movement? And how do these views affect their considerations about where to teach?

11.

Application of Multidimensional Item Response Theory to Mathematical Literacy Testing

林子植 胡典顺 《中国考试》2021,(5):72-80

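Students' mathematical literacy has a multidimensional structure, so literacy-oriented assessment of mathematics achievement needs to report examinees' performance on each dimension rather than only a single total score. Taking the PISA mathematical literacy framework as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, the study uses the MIRT package in R to process and analyze item data from a grade 8 mathematical literacy assessment in one region, in order to study multidimensional measurement of mathematical literacy. The results show that MIRT combines the strengths of unidimensional item response theory and factor analysis: it can be used to analyze a test's construct validity and item quality, and to provide a multidimensional cognitive diagnosis of examinees' abilities.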

12.

Minimizing the Influence of Item Parameter Estimation Errors in Test Development: A Comparison of Three Selection Procedures

Mark J. Gierl Dianne Henderson Michael Jodoin Don Klinger 《Journal of Experimental Education》2013,81(3):261-279

In test development, item response theory (IRT) is a method to determine the amount of information that each item (i.e., item information function) and combination of items (i.e., test information function) provide in the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods (maximum no target, maximum target, and theta maximum) using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated the test information function. Implications for test development are discussed.

13.

Scaling Laws in Turbulence (in English)

闵琦《蒙自师范高等专科学校学报》2003,1(6):63-65

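This paper discusses the existence of scaling laws in turbulence and, drawing on an introduction to K41, the β model, and fractal dimension, gives a comprehensive account of the scaling laws present in fully developed turbulence and their significance for turbulence research.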

14.

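A Study of Paperless Examinations for Computer Information Technology Courses  Total citations: 1, self-citations: 0, citations by others: 1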

朱小明 李向荣 林捷 赵锦红 《中国教育技术装备》2007,(1):11-14

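The paper describes the development of testing theory from classical test theory to item response theory and points out the inevitability and advantages of computerized testing. It explains in some detail how computer-based examinations can be implemented in a multimedia network laboratory.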

15.

Rethinking Computerized Adaptive Testing Methods Based on Item Response Theory

刘培艳 王淑琴 《唐山师范学院学报》2013,(2):44-46

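Building on item response theory (IRT), the paper introduces the characteristics of IRT and the working principle of IRT-based computerized adaptive testing. On this basis it summarizes methods for choosing the starting point and proposes an improved two-step testing procedure. The improved procedure greatly reduces the number of administered items that lie far from the examinee's ability and cuts testing time and computation, while still estimating the examinee's ability accurately.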

16.

Reliably Assessing Growth with Longitudinal Diagnostic Classification Models

Matthew J. Madison 《Educational Measurement》2019,38(2):68-78

Recent advances have enabled diagnostic classification models (DCMs) to accommodate longitudinal data. These longitudinal DCMs were developed to study how examinees change, or transition, between different attribute mastery statuses over time. This study examines using longitudinal DCMs as an approach to assessing growth and serves three purposes: (1) to define and evaluate two reliability measures to be used in the application of longitudinal DCMs; (2) through simulation, demonstrate that longitudinal DCM growth estimates have increased reliability compared to longitudinal item response theory models; and (3) through an empirical analysis, illustrate the practical and interpretive benefits of longitudinal DCMs. A discussion describes how longitudinal DCMs can be used as practical and reliable psychometric models when categorical and criterion‐referenced interpretations of growth are desired.

17.

Web-Based Adaptive Testing Based on Item Response Theory

苏婕《天津职业院校联合学报》2007,9(5):106-109

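With the spread of computers, the growth of networks, and advances in teaching and assessment theory, computerized adaptive testing based on item response theory has become increasingly widespread. Because its items adjust automatically to students of different ability levels, it has been adopted by more and more examinations. With reference to item response theory, the paper discusses the implementation of adaptive testing and related issues.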

18.

A Comparative Study of Structural Equation Modeling and the IRT Graded Response Model in Item Selection for a Personality Scale

邹丹杰 伍霞 《内江师范学院学报》2014,(12)

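To compare the roles of structural equation modeling and the IRT graded response model in item selection for personality scales, the study used 7,229 actual responses to the 《中国大学生人格量表》 (Chinese College Student Personality Scale). For Factor 2, "Frankness" (爽直), the parameters of the structural equation model and the graded response model were estimated and fitted with Lisrel8.70 and Multilog7.03, respectively, and the item selection results of the two methods were compared. Both sets of results indicated that items 5, 6, 7, and 8 fit poorly. Under structural equation modeling, these items showed low factor loadings and unsatisfactory overall fit indices; under the graded response model, they showed unsatisfactory discrimination and location parameters, with poorly shaped item characteristic curves and information curves. However, structural equation modeling identified items 6 and 8 as the worst, whereas the graded response model identified items 5 and 6 as the worst. Overall, the two approaches led to consistent statistical inferences about the items of the personality scale, with slight differences on individual items. Each has its own strengths, and the two can be used in combination.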

19.

The Effects of State Decisions About NCLB Adequate Yearly Progress Targets  Total citations: 2, self-citations: 0, citations by others: 2

Andrew C. Porter Robert L. Linn C. Scott Trimble 《Educational Measurement》2005,24(4):32-39

The No Child Left Behind Act allows states to vary (a) the trajectories they select to move from the baseline percent proficient or above in 2002 to the 100% proficient goal in 2014, (b) the minimum number of students required for reporting of disaggregated subgroup results, and (c) whether or not they will use confidence intervals when determining whether or not an annual measurable objective has been met. We use data from Kentucky for the years 2003 and 2004 to explore the consequences of different design decisions. The effect of design decisions on number and percentage of schools meeting adequate yearly progress (AYP) is large, with important implications for education practice.

20.

In-Hee Choi Insu Paek Sun-Joo Cho 《Journal of Experimental Education》2017,85(3):411-424

The purpose of the current study is to examine the performance of four information criteria (Akaike's information criterion [AIC], corrected AIC [AICC], Bayesian information criterion [BIC], sample-size adjusted BIC [SABIC]) for detecting the correct number of latent classes in the mixture Rasch model through simulations. The simulation study manipulated various class-distinction features (percentages of class-variant items, magnitudes, and patterns of item difficulty differences) and mixing proportions, assuming that a mixture Rasch model with two latent classes was the true model. Unlike previous studies that showed BIC's superiority to other indices, our findings from this study suggested that the four information criteria had differential performance depending on the percentage of class-variant items and the magnitude and pattern of item difficulty differences under a two-class structure. Furthermore, the present study revealed that AICC and SABIC generally performed as well as or better than their counterparts, AIC and BIC, respectively, for the class-class structure with a sample of 3,000.
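For reference, the four criteria compared in this abstract can be computed from a fitted model's log-likelihood as in the sketch below, which uses their commonly cited definitions. The function and argument names are assumptions for illustration, and the study's exact implementation may differ.

```python
import math


def information_criteria(log_lik, k, n):
    """Return AIC, AICC, BIC and SABIC for one fitted model.

    log_lik: maximized log-likelihood; k: number of free parameters;
    n: sample size (must exceed k + 1 for AICC to be defined).
    """
    deviance = -2.0 * log_lik
    aic = deviance + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = deviance + k * math.log(n)
    sabic = deviance + k * math.log((n + 2) / 24.0)
    return {"AIC": aic, "AICC": aicc, "BIC": bic, "SABIC": sabic}
```

In a class-enumeration setting, one would typically fit models with 1, 2, 3, ... latent classes and keep the solution with the smallest value of the chosen criterion.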