期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Using Rasch Analysis to Inform Rating Scale Development

Carol Van Zile-Tamsen 《Research in higher education》2017,58(8):922-933

The use of surveys, questionnaires, and rating scales to measure important outcomes in higher education is pervasive, but reliability and validity information is often based on problematic Classical Test Theory approaches. Rasch Analysis, based on Item Response Theory, provides a better alternative for examining the psychometric quality of rating scales and informing scale improvements. This paper outlines a six-step process for using Rasch Analysis to review the psychometric properties of a rating scale. The Partial Credit Model and Andrich Rating Scale Model will be described in terms of the pyschometric information (i.e., reliability, validity, and item difficulty) and diagnostic indices generated. Further, this approach will be illustrated through the example of authentic data from a university-wide student evaluation of teaching. 相似文献

2.

Rating scale instruments and measurement

Robert F. Cavanagh Joseph T. Romanoski 《Learning Environments Research》2006,9(3):273-289

The article examines theoretical issues associated with measurement in the human sciences and ensuring data from rating scale instruments are measures. An argument is made that using raw scores from rating scale instruments for subsequent arithmetic operations and applying linear statistics is less preferable than using measures. These theoretical matters are then illustrated by a report on the application of the Rasch Rating Scale Model in an investigation into elementary school classroom learning culture. 相似文献

3.

Applying the many‐facet Rasch model to evaluate PowerPoint presentation performance in higher education

Ramazan Basturk 《Assessment & Evaluation in Higher Education》2008,33(4):431-444

This study investigated the usefulness of the many‐facet Rasch model (MFRM) in evaluating the quality of performance related to PowerPoint presentations in higher education. The Rasch Model utilizes item response theory stating that the probability of a correct response to a test item/task depends largely on a single parameter, the ability of the person. MFRM extends this one‐parameter model to other facets of task difficulty, for example, rater severity, rating scale format, task difficulty levels. This paper specifically investigated presentation ability in terms of items/task difficulty and rater severity/leniency. First‐year science education students prepared and used the PowerPoint presentation software program during the autumn semester of the 2005–2006 school year in the ‘Introduction to the Teaching Profession’ course. The students were divided into six sub‐groups and each sub‐group was given an instructional topic, based on the content and objectives of the course, to prepare a PowerPoint presentation. Seven judges, including the course instructor, evaluated each group’s PowerPoint presentation performance using ‘A+ PowerPoint Rubric’. The results of this study show that the MFRM technique is a powerful tool for handling polytomous data in performance and peer assessment in higher education. 相似文献

4.

Use of Knowledge, Skill, and Ability Statements in Developing Licensure and Certification Examinations

Ning Wang Deborah Schnipke Elizabeth A. Witt 《Educational Measurement》2005,24(1):15-22

The task inventory approach is commonly used in job analysis for establishing content validity evidence supporting the use and interpretation of licensure and certification examinations. Although the results of a task inventory survey provide job task-related information that can be used as a reliable and valid source for test development, it is often the knowledge, skills, and abilities (KSAs) required for performing the tasks, rather than the job tasks themselves, which are tested by licensure and certification exams. This article presents a framework that addresses the important role of KSAs in developing and validating licensure and certification examinations. This includes the use of KSAs in linking job task survey results to the test content outline, transferring job task weights to test specifications, and eventually applying the results to the development of the test items. The impact of using KSAs in the development of test specifications is illustrated from job analyses for two diverse professions. One method for transferring job task weights from the job analysis to test specifications through KSAs is also presented, along with examples. The two examples demonstrated in this article are taken from nursing certification and real estate licensure programs. However, the methodology for using KSAs to link job tasks and test content is also applicable in the development of teacher credentialing examinations. 相似文献

5.

Evaluating construct validity and internal consistency of early childhood individualized family service plans

《Studies in Educational Evaluation》2015

This study presents evidence regarding the construct validity and internal consistency of the IFSP Rating Scale (McWilliam & Jung, 2001), which was designed to rate individualized family service plans (IFSPs) on 12 indicators of family centered practice. Here, the Rasch measurement model is employed to investigate the scale's functioning and fit for both person and item diagnostics of 120 IFSPs that were previously analyzed with a classical test theory approach. Analyses demonstrated scores on the IFSP Rating Scale fit the model well, though additional items could improve the scale's reliability. Implications for applying the Rasch model to improve special education research and practice are discussed. 相似文献

6.

A Multilevel Testlet Model for Dual Local Dependence

Hong Jiao Akihito Kamata Shudong Wang Ying Jin 《Journal of Educational Measurement》2012,49(1):82-100

The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet‐based assessment, both local item dependence and local person dependence are likely to be induced. This study proposed a four‐level IRT model to simultaneously account for dual local dependence due to item clustering and person clustering. Model parameter estimation was explored using the Markov Chain Monte Carlo method. Model parameter recovery was evaluated in a simulation study in comparison with three other related models: the Rasch model, the Rasch testlet model, and the three‐level Rasch model for person clustering. In general, the proposed model recovered the item difficulty and person ability parameters with the least total error. The bias in both item and person parameter estimation was not affected but the standard error (SE) was affected. In some simulation conditions, the difference in classification accuracy between models could go up to 11%. The illustration using the real data generally supported model performance observed in the simulation study. 相似文献

7.

Finding two dimensions in MMPI‐2 depression

Chih‐Hung Chang 《Structural equation modeling》2013,20(1):41-49

This study examined the underlying structure of the Depression scale of the revised Minnesota Multiphasic Personality Inventory using dichotomous Rasch model and factor analysis. Rasch methodology was used to identify and restructure the Depression scale, and factor analysis was used to confirm the structure established by the Rasch model. The item calibration and factor analysis were carried out on the full sample of 2,600 normative subjects. The results revealed that the Depression scale did not consist of one homogeneous set of items, even though the scale was developed to measure one dimension of depression. Rasch analysis, as well as factor analysis, recognized two distinct content‐homogeneous subscales, here labeled mental depression and physical depression. The Rasch methodology provided a basis for a better understanding of the underlying structure and furnished a useful solution to the scale refinement. 相似文献

8.

在EXCEL中应用Rasch模型计算题目难度

王生军《安徽广播电视大学学报》2004,(3):120-123

应用Rasch模型计算出来的题目难度值与被试样本无关,是题目的一项最重要的量化指标.Rasch模型的题目难度的计算在EXCEL程序中能很方便地完成,本文介绍了详细的计算步骤,并讨论了怎样用题目难度值来估算考生的能力水平. 相似文献

9.

Rasch analysis for psychometric improvement of science attitude rating scales

Pey-Tee Oon Xitao Fan 《International Journal of Science Education》2013,35(6):683-700

ABSTRACT

Students’ attitude towards science (SAS) is often a subject of investigation in science education research. Survey of rating scale is commonly used in the study of SAS. The present study illustrates how Rasch analysis can be used to provide psychometric information of SAS rating scales. The analyses were conducted on a 20-item SAS scale used in an existing dataset of The Trends in International Mathematics and Science Study (TIMSS) (2011). Data of all the eight-grade participants from Hong Kong and Singapore (N?=?9942) were retrieved for analyses. Additional insights from Rasch analysis that are not commonly available from conventional test and item analyses were discussed, such as invariance measurement of SAS, unidimensionality of SAS construct, optimum utilization of SAS rating categories, and item difficulty hierarchy in the SAS scale. Recommendations on how TIMSS items on the measurement of SAS can be better designed were discussed. The study also highlights the importance of using Rasch estimates for statistical parametric tests (e.g. ANOVA, t-test) that are common in science education research for group comparisons. 相似文献

10.

Development and application of the Elementary School Science Classroom Environment Scale (ESSCES): measuring student perceptions of constructivism within the science classroom

Shelagh M. Peoples Laura M. O’Dwyer Yang Wang Jessica J. Brown Camelia V. Rosca 《Learning Environments Research》2014,17(1):49-73

This article describes the development, validation and application of a Rasch-based instrument, the Elementary School Science Classroom Environment Scale (ESSCES), for measuring students’ perceptions of constructivist practices within the elementary science classroom. The instrument, designed to complement the Reformed Teaching Observation Protocol (RTOP), is conceptualised using the RTOP’s three construct domains: Lesson Design and Implementation; Content; and Classroom Culture. Data from 895 elementary students was used to develop the Rasch scale, which was assessed for item fit, invariance and dimensionality. Overall, the data conformed to the assumptions of the Rasch model. In addition, the structural relationships among the retained items of the Rasch model supported and validated the instrument for measuring the reformed science classroom environment theoretical construct. The application of the ESSCES in a research study involving fourth grade students provides evidence that educators and researchers have a reliable instrument for understanding the elementary science classroom environment through the lens of the students. 相似文献

11.

Mokken Scale Analysis: Theoretical Considerations and an Application to Transitivity Tasks

《教育实用测度》2013,26(4):355-373

This study provides a discussion and an application of Mokken scale analysis. Mokken scale analysis can be characterized as a nonparametric item response theory approach. The Mokken approach to scaling consists of two different item response models, the model of monotone homogeneity and the more restrictive model of double monotonicity. Methods for empirical data analysis using the two Mokken model versions are discussed. Both dichotomous and polytomous item scores can be analyzed by means of Mokken scale analysis. Three empirical data sets pertaining to transitive inference items were analyzed using the Mokken approach. The results are compared with the results obtained from a Rasch analysis. 相似文献

12.

Gender Fairness in Self-Efficacy? A Rasch-Based Validity Study of the General Academic Self-Efficacy Scale (GASE)

T. Nielsen J. Dammeyer M. L. Vang G. Makransky 《Scandinavian Journal of Educational Research》2018,62(5):664-681

Studies have reported gender differences in academic self-efficacy. However, how and if academic self-efficacy questionnaires are gender-biased has not been psychometrically investigated. The psychometric properties of a general version of The Physics Self-Efficacy Questionnaire – the General Academic Self-Efficacy Scale (GASE) – were analyzed using Rasch measurement models, with data from 1018 Danish university students (psychology and technical), focusing on gender invariance and the sufficiency of the score. The short 4-item GASE scale was found to be essentially objective and construct valid and satisfactorily reliable, though differential item functioning was found relative to gender and academic discipline, and can be used to assess students’ general academic self-efficacy. Research on gender and self-efficacy needs to take gender into account and equate scores appropriately for unbiased analysis within academic disciplines. 相似文献

13.

Automatic item generation of probability word problems

Heinz Holling Jonas P. Bertling Nina Zeuch 《Studies in Educational Evaluation》2009,35(2-3):71-76

Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems based on templates that allow for the generation of word problems involving different topics from probability theory. It was tested in a pilot study with N = 146 German university students. The items show a good fit to the Rasch model. Item difficulties can be explained by the Linear Logistic Test Model (LLTM) and by the random-effects LLTM. The practical implications of these findings for future test development in the assessment of probability competencies are also discussed. 相似文献

14.

Stability of Rasch Scales Over Time

Catherine S. Taylor Yoonsun Lee 《教育实用测度》2013,26(1):87-113

Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items. When tests are equated across forms, researchers check for the stability of common items before including them in equating procedures. Stability is usually examined in relation to polytomous items' central “location” on the scale without taking into account the stability of the different item scores (step difficulties). We examined the stability of score scales over a 3–5-year period, considering both stability of location values and stability of step difficulties for common item equating. We also investigated possible changes in the scale measured by the tests and systematic scale drift that might not be evident in year-to-year equating. Results across grades and content areas suggest that equating results are comparable whether or not the stability of step difficulties is taken into account. Results also suggest that there may be systematic scale drift that is not visible using year-to-year common item equating. 相似文献

15.

Item Position and Item Difficulty Change in an IRT-Based Common Item Equating Design

Jason L. Meyers G. Edward Miller Walter D. Way 《教育实用测度》2013,26(1):38-60

In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change, test level, test content, and item format. As a follow-up to the real data analysis, a simulation study was performed to assess the effect of item position change on equating. Results from this study indicate that item position change significantly affects change in RID. In addition, although the test construction procedures used in the investigated state seem to somewhat mitigate the impact of item position change, equating results might be impacted in testing programs where other test construction practices or equating methods are utilized. 相似文献

16.

A Framework for Measuring the Amount of Adaptation of Rasch‐based Computerized Adaptive Tests

Adam E. Wyse James R. McBride 《Journal of Educational Measurement》2021,58(1):83-103

A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch‐based CATs based on looking at the differences between the selected item locations (Rasch item difficulty parameters) of the administered items and target item locations determined from provisional ability estimates at the start of each item. Several new indices based on this framework are introduced and compared to previously suggested measures of adaptation using simulated and real test data. Results from the simulation indicate that some previously suggested indices are not as sensitive to changes in item pool size and the use of constraints as the new indices and may not work as well under different item selection rules. The simulation study and real data example also illustrate the utility of using the new indices to measure adaptation at both a group and individual level. Discussion is provided on how one may use several of the indices to measure adaptation of Rasch‐based CATs in practice. 相似文献

17.

中国英语能力等级量表中句法量表的构建——基于句法区别性特征的方法

胥云武尊民柳丽萍《中国考试》2021,(4)

中国英语句法能力等级量表研制组在建立描述语库的过程中发现,已采集到的句法描述语存在数量不足、可解释性差及对特定水平学习者应掌握的句法特征界定不清等问题。研制组从以往二语习得研究中提取句法区别性特征,基于我国英语学习者语料库观察这些区别性特征在真实语言使用样本中的分布情况,并对比分析不同水平群体的句法错误变化规律,在撰写句法描述语时注重突出不同水平学习者应掌握的最典型、最重要的句法特征,反映不同阶段学习者的句法知识掌握及运用情况。基于大规模问卷调查对描述语进行分级验证后发现,使用这种方法构建的句法量表数据分析结果良好,量表建设时专家基于区别性特征对描述语的定级与问卷调查定级结果高度吻合。相似文献

18.

计算机自适应测验中Rasch模型稳健性的模拟研究 总被引：1，自引：0，他引：1

邓远平蔡艳罗照盛《考试研究》2006,(3)

本研究采用模拟数据的方法,在计算机自适应测验(Computer Adaptive Test,简称CAT)中分别采用Rasch及Birnbaum两种模型估计能力,通过比较两者的误差均方根(Root Mean Square Error,简称RMSE)、平均差异(Average Deviation,简称AD)及能力相关,对Rasch模型在CAT中的稳健性进行了研究。结果发现Rasch模型在区分度不等的条件下仍然能较准确地估计被试的能力水平,具有很强的稳健性。相似文献

19.

Instructional Topics in Educational Measurement (ITEMS) Module: Using Automated Processes to Generate Test Items

Mark J. Gierl Hollis Lai 《Educational Measurement》2013,32(3):36-50

Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content‐specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. The purpose of this module is to describe and illustrate a template‐based method for generating test items. We outline a three‐step approach where test development specialists first create an item model. An item model is like a mould or rendering that highlights the features in an assessment task that must be manipulated to produce new items. Next, the content used for item generation is identified and structured. Finally, features in the item model are systematically manipulated with computer‐based algorithms to generate new items. Using this template‐based approach, hundreds or even thousands of new items can be generated with a single item model. 相似文献

20.

The Stability of IRT b Values

Robert C. Sykes Anne R. Fitzpatrick 《Journal of Educational Measurement》1992,29(3):201-211

This study investigated possible explanations for an observed change in Rasch item parameters (b values) obtained from consecutive administrations of a professional licensure examination. Considered in this investigation were variables related to item position, item type, item content, and elapsed time between administrations of the item. An analysis of covariance methodology was used to assess the relations between these variables and change in item b values, with the elapsed time index serving to control for differences that could be attributed to average or pool changes in b values over time. A series of analysis of covariance models were fitted to the data in an attempt to identify item characteristics that were significantly related to the change in b values after the time elapsed between item administrations had been controlled. The findings indicated that the change in item b values was not related either to item position or to item type. A small, positive relationship between this change and elapsed time indicated that the pool b values were increasing over time. A test of simple effects suggested the presence of greater change for one of the content categories analyzed. These findings are interpreted, and suggestions for future research are provided. 相似文献