期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

The Measurement of Writing Ability With a Many-Faceted Rasch Model

《教育实用测度》2013,26(3):171-191

The purpose of this study is to describe a Many-Faceted Rasch (FACETS) model for the measurement of writing ability. The FACETS model is a multivariate extension of Rasch measurement models that can be used to provide a framework for calibrating both raters and writing tasks within the context of writing assessment. The use of the FACETS model for solving measurement problems encountered in the large-scale assessment of writing ability is presented here. A random sample of 1,000 students from a statewide assessment of writing ability is used to illustrate the FACETS model. The data suggest that there are significant differences in rater severity, even after extensive training. Small, but statistically significant, differences in writing- task difficulty were also found. The FACETS model offers a promising approach for addressing measurement problems encountered in the large- scale assessment of writing ability through written compositions. 相似文献

2.

以多面Rasch模型测量写作能力

George Engelhard Jr. 《教育与考试》2007,(4)

本研究的目的是描述一个用于测量写作能力的多面Rasch(FACETS)模型。该FACETS模型是Rasch测量模型的多元变量拓展,它可为写作测评中的校标评分员和写作题目提供框架。本文展示了如何应用FACETS模型解决大型写作测评中遇到的测量问题。参加全州写作考试的1000个随机抽取的学生样本被用来显示该FACETS模型。数据表明即使经过强化训练,评分员的严格度有显著区别。同时,本研究还发现,写作题目难度的区分,虽然微小,却具有统计意义上的显著性。该FACETS模型为解决以作文测评写作能力的大型考试遇到的测量问题提供了一个有前景的途径。相似文献

3.

THE EFFECTS OF THE DELETION OF MISFITTING PERSONS ON VERTICAL EQUATING VIA THE RASCH MODEL

S. E. PHILLIPS 《Journal of Educational Measurement》1986,23(2):107-118

The purpose of the study was to compare Rasch model equatings of multilevel achievement test data before and after the deletion of misfitting persons. The Rasch equatings were also compared with an equating obtained using the equipercentile method. No basis could be found in the results for choosing between the two Rasch equatings. The deletion of misfitting persons produced minor improvements in Rasch model fit to the data. Both Rasch equatings produced results that differed from the results of the equipercentile equating. The Rasch data also indicated that the misfitting persons deleted in the second Rasch equating tended to be from the lower portion of the achievement distribution, suggesting that they may have been guessing. 相似文献

4.

Measurement and modelling: sequential use of analytical techniques in a study of risk-taking in decision-making by school principals

Karen Trimmer 《Teacher Development》2016,20(3):398-416

This paper investigates reasoned risk-taking in decision-making by school principals using a methodology that combines sequential use of psychometric and traditional measurement techniques. Risk-taking is defined as when decisions are made that are not compliant with the regulatory framework, the primary governance mechanism for public schools in Western Australia. This creates a dilemma for principals who need to be able to respond to the locally identified needs within a school, and simultaneously comply with all State and Commonwealth departmental requirements. A theoretical model was developed and data collected through the survey of a stratified random sample of principals in 253 Western Australian government schools. Rasch measurement was used to create a measurement scale. The hypotheses were tested used partial least squares structural equation modelling. This analysis provides evidence of the effect of governance structures, characteristics of schools and principals that influence decision-making in schools. 相似文献

5.

Examining Rater Errors in the Assessment of Written Composition With a Many-Faceted Rasch Model 总被引：2，自引：0，他引：2

George Engelhard Jr 《Journal of Educational Measurement》1994,31(2):93-112

This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings are in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used 相似文献

6.

Rasch测量原理及在高考命题评价中的实证研究 总被引：1，自引：1，他引：1

王蕾《中国考试》2008,(1):32-39

Rasch测量是当前教育与心理测量中具有客观等距量尺的测量。克服了经典测量的测验工具依赖和样本依赖的局限。本文通过介绍Rasch测量原理及其在高考命题评价考生抽样数据分析上的具体应用,为教育决策者和命题者提供了直观的Rasch测量对高考命题评价的量化图形表现形式。希望Rasch测量能在高考抽样数据分析中为命题量化评价提供新的、有价值的思考方式,能被教育决策者和命题者认同和有效使用。相似文献

7.

Measuring student graduateness: reliability and construct validity of the Graduate Skills and Attributes Scale

Melinde Coetzee 《高等教育研究与发展》2014,33(5):887-902

This study reports the development and validation of the Graduate Skills and Attributes Scale which was initially administered to a random sample of 272 third-year-level and postgraduate-level, distance-learning higher education students. The data were analysed using exploratory factor analysis. In a second study, the scale was administered to a stratified proportional random sample of 1102 early-career, undergraduate open distance-learning higher education students in the economic and management sciences field. The data were analysed using confirmatory factor and Rasch analyses. The structural validity and reliability of the scale were confirmed by the results. Educators and learning and development practitioners may be able to use the findings in their teaching, learning and assessment design. 相似文献

8.

Spurious Latent Classes in the Mixture Rasch Model

Natalia Alexeev Jonathan Templin Allan S. Cohen 《Journal of Educational Measurement》2011,48(3):313-332

Mixture Rasch models have been used to study a number of psychometric issues such as goodness of fit, response strategy differences, strategy shifts, and multidimensionality. Although these models offer the potential for improving understanding of the latent variables being measured, under some conditions overextraction of latent classes may occur, potentially leading to misinterpretation of results. In this study, a mixture Rasch model was applied to data from a statewide test that was initially calibrated to conform to a 3‐parameter logistic (3PL) model. Results suggested how latent classes could be explained and also suggested that these latent classes might be due to applying a mixture Rasch model to 3PL data. To support this latter conjecture, a simulation study was presented to demonstrate how data generated to fit a one‐class 2‐parameter logistic (2PL) model required more than one class when fit with a mixture Rasch model. 相似文献

9.

Development of an item bank for assessing generic competences in a higher-education institute: a Rasch modelling approach

Qin Xie Xiaoling Zhong Wen-Chung Wang Cher Ping Lim 《高等教育研究与发展》2014,33(4):821-835

This paper describes the development and validation of an item bank designed for students to assess their own achievements across an undergraduate-degree programme in seven generic competences (i.e., problem-solving skills, critical-thinking skills, creative-thinking skills, ethical decision-making skills, effective communication skills, social interaction skills and global perspective). The Rasch modelling approach was adopted for instrument development and validation. A total of 425 items were developed. The content validity of these items was examined via six focus group interviews with target students, and the construct validity was verified against data collected from a large student sample (N?=?1151). A matrix design was adopted to assemble the items in 26 test forms, which were distributed at random in each administration session. The results demonstrated that the item bank had high reliability and good construct validity. Cross-sectional comparisons of Years 1–4 students revealed patterns of changes over the years. Correlation analyses shed light on the relationships between the constructs. Implications are drawn to inform future efforts to develop the instrument, and suggestions are made regarding ways to use the instrument to enhance the teaching and learning of generic skills. 相似文献

10.

Digital Module 10: Rasch Measurement Theory https://ncme.elevate.commpartners.com

Jue Wang George Engelhard 《Educational Measurement》2019,38(4):112-113

In this digital ITEMS module, Dr. Jue Wang and Dr. George Engelhard Jr. describe the Rasch measurement framework for the construction and evaluation of new measures and scales. From a theoretical perspective, they discuss the historical and philosophical perspectives on measurement with a focus on Rasch's concept of specific objectivity and invariant measurement. Specifically, they introduce the origins of Rasch measurement theory, the development of model‐data fit indices, as well as commonly used Rasch measurement models. From an applied perspective, they discuss best practices in constructing, estimating, evaluating, and interpreting a Rasch scale using empirical examples. They provide an overview of a specialized Rasch software program (Winsteps) and an R program embedded within Shiny (Shiny_ERMA) for conducting the Rasch model analyses. The module is designed to be relevant for students, researchers, and data scientists in various disciplines such as psychology, sociology, education, business, health, and other social sciences. It contains audio‐narrated slides, sample data, syntax files, access to Shiny_ERMA program, diagnostic quiz questions, data‐based activities, curated resources, and a glossary. 相似文献

11.

Psychometric characteristics of the Persian version of the Multidimensional School Anger Inventory–Revised

Vahid Aryadoust Sanaz Akbarzadeh Sara Akbarzedeh 《Asia Pacific Journal of Education》2011,31(1):51-64

The Multidimensional School Anger Inventory–Revised (MSAI-R) is a measurement tool to evaluate high school students' anger. Its psychometric features have been tested in the USA, Australia, Japan, Guatemala, and Italy. This study investigates the factor structure and psychometric quality of the Persian version of the MSAI-R using data from an administration of the inventory to 585 Iranian high school students. The study adopted the four-factor underlying structure of high school student anger derived through factor analysis in previous validation studies, which consists of: School Hostility, Anger Experience, Positive Coping, and Destructive Expressions. Confirmatory factor analysis of this four-factor model indicated that it fit the data better than a one-factor baseline model, although the fit was not perfect. The Rasch model showed a very high internal consistency among items, with no item misfitting; however, our results suggest that to represent the construct sufficiently some items should be added to Positive Coping and Destructive Expression. This finding is in agreement with Boman, Curtis, Furlong, and Smith's Rasch analysis of the MSAI-R with an Australian sample. Overall, the results from this study support the psychometric features of the Persian MSAI-R. However, results from some test items also point to the dangers inherent in adapting the same test stimuli to widely divergent cultures. 相似文献

12.

Detecting Measurement Disturbances in Rater‐Mediated Assessments

下载免费PDF全文

Stefanie A. Wind Randall E. Schumacker 《Educational Measurement》2017,36(4):44-51

The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start‐up, plodding, boredom, or fatigue. An understanding of the different types of measurement disturbances can lead to a more complete understanding of persons or items in terms of the construct being measured. Although measurement disturbances have been explored in several contexts, they have not been explicitly considered in the context of performance assessments. The purpose of this study is to illustrate the use of graphical methods to explore measurement disturbances related to raters within the context of a writing assessment. Graphical displays that illustrate the alignment between expected and empirical rater response functions are considered as they relate to indicators of rating quality based on the Rasch model. Results suggest that graphical displays can be used to identify measurement disturbances for raters related to specific ranges of student achievement that suggest potential rater bias. Further, results highlight the added diagnostic value of graphical displays for detecting measurement disturbances that are not captured using Rasch model–data fit statistics. 相似文献

13.

A psychometric measurement model for adult English language learners: Pearson Test of English Academic

Hye K. Pae 《Educational Research and Evaluation》2013,19(3):211-229

The aim of this study was to apply Rasch modeling to an examination of the psychometric properties of the Pearson Test of English Academic (PTE Academic). Analyzed were 140 test-takers' scores derived from the PTE Academic database. The mean age of the participants was 26.45 (SD = 5.82), ranging from 17 to 46. Conformity of the participants' performance on the 86 items of PTE Academic Form 1 of the field test was evaluated using the partial credit model. The person reliability coefficient was .96, and item reliability was .99. The results showed that no significant differential item functioning was found across subgroups of gender and spoken-language context, indicating that the item data approximated the Rasch model. The findings of this study validated the test stability of PTE Academic as a useful measurement tool for English language learners' academic English assessment. 相似文献

14.

Rasch analysis for psychometric improvement of science attitude rating scales

Pey-Tee Oon Xitao Fan 《International Journal of Science Education》2013,35(6):683-700

ABSTRACT

Students’ attitude towards science (SAS) is often a subject of investigation in science education research. Survey of rating scale is commonly used in the study of SAS. The present study illustrates how Rasch analysis can be used to provide psychometric information of SAS rating scales. The analyses were conducted on a 20-item SAS scale used in an existing dataset of The Trends in International Mathematics and Science Study (TIMSS) (2011). Data of all the eight-grade participants from Hong Kong and Singapore (N?=?9942) were retrieved for analyses. Additional insights from Rasch analysis that are not commonly available from conventional test and item analyses were discussed, such as invariance measurement of SAS, unidimensionality of SAS construct, optimum utilization of SAS rating categories, and item difficulty hierarchy in the SAS scale. Recommendations on how TIMSS items on the measurement of SAS can be better designed were discussed. The study also highlights the importance of using Rasch estimates for statistical parametric tests (e.g. ANOVA, t-test) that are common in science education research for group comparisons. 相似文献

15.

Discovering patterns of achievement in Hong Kong students: An application of the Rasch measurement model

Shane N. Phillipson Andy Ka‐on Tse 《High Ability Studies》2007,18(2):173-190

Researchers have warned of the need to identify accurately students who are underachieving in Hong Kong, particularly among the gifted group. When comparing the relative effectiveness of three methods for estimating the proportion of underachievement, the absolute split method, using an arbitrary upper and lower limits for estimates of both performance and ability, is more useful for identifying gifted underachievers than the simple difference method (where standardized performance scores are subtracted from standardized ability scores) or the regression method. In contrast, the latter two methods are more useful for identifying underachievers at all levels of ability. All three methods, however, depend on measurements that are invariant, unidimensional and additive. With the advent of modern measurement theory using Rasch measurement models, it is now possible to satisfy these requirements. In this study, a sample of Primary 5 students in Hong Kong (n = 957) were asked to complete a test of mathematical achievement and the Ravens Progressive Matrices test in order to estimate the proportion of students who are underachieving at all levels of ability. Measurement scales were created using Rasch models for partial credit and dichotomous responses for each variable, respectively, and students placed on each scale according to their responses. Because the results are based on measurement scales that are invariant between persons, the identification of underachievement in these students across all levels of ability can be regarded as objective rather than sample dependent. 相似文献

16.

Comparative study of middle school students’ attitudes towards science: Rasch analysis of entire TIMSS 2011 attitudinal data for England,Singapore and the U.S.A. as well as psychometric properties of attitudes scale

Oon Pey Tee 《International Journal of Science Education》2018,40(3):268-290

We report here on a comparative study of middle school students’ attitudes towards science involving three countries: England, Singapore and the U.S.A. Complete attitudinal data sets from TIMSS (Trends in International Mathematics and Science Study) 2011 were used, thus giving a very large sample size (N?=?20,246), compared to other studies in the journal literature. The Rasch model was used to analyse the data, and the findings have shed some useful light on not only how the Western and Asian students responded on a comparative basis in the various scales related to attitudes but also on the validity, reliability, and unidimensionality of the attitudes instrument used in TIMSS 2011. There may be a need for TIMSS test developers to consider doing away with negatively phrased items in the attitudes instrument and phrasing these positively as the Rasch framework shows that response bias is associated with these statements. 相似文献

17.

Dimensional analyses of complex data

Kathy E. Green 《Structural equation modeling》2013,20(1):50-61

In this article, scales constructed using principal components and Rasch measurement methods are compared. The context of the comparison is scale definition under difficult circumstances—when constructs are unclear and sample sizes marginal. Three data sets of increasing complexity and decreasing stability were used. Responses for the least complex data set were dichotomous; the remaining two were polytomous. Results of Rasch and principal components analyses were identical when data were stable and the structure unidimensional. With less stability and more complexity, the defined scales were still similar for the two analytic approaches. Effects of item positions on the scales were noted and are discussed. 相似文献

18.

Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees’ Cognitive Skills in Critical Reading

Changjiang Wang Mark J. Gierl 《Journal of Educational Measurement》2011,48(2):165-187

The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT critical reading items and illustrate how the method can be used to promote cognitive diagnostic inferences. The AHM is a psychometric procedure for classifying examinees’ test item responses into a set of attribute mastery patterns associated with different components from a cognitive model. The study was conducted in two steps. In step 1, three cognitive models were developed by reviewing selected literature in reading comprehension as well as research related to SAT Critical Reading. Then, the cognitive models were validated by having a sample of students think aloud as they solved each item. In step 2, psychometric analyses were conducted on the SAT critical reading cognitive models by evaluating the model‐data fit between the expected and observed response patterns produced from two random samples of 2,000 examinees who wrote the items. The model that provided best data‐model fit was then used to calculate attribute probabilities for 15 examinees to illustrate our diagnostic testing procedure. 相似文献

19.

The Effect of Differential Motivation on IRT Linking

下载免费PDF全文

Marie‐Anne Mittelhaëuser Anton A. Béguin Klaas Sijtsma 《Journal of Educational Measurement》2015,52(3):339-358

The purpose of this study was to investigate whether simulated differential motivation between the stakes for operational tests and anchor items produces an invalid linking result if the Rasch model is used to link the operational tests. This was done for an external anchor design and a variation of a pretest design. The study also investigated whether a constrained mixture Rasch model could identify latent classes in such a way that one latent class represented high‐stakes responding while the other represented low‐stakes responding. The results indicated that for an external anchor design, the Rasch linking result was only biased when the motivation level differed between the subpopulations to which the anchor items were administered. However, the mixture Rasch model did not identify the classes representing low‐stakes and high‐stakes responding. When a pretest design was used to link the operational tests by means of a Rasch model, the linking result was found to be biased in each condition. Bias increased as percentage of students showing low‐stakes responding to the anchor items increased. The mixture Rasch model only identified the classes representing low‐stakes and high‐stakes responding under a limited number of conditions. 相似文献

20.

A Rasch Analysis of Raven Item Data

《Journal of Experimental Education》2012,80(1):27-32

The Progressive Matrices items require varying degrees of analytical reasoning. Individuals high on the underlying trait measured by the Raven should score high on the test. Latent trait models applied to data of the Raven form provide a useful methodology for examining the tenability of the above hypothesis. In this study the Rasch latent model was applied to investigate the fit of observed performance on Raven items to what was expected by the model for individuals at six different levels of the underlying scale. For the most part the model showed a good fit to the test data. The findings were similar to previous empirical work that has investigated the behavior of Rasch test scores. In three instances, however, the item fit statistic was relatively large. A closer study of the “misfitting” items revealed two items were of extreme difficulty, which is likely to contribute to the misfit. The study raises issues about the use of the Rasch model in instances of small samples. Other issues related to the interpretation of the Rasch model to Raven-type data are discussed. 相似文献