ABSTRACTBased on concerns about the item response theory (IRT) linking approach used in the Programme for International Student Assessment (PISA) until 2012 as well as the desire to include new, more complex, interactive items with the introduction of computer-based assessments, alternative IRT linking methods were implemented in the 2015 PISA round. The new linking method represents a concurrent calibration using all available data, enabling us to find item parameters that maximize fit across all groups and allowing us to investigate measurement invariance across groups. Apart from the Rasch model that historically has been used in PISA operational analyses, we compared our method against more general IRT models that can incorporate item-by-country interactions. The results suggest that our proposed method holds promise not only to provide a strong linkage across countries and cycles but also to serve as a tool for investigating measurement invariance. 相似文献
Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained.
This paper analyses 216 science items of key stage 2 tests which are national sampling assessments administered to 11 year olds in England. Potential predictors (topic, subtopic, concept, question type, nature of stimulus, depth of knowledge and linguistic variables) were considered in the analysis. Coding frameworks employed in similar studies were adapted and employed by two coders to independently rate items. Linguistic demands were gauged using a computational linguistic facility. The stepwise regression models predicted 23% of the variance with extended constructed questions and photos being the main predictors of item difficulty.
While a substantial part of unexplained variance could be attributed to the unpredictable interaction of variables, we argue that progress in this area requires improvement in the theories and the methods employed. Future research needs to be centred on improving coding frameworks as well as developing systematic training protocols for coders. These technical advances would pave the way to improved task design and reduced development costs of assessments. 相似文献
This pilot study measures university students’ perceptions of graded frequent assessments in an obligatory statistics course using a novel questionnaire. Relations between perceptions of frequent assessments, intrinsic motivation and grades were also investigated. A factor analysis of the questionnaire revealed four factors, which were labelled value, formative function, positive effects and negative effects. The results showed that most students valued graded frequent assessments as a study motivator. A modest number of students experienced positive or negative effects from assessments and grades received. Less than half of the students used the results of frequent assessments in their learning process. The perception of negative effects (lower self-confidence and more stress) negatively mediated the relation between grades and intrinsic motivation. It is argued that communication with students regarding the purpose and benefits of frequent assessments could mitigate these negative effects. 相似文献