Predicting item difficulty is important in education for both teachers and item writers. Yet despite the identification of a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment, with empirical attempts rarely explaining more than 25% of the variance.
This paper analyses 216 science items from key stage 2 tests, national sampling assessments administered to 11-year-olds in England. Potential predictors (topic, subtopic, concept, question type, nature of stimulus, depth of knowledge and linguistic variables) were considered in the analysis. Coding frameworks employed in similar studies were adapted and used by two coders to rate the items independently. Linguistic demands were gauged using a computational linguistics tool. The stepwise regression models predicted 23% of the variance, with extended constructed-response questions and photos being the main predictors of item difficulty.
While a substantial part of the unexplained variance could be attributed to the unpredictable interaction of variables, we argue that progress in this area requires improvements in both the theories and the methods employed. Future research should centre on improving coding frameworks and developing systematic training protocols for coders. These technical advances would pave the way to improved task design and reduced assessment development costs.
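The stepwise variance-prediction approach described in this abstract can be illustrated with a minimal sketch. This is not the authors' analysis: the data below are synthetic, the seven "coded item features" and the `min_gain` stopping rule are invented for illustration, and only forward selection on R² is shown.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

def forward_stepwise(X, y, min_gain=0.01):
    """Greedily add the predictor that most improves R^2, until the gain is small."""
    selected, current_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = [(r_squared(X[:, selected + [j]], y), j) for j in remaining]
        best_r2, best_j = max(gains)
        if best_r2 - current_r2 < min_gain:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current_r2 = best_r2
    return selected, current_r2

rng = np.random.default_rng(0)
n = 216                               # same number of items as the study
X = rng.normal(size=(n, 7))           # 7 hypothetical coded item features
# Only features 0 and 3 truly influence difficulty in this synthetic setup
y = 0.8 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=1.4, size=n)

features, r2 = forward_stepwise(X, y)
print(features, round(r2, 2))
```

In a real item-difficulty study the columns of `X` would be the coded predictors (question type, stimulus, linguistic measures, etc.) and `y` the empirical difficulty estimates; the modest R² typical of such models mirrors the 23% reported above.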
The purpose of the current study was to examine the validity and diagnostic accuracy of the Intervention Selection Profile—Social Skills (ISP‐SS), a brief social skills assessment tool intended for use with students in need of Tier 2 intervention. Participants included 160 elementary and middle school students who had been identified through universal screening as at risk for behavioral concerns. Teacher participants (n = 71) rated each of these students using both the ISP‐SS and the Social Skills Improvement System—Rating Scales (SSiS‐RS), with the latter measure serving as the criterion within validity and diagnostic accuracy analyses. Confirmatory factor analysis supported the structural validity of the ISP‐SS, indicating that its items broadly conformed to a single "Social Skills" factor. Follow‐up analyses suggested that ISP‐SS broad scale scores demonstrated adequate internal consistency reliability, with a hierarchical omega coefficient of .86. Correlational analyses supported the concurrent validity of ISP‐SS items, finding each item to be moderately or highly related to its corresponding SSiS‐RS subscale. Finally, analyses indicated that three of the seven ISP‐SS items demonstrated sufficient diagnostic accuracy; however, findings suggest additional revisions are needed if the ISP‐SS is to be appropriate for use in schools. Implications for practice and future research are discussed.
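The omega reliability coefficient mentioned in this abstract can be sketched from a single-factor model. The standardized loadings below are invented for illustration (they are not the ISP‐SS estimates), and in the unidimensional case the computation shown is the standard model-based omega: the squared sum of loadings over the squared sum of loadings plus the summed residual variances.

```python
import numpy as np

# Hypothetical standardized loadings for 7 items on one "Social Skills" factor
loadings = np.array([0.70, 0.65, 0.72, 0.60, 0.68, 0.75, 0.66])
uniqueness = 1 - loadings ** 2        # standardized residual variances

# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses)
omega = loadings.sum() ** 2 / (loadings.sum() ** 2 + uniqueness.sum())
print(round(omega, 2))
```

In practice the loadings would come from a fitted confirmatory factor analysis model rather than being specified directly.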
We examined the responses of 58,288 college students to 8 scales comprising 53 items from the National Survey of Student Engagement (NSSE) to gauge whether individuals respond differently to surveys administered via the Web versus on paper. Multivariate regression analyses indicated that mode effects were generally small. However, students who completed the Web-based survey responded more favorably than those who completed the paper survey on all 8 scales. These patterns generally held for both women and men, and for younger and older students. Interestingly, the largest effect was found for a scale of items involving computing and information technology.