1.
States participating in the Growth Model Pilot Program reference individual student growth against “proficiency” cut scores that conform with the original No Child Left Behind Act (NCLB). Although achievement results from conventional NCLB models are also cut‐score dependent, the functional relationships between cut‐score location and growth results are more complex and are not currently well described. We apply cut‐score scenarios to longitudinal data to demonstrate the dependence of state‐ and school‐level growth results on cut‐score choice. This dependence is examined along three dimensions: 1) rigor, as states set cut scores largely at their discretion, 2) across‐grade articulation, as the rigor of proficiency standards may vary across grades, and 3) the time horizon chosen for growth to proficiency. Results show that the selection of plausible alternative cut scores within a growth model can change the percentage of students “on track to proficiency” by more than 20 percentage points and reverse accountability decisions for more than 40% of schools. We contribute a framework for predicting these dependencies, and we argue that the cut‐score dependence of large‐scale growth statistics must be made transparent, particularly for comparisons of growth results across states.
2.
Standard-setting studies utilizing procedures such as the Bookmark or Angoff methods are just one component of the complete standard-setting process. Decision makers ultimately must determine what they believe to be the most appropriate standard or cut score to use, employing the input of the standard-setting panelists as one piece of information among multiple sources. However, guidance for weighing the various components is limited. The current article describes considerations about data that are used to make standard-setting decisions, as previously outlined by Geisinger (1991). The ten points provided by Geisinger have been expanded as they relate to shifts in educational policy and practice in educational measurement. They have been amended with six new components as well. The new considerations addressed are smoothing across grades, raising standards in progression (over grades or over time), opportunity to learn or instructional validity, input from other groups, equating or linking to previous standards, and organizational vision and goals.
3.
Some writers in the measurement literature have been skeptical of the meaningfulness of achievement standards and described the standard-setting process as blatantly arbitrary. We argue that standard setting is more appropriately conceived of as a measurement process similar to student assessment. The construct being measured is the panelists' representation of student performance at the threshold of an achievement level. In the first section of this paper, we argue that standard setting is an example of stimulus-centered measurement. In the second section, we elaborate on this idea by comparing some popular standard-setting methods to the stimulus-centered scaling methods known as psychophysical scaling. In the third section, we use the lens of standard setting as a measurement process to take a fresh look at the two criticisms of standard setting: the role of judgment and the variability of results. In the fourth section, we offer a vision of standard-setting research and practice as grounded in the theory and practice of educational measurement.
4.
In this article we address the issue of consistency in standard setting in the context of an augmented state testing program. Information gained from the external NRT scores is used to help make an informed decision on the determination of cut scores on the state test. The consistency of cut scores on the CRT across grades is maintained by forcing a consistency model based on the NRT scores and translating that information back to the CRT scores. The inconsistency of standards and the application of this model are illustrated using data from the Maryland MSA large state testing program involving cut points for basic, proficient and advanced in mathematics and reading across years and across grades. The model is discussed in some detail and shown to be a promising approach, although not without assumptions that must be made and issues that might be raised.
5.
6.
7.
8.
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group study of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can lead to questionable results in setting cut scores. In this study, data collected from a previous standard-setting study are used to deduce panelists’ conceptions of profiles of borderline performance. These profiles are then used to predict cut scores on a test of algebra readiness. The results indicate that these profiles can predict a very wide range of cut scores both within and between panelists. Modifications are proposed to existing training procedures for test-centered methods that can account for the variation in borderline profiles.
9.
10.
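Yang Zhiming (杨志明). 《教育测量与评价(理论版)》 [Educational Measurement and Evaluation (Theory Edition)], 2019, (3): 15-18
Piloting "multiple administrations per year" for the English test of the college entrance examination (gaokao) is a remarkable step forward, but fluctuations in difficulty across administrations often cause serious trouble when raw scores are used directly for admission decisions. This article discusses three methods for stabilizing test difficulty: the standard practice of the international testing industry, an expert-judgment method that borrows ideas from standard setting, and a small-scale pilot-testing method with a representative sample that uses validity evidence in reverse. We hope these methods give frontline testing practitioners more options.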