Similar Documents
20 similar documents found.
1.
How does the fact that two tests should not be equated manifest itself? This paper addresses this question through the study of the degree to which equating functions fail to exhibit population invariance across subpopulations. Equating functions are supposed to be population invariant by definition. But when two tests are not equatable, it is possible that the linking functions, used to connect the scores of one to the scores of the other, are not invariant across different populations of examinees. While no acceptable equating function is ever completely population invariant, in the situations where equating is usually performed we believe that the dependence of the equating function on the population used to compute it is usually small enough to be ignored. We introduce two root‐mean‐square difference measures of the degree to which the linking functions computed on different subpopulations differ from the linking function computed for the whole population. We also introduce the system of “parallel‐linear” linking functions for multiple subpopulations and show that, for this system, our measure of population invariance can be computed easily from the standardized mean differences between the scores of the subpopulations on the two tests. For the parallel‐linear case, we develop a correlation‐based upper bound on our measure that holds for all systems of subpopulations. We illustrate these ideas using data from the SAT I and from a concordance study of several combinations of ACT and SAT I scores. In the appendices, we give some theoretical results bearing on the other equating “requirements” of “same construct,” “same reliability,” and one aspect of Lord's concept of equity.
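
A pointwise root‐mean‐square difference and the parallel‐linear shortcut described in this abstract can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes linear linking functions, subgroup weights that sum to 1, and standardization by the total‐group standard deviation of Y. For the parallel‐linear system it assumes every subgroup function uses the total‐group slope sd_y / sd_x, so that the subgroup and total‐group functions differ only by a constant and the measure reduces to a weighted RMS of differences between standardized mean differences. All function names are hypothetical.

```python
import math


def linear_link(mu_x, sd_x, mu_y, sd_y):
    """Linear linking function e(x) = mu_y + (sd_y / sd_x) * (x - mu_x)."""
    slope = sd_y / sd_x
    return lambda x: mu_y + slope * (x - mu_x)


def rmsd_at(x, weights, sub_links, total_link, sd_y):
    """Pointwise RMSD: weighted root-mean-square difference between the
    subgroup and total-group linking functions at score x, expressed in
    units of the total-group SD of Y."""
    msd = sum(w * (e_j(x) - total_link(x)) ** 2
              for w, e_j in zip(weights, sub_links))
    return math.sqrt(msd) / sd_y


def parallel_linear_rmsd(weights, mu_xj, mu_yj, mu_x, sd_x, mu_y, sd_y):
    """Same measure for parallel-linear links, written with standardized
    mean differences; under this assumption it does not depend on x."""
    msd = sum(w * ((my - mu_y) / sd_y - (mx - mu_x) / sd_x) ** 2
              for w, mx, my in zip(weights, mu_xj, mu_yj))
    return math.sqrt(msd)
```

For example, with two equally weighted subgroups whose X means are 48 and 52 and Y means are 47 and 53 (total means 50, SDs 10), the parallel‐linear subgroup links are `linear_link(mx, 10, my, 10)`, and both `rmsd_at` (at any score) and `parallel_linear_rmsd` give 0.1.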

2.
A goal for any linking or equating of two or more tests is that the linking function be invariant to the population used in conducting the linking or equating. Violations of population invariance in linking and equating jeopardize the fairness and validity of test scores, and pose particular problems for test‐based accountability programs that require schools, districts, and states to report annual progress on academic indicators disaggregated by demographic group membership. This instructional module provides a comprehensive overview of population invariance in linking and equating and the relevant methodology developed for evaluating violations of invariance. A numeric example is used to illustrate the comparative properties of available methods, and important considerations for evaluating population invariance in linking and equating are presented.

3.
In many educational tests, both multiple‐choice (MC) and constructed‐response (CR) sections are used to measure different constructs. In many common cases, security concerns lead to the use of form‐specific CR items that cannot be used for equating test scores, along with MC sections that can be linked to previous test forms via common items. In such cases, adjustment by minimum discriminant information may be used to link CR section scores and composite scores based on both MC and CR sections. This approach is an innovative extension that addresses the long‐standing problem in educational measurement of linking CR test scores across test forms in the absence of common items. It is applied to a series of administrations from an international language assessment with MC sections for receptive skills and CR sections for productive skills. To assess the linking results, harmonic regression is applied, as one of several evaluation analyses, to examine the effects of the proposed linking method on score stability.
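
Adjustment by minimum discriminant information reweights a sample so that chosen moments match target values while staying as close as possible, in Kullback–Leibler terms, to the original weights; for mean constraints the solution is an exponential tilt of the weights. The toy sketch below assumes a single covariate, uniform starting weights, and a target mean strictly inside the range of the data, and solves for the tilt parameter with Newton's method. The name `mdia_weights` is hypothetical, and the abstract does not say which moments the study actually constrains.

```python
import math


def mdia_weights(z, target_mean, tol=1e-10, max_iter=100):
    """Exponentially tilted weights w_i proportional to exp(lam * z_i) whose
    weighted mean of z equals target_mean (one-constraint toy version)."""
    lam = 0.0
    for _ in range(max_iter):
        # Subtract the largest exponent for numerical stability.
        m = max(lam * zi for zi in z)
        expo = [math.exp(lam * zi - m) for zi in z]
        total = sum(expo)
        w = [e / total for e in expo]
        mean = sum(wi * zi for wi, zi in zip(w, z))
        var = sum(wi * (zi - mean) ** 2 for wi, zi in zip(w, z))
        gap = mean - target_mean
        if abs(gap) < tol:
            return w
        # Newton step: d(mean)/d(lam) equals the weighted variance of z.
        lam -= gap / var
    raise RuntimeError("tilt parameter did not converge")
```

For example, `mdia_weights([0, 1, 2, 3], 2.0)` shifts weight toward the higher values so that the weighted mean rises from 1.5 to 2.0, while a target of 1.5 leaves the uniform weights unchanged.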

4.
The College Board's SAT® data are used to illustrate how score equity assessment (SEA) can help inform the program about equatability. SEA is used to examine whether the content change(s) to the revised new SAT result in differential linking functions across gender groups. Results of population sensitivity analyses are reported on the linkage of the new SAT critical reading (CR) prototype to the old SAT verbal (OV) test. Based on the criteria used in this study, population invariance was achieved with respect to gender groups.

5.
If the factor structure of a test does not hold over time (i.e., is not invariant), then longitudinal comparisons of standing on the test are not meaningful. In the case of the Wechsler Intelligence Scale for Children‐Third Edition (WISC‐III), it is crucial that it exhibit longitudinal factorial invariance because it is widely used in high‐stakes special education eligibility decisions. Accordingly, the present study analyzed the longitudinal factor structure of the WISC‐III for both configural and metric invariance with a group of 177 students with disabilities tested, on average, 2.8 years apart. Equivalent factor loadings, factor variances, and factor covariances across the retest interval provided evidence of configural and metric invariance. It was concluded that the WISC‐III was measuring the same constructs with equal fidelity across time, which allows unequivocal interpretation of score differences as reflecting changes in underlying latent constructs rather than variations in the measurement operation itself. © 2001 John Wiley & Sons, Inc.

6.
Social‐emotional health influences youth developmental trajectories, and there is growing interest among educators in measuring the social‐emotional health of the students they serve. This study replicated the psychometric characteristics of the Social Emotional Health Survey (SEHS) with a diverse sample of high school students (Grades 9–12; N = 14,171), and determined whether the factor structure was invariant across sociocultural and gender groups. A confirmatory factor analysis (CFA) tested the fit of the previously known factor structure, and then structural equation modeling was used to test invariance across sociocultural and gender groups through multigroup CFAs. Results supported the SEHS measurement model, with full invariance of the SEHS higher‐order structure for all five sociocultural groups. There were no group differences of moderate or larger effect size on the overall index for sociocultural or gender groups, which lends support to the eventual development of common norms and universal interpretation guidelines.

7.
Multigroup confirmatory factor analysis (MCFA) is a popular method for the examination of measurement invariance and, specifically, factor invariance. Recent research has begun to focus on using MCFA to detect invariance for test items. MCFA requires certain parameters (e.g., factor loadings) to be constrained for model identification; these parameters are assumed to be invariant across groups and act as referent variables. When this invariance assumption is violated, locating the parameters that actually differ across groups becomes difficult. The factor ratio test and the stepwise partitioning procedure in combination have been suggested as methods to locate invariant referents, and appear to perform favorably with real data examples. However, the procedures have not been evaluated through simulations where the extent and magnitude of a lack of invariance is known. This simulation study examines these methods in terms of accuracy (i.e., true positive and false positive rates) of identifying invariant referent variables.

8.
Score equity assessment (SEA) is introduced and placed within a fair assessment context that includes differential prediction or fair selection and differential item functioning. The notion of subpopulation invariance of linking functions is central to the assessment of score equity, just as it has been for differential item functioning and differential prediction. Advanced Placement (AP) data are used for illustrative purposes. The use of multiple-choice and constructed-response items in AP provides an opportunity to observe a case where subpopulation invariance of linking functions does not hold (U.S. History), and a case in which it does hold (Calculus AB). The lack of invariance for U.S. History might be attributed to several sources. The role of SEA in assessing the fairness of test assembly processes is discussed.

9.
As access to and reliance on technology continue to increase, so does the use of computerized testing for admissions, licensure/certification, and accountability exams. Nonetheless, full computer‐based test (CBT) implementation can be difficult due to limited resources. As a result, some testing programs offer both CBT and paper‐based test (PBT) administration formats. In such situations, evidence that scores obtained from different formats are comparable must be gathered. In this study, we illustrate how contemporary statistical methods can be used to provide evidence regarding the comparability of CBT and PBT scores at the total test score and item levels. Specifically, we examined the invariance of test structure and item functioning across administration modes for subgroups of students defined by SES and sex. Multiple replications of both confirmatory factor analysis and Rasch differential item functioning analyses were used to assess invariance at the factorial and item levels. Results revealed a unidimensional construct with moderate statistical support for strong factorial‐level invariance across SES subgroups, and moderate support for invariance across sex. Issues involved in applying these analyses to future evaluations of the comparability of scores from different versions of a test are discussed.
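
One common way to express Rasch DIF between administration modes is a difficulty contrast: estimate an item's difficulty separately in each group with person abilities held fixed, then take the difference. The sketch below assumes abilities are already on a common scale and that each group has at least one correct and one incorrect response to the item (otherwise the maximum‐likelihood estimate diverges). It is illustrative only, not the study's replication procedure, and the function names are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_difficulty(thetas, responses, tol=1e-10, max_iter=100):
    """Maximum-likelihood difficulty of one item with abilities held fixed,
    found by Newton's method on the log-likelihood."""
    b = 0.0
    for _ in range(max_iter):
        p = [rasch_p(t, b) for t in thetas]
        grad = sum(pi - x for pi, x in zip(p, responses))  # d logL / db
        info = sum(pi * (1.0 - pi) for pi in p)            # -d2 logL / db2
        step = grad / info
        b += step
        if abs(step) < tol:
            return b
    raise RuntimeError("difficulty estimate did not converge")


def dif_contrast(thetas_a, resp_a, thetas_b, resp_b):
    """Difficulty in group A minus difficulty in group B; a positive value
    means the item was harder for group A at equal ability."""
    return rasch_difficulty(thetas_a, resp_a) - rasch_difficulty(thetas_b, resp_b)
```

For example, if four examinees at ability 0 answer half correctly in one mode and three of four correctly in the other, the contrast is ln 3 ≈ 1.10 logits, so the item would look harder in the first mode.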

10.
This study investigates measurement invariance of the mathematics, science, and ICT scales across the 47 countries that participated in the PISA 2015 ICT Familiarity Questionnaire. Knowing whether the same constructs and measurements can be reliably compared across countries constitutes an important goal. The Alignment method is employed to test the measurement invariance of the three scales. The results show that mathematics and science scores are highly invariant and can be used to compare countries, whereas the ICT scale is mostly non-invariant and cannot be used to reliably compare ICT means across all participating countries. Implications and limitations are discussed.

11.
The concept of invariance in equating and linking is traced from the 1950s to the present. A number of research studies that examined population invariance are reviewed. Theory and research suggest that linkings other than equatings are population dependent. Theory also indicates that equatings are population dependent, although when test forms are built to detailed tables of content and statistical specifications and alternate forms are very similar to one another, the research suggests that equatings might be approximately population invariant. Suggestions are made about further research that should be conducted on methodology for examining population invariance and on empirical research to better understand the conditions under which equatings are sufficiently population invariant for practical purposes.

12.
Based on concerns about the item response theory (IRT) linking approach used in the Programme for International Student Assessment (PISA) until 2012 as well as the desire to include new, more complex, interactive items with the introduction of computer-based assessments, alternative IRT linking methods were implemented in the 2015 PISA round. The new linking method represents a concurrent calibration using all available data, enabling us to find item parameters that maximize fit across all groups and allowing us to investigate measurement invariance across groups. Apart from the Rasch model that historically has been used in PISA operational analyses, we compared our method against more general IRT models that can incorporate item-by-country interactions. The results suggest that our proposed method holds promise not only to provide a strong linkage across countries and cycles but also to serve as a tool for investigating measurement invariance.

13.
To date, no effective empirical method has been available to identify a truly invariant reference variable (RV) in testing measurement invariance under a multiple-group confirmatory factor analysis. This study proposes a method that, in selecting an RV, uses the smallest modification index (min-mod). The method’s performance is evaluated using two models: (a) a full invariance model, and (b) a partial invariance model. Results indicate that for both models the min-mod successfully identifies a truly invariant RV (Study 1). In Study 2, we use the RV found in Study 1 to further evaluate the performance of item-by-item Wald tests at locating a noninvariant variable. The results indicate that, overall, Wald tests performed better with an RV selected in a partial invariance model than with an RV selected in a full invariance model, although in certain conditions their performances were rather similar. Implications and limitations of the study are also discussed.

14.
We estimated the invariance of educational achievement (EA) and learning attitudes (LA) measures across nations. A multi-group confirmatory factor analysis was used to estimate the invariance of educational achievement and learning attitudes across 55 nations (Programme for International Student Assessment [PISA] 2006 data, N = 354,203). The constructs had the same meaning (factor loadings) but different scales (intercepts). Our conclusion is that comparisons of the relationships between educational achievement and learning attitudes across countries need to take into consideration two sources of variability: individual differences of students and group differences of educational systems. The lack of scalar invariance in EA and LA measures means that the relationships between EA and LA may have a different meaning at the level of nations and at the student level within countries. In other words, as PISA measures are not invariant in the scalar sense, comparisons across countries with nationally aggregated scores are not justified.

15.
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the existing methods were designed to detect drift in individual items, which may not be adequate for test characteristic curve–based linking or equating. One example is item response theory–based true score equating, whose goal is to generate a conversion table that relates number‐correct scores on two forms based on their test characteristic curves. This article introduces a stepwise test characteristic curve method that detects item parameter drift iteratively, based on test characteristic curves, without needing to set any predetermined critical values. Comparisons are made between the proposed method and two existing methods under the three‐parameter logistic item response model through simulation and real data analysis. Results show that the proposed method produces a small difference in test characteristic curves between administrations, an accurate conversion table, and a good classification of drifted and nondrifted items, while retaining a large number of linking items.
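
Under the three‐parameter logistic model, the test characteristic curve is the sum of the item response probabilities, so the gap between two administrations' curves can be computed directly. The sketch below assumes item parameters from both administrations are already on a common scale and uses the conventional scaling constant D = 1.7. The function `one_step` is a hypothetical greedy step that illustrates working from TCC differences instead of per‐item critical values; it is not the article's stepwise algorithm, whose details the abstract does not give.

```python
import math

D = 1.7  # conventional scaling constant for the logistic 3PL model


def p3pl(theta, a, b, c):
    """3PL probability of a correct response (discrimination a,
    difficulty b, lower asymptote c)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def tcc_gap(items_old, items_new, grid):
    """Root-mean-square difference between two TCCs over a theta grid."""
    sq = [(tcc(t, items_old) - tcc(t, items_new)) ** 2 for t in grid]
    return math.sqrt(sum(sq) / len(grid))


def one_step(items_old, items_new, grid):
    """Hypothetical greedy step: index of the linking item whose removal
    (from both administrations) most reduces the TCC gap, and that gap."""
    best_idx, best_gap = None, float("inf")
    for i in range(len(items_old)):
        keep_old = items_old[:i] + items_old[i + 1:]
        keep_new = items_new[:i] + items_new[i + 1:]
        gap = tcc_gap(keep_old, keep_new, grid)
        if gap < best_gap:
            best_idx, best_gap = i, gap
    return best_idx, best_gap
```

For example, with three linking items where only the third item's difficulty moves from 1.0 to 1.5 between administrations, `one_step` over a grid from -4 to 4 flags index 2, and removing that item closes the TCC gap entirely.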

16.
In order to initiate more research on the role of reading motivation during the initial stages of learning to comprehend texts, we developed the Reading Motivation Questionnaire for Elementary Students (RMQ‐E). The sample comprised 1497 elementary students in Grades 1–3. By means of exploratory and confirmatory factor analyses, three factors were determined: curiosity, involvement, and competition. The three‐factor structure of the RMQ‐E was found to be invariant across grade levels (scalar invariance) and across female and male students (strict invariance). As anticipated, students in higher grades and male students were lower in curiosity and involvement than students in lower grades and female students. Whereas competitive reading motivation did not differ across grade levels, it was higher for boys than for girls. Moreover, the contributions of involvement and competition to reading amount and reading competence were in accordance with the hypotheses. The predictive validity of curiosity, however, was not confirmed.

17.
Basic Psychological Needs Theory (BPNT) suggests that autonomy‐supportive teachers can promote the satisfaction of students’ three basic psychological needs (i.e., the needs for autonomy, competence, and relatedness), and that this is essential for optimal functioning and personal well‐being. The role of need satisfaction as a determinant of well‐being is understood to be invariant across contexts and cultures. The aim of this study is to test the invariance of the relationships between students’ perceptions of their teachers’ autonomy support and their psychological need satisfaction, enjoyment, concentration, and boredom across different school subjects (math, English, and physical education lessons) and across different cultures (England and Turkey). Questionnaires tapping the targeted variables in the three different lesson types were completed by students in schools in England and Turkey. Results from multilevel modeling analyses showed some support for the tenets of BPNT, although the strengths of the hypothesized relationships were inconsistent across countries and/or lesson types.

18.
Structural Equation Modeling (SEM) was used in this study to determine the extent to which teachers, principals, and superintendents perceive the leadership construct in the same way. The researchers found that the two-factor model fits the principal group and particularly the superintendent group better than does the four-factor model. The principals and particularly the superintendents appear to have a more tightly focused mental model of leadership than teachers. The test of structural invariance across the three groups indicated that there was configural and weak invariance, but not strong or strict invariance. It appears that the item-loading patterns and item loadings are invariant, but the intercepts are different, which suggests that the groups put different emphases on the importance of the factors. The study affirms the importance of determining and reporting the extent to which comparison groups share the same mental model for leadership.

19.
The Early Communication Indicator (ECI) is a measure for universal screening, intervention decision-making, progress monitoring for infants and toddlers needing higher levels of support, and program accountability. In the context of the ECI's long-term wide-scale use for these purposes, we examined the invariance of ECI measurement in two samples of the same Early Head Start (EHS) population that differed in the years the data were collected. Invariance or equivalence across samples is an important step in measurement validation because making inferences assumes that the measurements are factorially invariant. A number of time-covarying factors (e.g., assessors, children) can be hypothesized as threats to measurement invariance. Results of latent growth curve analyses indicated similarity between groups of children in the functional forms (velocity and shape) of the ECI's four key skill trajectories, and the ECI vocalizations, single-word, and multiple-word trajectories met strong factorial and structural invariance. Gestures met only weak factorial invariance. ECI total communications, a weighted composite of the four scales, also met both strong factorial and structural invariance. With one exception, results indicated that the ECI produced comparable growth estimates over different conditions of programs, assessors, and children over time, strengthening the construct validity of the ECI. Implications are discussed.

20.
Research Findings: Public policy has increasingly focused on expansion of preschool access for underserved students and systematic evaluation of preschool quality and students’ readiness for school. However, such evaluation is limited by a lack of thoroughly validated assessments for use with preschool populations. The present study examined the measurement and structural invariance of the Kindergarten Student Entrance Profile (KSEP) across kindergarten and prekindergarten groups to evaluate its potential use across developmental groups. Participants included 522 kindergarten and 548 prekindergarten students in central California. Invariance was tested by fitting a series of multiple-groups confirmatory factor analysis models with parameter constraints across groups. Results indicated that measurement and structural parameters of the KSEP were invariant across kindergarten and prekindergarten groups. Prekindergarten means on both Social–Emotional Readiness and Cognitive Readiness were significantly lower than kindergarten means. Practice or Policy: These results suggest that the KSEP may potentially be used with prekindergarten students to assess school readiness and inform intervention before kindergarten entry.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP registration No. 09084417)