Similar literature
20 similar documents found; search time 0 ms
1.
This study explores whether hierarchical logistic regression can reduce the sample size requirement for estimating optimal cutoff scores in a course placement service where predictive validity is measured by a threshold utility function. Data from courses with varying class sizes were randomly partitioned into two halves per course. Nonhierarchical and hierarchical analyses were performed on each half. Compared to their nonhierarchical counterparts, hierarchically estimated cutoff scores from different halves were more stable and predicted course outcomes in the other half more accurately. These differences were most pronounced with small samples. Sample size requirements for developing cutoff scores for course placement can be substantially reduced if hierarchical logistic regression is used.

2.
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess the Type I error rate and power of a combined decision rule (CDR), which assesses DIF by combining the decisions made with BD and MH, and to compare them with those of LR. The results revealed that while the Type I error rate of CDR was consistently below the nominal alpha level, the Type I error rate of LR was high for the conditions having unequal ability distributions. In addition, the power of CDR was consistently higher than that of LR across all forms of DIF.
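As a rough illustration of the MH component mentioned above, the sketch below computes the Mantel-Haenszel common odds ratio and the continuity-corrected MH chi-square from 2×2 tables stratified by matched score level. The function names, the tuple layout, and the example counts are assumptions made for illustration and are not taken from the study; the Breslow-Day test and the combined decision rule are not implemented here.

```python
import math


def mantel_haenszel(strata):
    """Return (common odds ratio, continuity-corrected MH chi-square).

    strata: iterable of (a, b, c, d) counts, one tuple per score level:
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    Assumes at least one stratum contributes to both sums; otherwise a
    ZeroDivisionError is raised.
    """
    num = den = 0.0                      # pieces of the common odds ratio
    sum_a = sum_exp = sum_var = 0.0      # pieces of the chi-square statistic
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue                     # stratum carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d      # group sizes in this stratum
        m1, m0 = a + c, b + d            # correct / incorrect totals
        sum_a += a
        sum_exp += n_ref * m1 / n        # E(a) under the no-DIF hypothesis
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    odds_ratio = num / den
    chi2 = max(0.0, abs(sum_a - sum_exp) - 0.5) ** 2 / sum_var
    return odds_ratio, chi2


def ets_delta(odds_ratio):
    """Convert the common odds ratio to the ETS delta scale."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts: both groups perform identically within each stratum.
no_dif = [(20, 10, 20, 10), (30, 20, 15, 10)]
# Hypothetical counts: the reference group is favored.
with_dif = [(30, 10, 20, 20)]
```

An odds ratio near 1 (delta near 0) suggests no uniform DIF on the item; because the MH statistic pools across score levels, it is mainly sensitive to uniform DIF, which is one reason it is paired with BD in the abstract above.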

3.
Although logistic regression has become one of the well-known methods for detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under maximum likelihood, do not seem to be consistently distinguished in the DIF literature. This paper provides a clarifying note on those three tests when logistic regression is applied for DIF detection.
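For orientation, the three tests can be written out for one common formulation of the logistic regression DIF model. The notation is an assumption for illustration and may differ from the paper's: $X$ is the matching score, $G$ the group indicator, $\boldsymbol\beta_s = (\beta_2, \beta_3)^\top$ the DIF parameters, $\hat{\boldsymbol\beta}$ the unrestricted maximum likelihood estimate, and $\tilde{\boldsymbol\beta}$ the estimate restricted by $H_0\colon \beta_2 = \beta_3 = 0$.

```latex
\begin{align*}
\operatorname{logit} P(Y = 1 \mid X, G)
  &= \beta_0 + \beta_1 X + \beta_2 G + \beta_3 (X \cdot G) \\[4pt]
W  &= \hat{\boldsymbol\beta}_s^{\top}
      \bigl[\widehat{\operatorname{Cov}}(\hat{\boldsymbol\beta}_s)\bigr]^{-1}
      \hat{\boldsymbol\beta}_s
   && \text{(Wald: uses the unrestricted fit only)} \\
LR &= 2\bigl[\ell(\hat{\boldsymbol\beta}) - \ell(\tilde{\boldsymbol\beta})\bigr]
   && \text{(likelihood ratio: uses both fits)} \\
S  &= U(\tilde{\boldsymbol\beta})^{\top}
      I(\tilde{\boldsymbol\beta})^{-1}
      U(\tilde{\boldsymbol\beta})
   && \text{(score: uses the restricted fit only)}
\end{align*}
```

Here $\ell$ is the log-likelihood, $U$ the score vector, and $I$ the Fisher information. Under $H_0$ all three statistics are asymptotically $\chi^2$ with 2 degrees of freedom, but they can disagree in finite samples. In this formulation, $\beta_2 \neq 0$ with $\beta_3 = 0$ corresponds to uniform DIF and $\beta_3 \neq 0$ to nonuniform DIF.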

4.
Applied Measurement in Education, 2013, 26(4): 329-349
The logistic regression (LR) procedure for differential item functioning (DIF) detection is a model-based approach designed to identify both uniform and nonuniform DIF. However, this procedure tends to produce inflated Type I errors. This outcome is problematic because it can result in the inefficient use of testing resources, and it may interfere with the study of the underlying causes of DIF. Recently, an effect size measure was developed for the LR DIF procedure and a classification method was proposed. However, the effect size measure and classification method have not been systematically investigated. In this study, we developed a new classification method based on those established for the Simultaneous Item Bias Test. A simulation study was also conducted to determine whether the effect size measure affects the Type I error and power rates for the LR DIF procedure across sample sizes, ability distributions, and percentage of DIF items included on a test. The results indicate that the inclusion of the effect size measure can substantially reduce Type I error rates when large sample sizes are used, although there is also a reduction in power.

5.
The idea that test scores may not be valid representations of what students know, can do, and should learn next is well known. Person fit provides an important aspect of validity evidence. Person fit analyses at the individual student level are not typically conducted and person fit information is not communicated to educational stakeholders. In this study, we focus on a promising method for detecting and conveying person fit for large-scale educational assessments. This method uses multilevel logistic regression (MLR) to model the slopes of the person response functions, a potential source of person misfit for IRT models. We apply the method to a representative sample of students who took the writing section of the SAT (N = 19,341). The findings suggest that the MLR approach is useful for providing supplemental evidence of model–data fit in large-scale educational test settings. MLR can be useful for detecting general misfit at global and individual levels. However, as with other model–data fit indices, the MLR approach is limited in providing information regarding only some types of person misfit.

6.
The purpose of this article is to present logistic discriminant function analysis as a means of differential item functioning (DIF) identification of items that are polytomously scored. The procedure is presented with examples of a DIF analysis using items from a 27-item mathematics test which includes six open-ended response items scored polytomously. The results show that the logistic discriminant function procedure is ideally suited for DIF identification on nondichotomously scored test items. It is simpler and more practical than polytomous extensions of the logistic regression DIF procedure and appears to be more powerful than a generalized Mantel-Haenszel procedure.

7.
Researchers are often reluctant to rely on classification rates because a model with favorable classification rates but poor separation may not replicate well. In comparison, entropy captures information about borderline cases unlikely to generalize to the population. In logistic regression, the correctness of predicted group membership is known; however, this information has not yet been utilized in entropy calculations. The purpose of this study was to (1) introduce three new variants of entropy as approximate-model-fit measures, (2) establish rule-of-thumb thresholds to determine whether a theoretical model fits the data, and (3) investigate empirical Type I error and statistical power associated with those thresholds. Results are presented from two Monte Carlo simulations. Simulation results indicated that EFR-rescaled was the most representative of overall model effect size, whereas EFR provided the most intuitive interpretation for all group size ratios. Empirically derived thresholds are provided.

8.
9.
Direct survey techniques deal with collecting information on sensitive issues such as induced abortion and drug addiction. Randomized response (RR) techniques are available for the many interviewees who do not feel comfortable disclosing personal data because of privacy risks. RR techniques are used to estimate the number of people having a sensitive attribute, say A. When research concerns stigmatized characteristics such as rash driving, tax evasion, induced abortion, or testing positive for HIV (human immunodeficiency virus), RR techniques are used to ensure that the estimates obtained are efficient and unbiased. In such surveys, the privacy of the respondent is also protected. The conflict between efficiency and protection of privacy was discussed by Nayak (1994), among others. In RR techniques, simple random sampling (SRS) is used for sample selection. This paper uses an RR procedure that allows estimation of the population proportion as well as the probability of providing a truthful answer. The study also presents a method for estimating a univariate logistic regression model in which the dependent variables are subject to RR. In addition, an efficiency comparison is carried out to investigate the performance of the proposed technique. It is assumed that respondents answer in accordance with the instructions of the RR design. Overall, the findings of the current study suggest that RR techniques perform comparatively well.
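To make the idea of estimating a sensitive proportion from randomized answers concrete, here is a minimal sketch of the classic Warner design. This is an assumption for illustration: the abstract does not say which RR design the paper uses, and the function name and example numbers are hypothetical. In Warner's design, each respondent is asked, with probability p, whether they have attribute A and, with probability 1 − p, whether they do not have it, so P(yes) = (2p − 1)π + (1 − p).

```python
def warner_estimate(yes_count, n, p):
    """Estimate the proportion pi with a sensitive attribute (Warner design).

    yes_count: number of "yes" answers observed
    n:         sample size (simple random sample, n >= 2)
    p:         probability that the randomizing device selects the
               direct question ("Do you have attribute A?"); p != 0.5
    Returns (pi_hat, var_hat), where var_hat is the unbiased estimate of
    the sampling variance of pi_hat.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    lam = yes_count / n                       # observed proportion of "yes"
    pi_hat = (lam - (1 - p)) / (2 * p - 1)    # invert P(yes) = (2p-1)pi + (1-p)
    var_hat = lam * (1 - lam) / ((n - 1) * (2 * p - 1) ** 2)
    return pi_hat, var_hat


# Hypothetical survey: 400 "yes" answers out of 1000, device with p = 0.7.
pi_hat, var_hat = warner_estimate(400, 1000, 0.7)
```

The variance grows as p approaches 0.5, which illustrates the conflict between efficiency and privacy protection noted in the abstract: a device that reveals less about the respondent also yields a noisier estimate. The estimate can fall outside [0, 1] in small samples and is not truncated here.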

10.
Diversity and heterogeneity among language groups have been well documented. Yet most fairness research that focuses on measurement comparability considers linguistic minority students such as English language learners (ELLs) or Francophone students living in minority contexts in Canada as a single group. Our focus in this research is to examine the degree to which measurement comparability, as indicated by differential item functioning (DIF), is consistent for sub-groups among linguistic minority Francophone students in Canada. The findings suggest that the linguistic minority Francophone students who speak French at home and those who do not speak French at home should not be grouped together for investigating measurement comparability or for examining performance gaps. We identified substantial differences in DIF identification, with a consistency of only 7–10% between the separate analyses for the two groups. The findings highlight methodological problems with investigating fairness for diverse linguistic groups that are treated as a single group.

11.
Multilevel modeling (MLM) is a popular way of assessing mediation effects with clustered data. Two important limitations of this approach have been identified in prior research and a theoretical rationale has been provided for why multilevel structural equation modeling (MSEM) should be preferred. However, to date, no empirical evidence of MSEM's advantages relative to MLM approaches for multilevel mediation analysis has been provided. Nor has it been demonstrated that MSEM performs adequately for mediation analysis in an absolute sense. This study addresses these gaps and finds that the MSEM method outperforms 2 MLM-based techniques in 2-level models in terms of bias and confidence interval coverage while displaying adequate efficiency, convergence rates, and power under a variety of conditions. Simulation results support prior theoretical work regarding the advantages of MSEM over MLM for mediation in clustered data.

12.
The present study investigates the phenomena of simultaneous DIF amplification and cancellation and SIBTEST's role in detecting them. A variety of simulated test data were generated for this purpose. In addition, real test data from various sources were analyzed. The results from both simulated and real test data, as Shealy and Stout's theory (1993a, 1993b) suggests, show that SIBTEST is effective in assessing DIF amplification and cancellation (partial or full) at the test score level. Finally, methodological and substantive implications of DIF amplification and cancellation are discussed.

13.
Data from a large-scale performance assessment (N = 105,731) were analyzed with five differential item functioning (DIF) detection methods for polytomous items to examine the congruence among the DIF detection methods. Two different versions of the item response theory (IRT) model-based likelihood ratio test, the logistic regression likelihood ratio test, the Mantel test, and the generalized Mantel–Haenszel test were compared. Results indicated some agreement among the five DIF detection methods. Because statistical power is a function of the sample size, the DIF detection results from extremely large data sets are not practically useful. As alternatives to the DIF detection methods, four IRT model-based indices of standardized impact and four observed-score indices of standardized impact for polytomous items were obtained and compared with the R² measures of logistic regression.

14.
In this article, we present a general approach that does not rely on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of the method based on logistic regression. As a non-IRT approach, NLR can be seen as a proxy for detection based on the three-parameter IRT model, which is a standard tool in the field. Hence, NLR fills a logical gap in DIF detection methodology and as such is important for educational purposes. Moreover, the advantages of the NLR procedure, as well as a comparison with other commonly used methods, are demonstrated in a simulation study. A real data analysis is offered to demonstrate practical use of the method.

15.
Accounting for Aberrant Test Response Patterns Using Multilevel Models
Hypotheses about aberrant test-response behavior and hence invalid person-measurement have hitherto included factors like ability, gender, language, test-anxiety, and motivation, but these have not previously been collectively investigated with real data, or with multilevel models. This study analyzes the effect of these factors on person aberrance using a real mathematics assessment data set under the framework of a two-level (person and classroom) hierarchical model. The results suggest that higher-scoring pupils, and, to a lesser extent, second-language learners are significantly more often aberrant. But more importantly, we find that the classroom makes a significant contribution to person aberrance and conclude that studies that investigate the sources of person aberrance with real data should model the classroom as well as individual levels.

16.
Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content, cognitive demands, item type, item text, and visual-spatial or reference factors. To facilitate the analyses, DIF studies were conducted at 3 grade levels and for 2 randomly equivalent forms of the science assessment at each grade level (administered in different years). The DIF procedure itself was a variant of the "standardization procedure" of Dorans and Kulick (1986) and was applied to very large sets of data (6 sets of data, each involving 60,000 students). It has the advantages of being easy to understand and to explain to practitioners. Several findings emerged from the study that would be useful to pass on to test development committees. For example, when there was DIF in science items, MC items tended to favor male examinees and OR items tended to favor female examinees. Compiling DIF information across multiple grades and years increases the likelihood that important trends in the data will be identified and that item writing practices will be informed by more than anecdotal reports about DIF.
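The basic standardization index behind the procedure cited above can be sketched in a few lines. This shows only the standard form of the index (the standardized p-difference weighted by focal-group counts), not the variant the study actually used, which the abstract does not describe; the function name, input layout, and numbers are hypothetical.

```python
def std_p_dif(levels):
    """Standardized p-difference for one item.

    levels: iterable of (n_focal, p_focal, p_ref) tuples, one per matched
            total-score level, where
      n_focal = number of focal-group examinees at that level (the weight),
      p_focal = proportion of focal-group examinees answering correctly,
      p_ref   = proportion of reference-group examinees answering correctly.
    Returns the weighted mean of (p_focal - p_ref); negative values mean the
    item is harder for the focal group than for matched reference examinees.
    """
    levels = list(levels)                       # allow generators
    total_weight = sum(n for n, _, _ in levels)
    if total_weight == 0:
        raise ValueError("no focal-group examinees in any score level")
    weighted_diff = sum(n * (pf - pr) for n, pf, pr in levels)
    return weighted_diff / total_weight


# Hypothetical item: a gap at the lower score level, none at the higher one.
example = [(100, 0.50, 0.60), (300, 0.70, 0.70)]
index = std_p_dif(example)
```

Because the index is a difference in proportions correct on the original metric, it is straightforward to explain to practitioners, which matches the advantage the abstract attributes to the procedure.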

17.
Measurement bias can be detected using structural equation modeling (SEM), by testing measurement invariance with multigroup factor analysis (Jöreskog, 1971; Meredith, 1993; Sörbom, 1974), MIMIC modeling (Muthén, 1989), or restricted factor analysis (Oort, 1992, 1998). In educational research, data often have a nested, multilevel structure, for example when data are collected from children in classrooms. Multilevel structures might complicate measurement bias research. In 2-level data, the potentially “biasing trait” or “violator” can be a Level 1 variable (e.g., pupil sex), or a Level 2 variable (e.g., teacher sex). One can also test measurement invariance with respect to the clustering variable (e.g., classroom). This article provides a stepwise approach for the detection of measurement bias with respect to these 3 types of violators. This approach works from Level 1 upward, so the final model accounts for all bias and substantive findings at both levels. The 5 proposed steps are illustrated with data on teacher–child relationships.

18.
Differential Item Functioning (DIF) is traditionally used to identify different item performance patterns between intact groups, most commonly involving race or sex comparisons. This study advocates expanding the utility of DIF as a step in construct validation. Rather than grouping examinees based on cultural differences, the reference and focal groups are chosen from two extremes along a distinct cognitive dimension that is hypothesized to supplement the dominant latent trait being measured. Specifically, this study investigates DIF between proficient and non-proficient fourth- and seventh-grade writers on open-ended mathematics test items that require students to communicate about mathematics. It is suggested that the occurrence of DIF in this situation actually enhances, rather than detracts from, the construct validity of the test because, according to the National Council of Teachers of Mathematics (NCTM), mathematical communication is an important component of mathematical ability, the dominant construct being assessed. However, the presence of DIF influences the validity of inferences that can be made from test scores and suggests that two scores should be reported, one for general mathematical ability and one for mathematical communication. The fact that currently only one test score is reported, a simple composite of scores on multiple-choice and open-ended items, may lead to incorrect decisions being made about examinees.

19.
Most currently accepted approaches for identifying differentially functioning test items compare performance across groups after first matching examinees on the ability of interest. The typical basis for this matching is the total test score. Previous research indicates that when the test is not approximately unidimensional, matching using the total test score may result in an inflated Type I error rate. This study compares the results of differential item functioning (DIF) analysis with matching based on the total test score, matching based on subtest scores, or multivariate matching using multiple subtest scores. Analyses of both actual and simulated data indicate that for the dimensionally complex test examined in this study, using the total test score as the matching criterion is inappropriate. The results suggest that matching on multiple subtest scores simultaneously may be superior to using either the total test score or individual relevant subtest scores.

20.
One of the most consistent themes evident in the literature dealing with rural education is that of rural disadvantage. Much research and literature indicates that students from rural schools receive an education that is inferior to that of students from larger urban or suburban schools. Of the matrix of factors reported to lead to that disadvantage, geographical isolation and the extent to which it restricts access is reported to result in rural schools not having the same standard of resource allocation as urban schools where access is not a problem. This study addresses the issue of resource availability in rural and urban Australian schools and includes as variables students' attitudes towards science and mathematics and their career aspirations. The analysis includes the socioeconomic status and gender of these students and investigates how these variables relate to student achievement. Do students in rural schools have the same educational opportunity as students in urban schools? In this study, a multilevel model is used that takes into account classroom-level variance in student achievement as well as individual and school-level variance.
