Similar Articles
20 similar articles found (search time: 31 ms)
1.
The reading data from the 1983–84 National Assessment of Educational Progress survey were scaled using a unidimensional item response theory model. To determine whether the responses to the reading items were consistent with unidimensionality, the full-information factor analysis method developed by Bock and associates (1985) and Rosenbaum's (1984) test of unidimensionality, conditional (local) independence, and monotonicity were applied. Full-information factor analysis involves the assumption of a particular item response function; the number of latent variables required to obtain a reasonable fit to the data is then determined. The Rosenbaum method provides a test of the more general hypothesis that the data can be represented by a model characterized by unidimensionality, conditional independence, and monotonicity. Results of both methods indicated that the reading items could be regarded as measures of a single dimension. Simulation studies were conducted to investigate the impact of balanced incomplete block (BIB) spiraling, used in NAEP to assign items to students, on methods of dimensionality assessment. In general, conclusions about dimensionality were the same for BIB-spiraled data as for complete data.

2.
Utilizing the National Assessment of Educational Progress (NAEP) data, this study examined (1) how fourth- and eighth-grade ELLs' mathematics and reading scores on national tests compared to their non-ELL peers' scores over the testing period between 2003 and 2011, and (2) whether gender and ethnicity contributed to variation in the growth patterns among the student groups across grade levels and content areas. Since the NAEP data, which provide a national sample of 10,000–20,000 students, are collected using a probability sample design, sampling weights are adjusted so that inferences can be made appropriately. Sample sizes within NAEP are large enough to generate adequate power for statistical significance. Thus, to display the data in a multivariate mode, Tableau 8.0.0 software was used. Results suggested that the achievement gap between non-ELLs and ELLs is either steady or slightly widening in both mathematics and reading, with multiple paths across the content areas, grade levels, and gender and ethnic groups.
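The weight adjustment mentioned in the abstract above can be illustrated with a minimal sketch: under a probability sample design, each student's score is weighted (e.g., by the inverse of the selection probability) before group statistics are computed. All values below are hypothetical, not NAEP data.

```python
import numpy as np

# Hypothetical scale scores and sampling weights for 1,000 students.
rng = np.random.default_rng(0)
scores = rng.normal(250, 35, size=1000)     # hypothetical scale scores
weights = rng.uniform(0.5, 2.0, size=1000)  # hypothetical sampling weights

# The design-based estimate weights each score; compare to the raw mean.
unweighted_mean = scores.mean()
weighted_mean = np.average(scores, weights=weights)
print(round(unweighted_mean, 1), round(weighted_mean, 1))
```

With roughly uniform weights the two means are close here; in a real complex sample the weighted estimate can differ substantially from the raw mean.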

3.
This Monte Carlo study examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure. Data were generated using a 3-parameter logistic item response theory model according to the balanced incomplete block (BIB) design used in the National Assessment of Educational Progress (NAEP). The length of each block of items and the number of DIF items in the matching variable were varied, as were the difficulty, discrimination, and presence of DIF in the studied item. Block, booklet, pooled booklet, and extra-information analyses were compared to a complete data analysis using the transformed log-odds on the delta scale. The pooled booklet approach is recommended for use when items are selected for examinees according to a BIB design. This study has implications for DIF analyses of other complex samples of items, such as computer-administered testing or another complex assessment design.
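The data-generation step named in the abstract above can be sketched as follows: dichotomous responses are simulated from the 3-parameter logistic (3PL) model. The parameter ranges and sample sizes here are hypothetical, not those of the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model
    (with the conventional 1.7 scaling constant)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

n_persons, n_items = 500, 10
theta = rng.normal(0, 1, n_persons)     # latent proficiency
a = rng.uniform(0.8, 1.6, n_items)      # discrimination
b = rng.normal(0, 1, n_items)           # difficulty
c = np.full(n_items, 0.2)               # pseudo-guessing

# Person-by-item probability matrix, then Bernoulli draws.
probs = p_3pl(theta[:, None], a, b, c)
responses = (rng.random((n_persons, n_items)) < probs).astype(int)
print(responses.shape)
```

A BIB design would then assign each simulated examinee only a subset (booklet) of these items; the complete matrix above corresponds to the study's complete-data baseline.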

4.
Educational Assessment, 2013, 18(3), 225–253
Because of plans for state-by-state reporting of 1992 reading data from the National Assessment of Educational Progress (NAEP), we investigated the adequacy of the process used to develop the assessment, the degree to which it represents a consensus among professionals in the reading field, and its content and curricular validity. To carry out this investigation, we analyzed documents produced by NAEP, convened a 2-day panel of experts, held two public colloquia, conducted 50 interviews, and analyzed responses to a questionnaire completed by 627 leading educators. We found that the planning process did not include enough time to address some major concerns of the field. Despite this, there was widespread agreement that the 1992 NAEP in Reading represents important advances in reading assessment, including more open-ended responses, more authentic texts, and student choice about passages. But these very advances raise problems for test design and the interpretation and scoring of student responses.

5.
Using Muraki's (1992) generalized partial credit IRT model, polytomous items (responses to which can be scored as ordered categories) from the 1991 field test of the NAEP Reading Assessment were calibrated simultaneously with multiple-choice and short open-ended items. Expected information of each type of item was computed. On average, four-category polytomous items yielded 2.1 to 3.1 times as much IRT information as dichotomous items. These results provide limited support for the ad hoc rule of weighting k-category polytomous items the same as k - 1 dichotomous items for computing total scores. Polytomous items provided the most information about examinees of moderately high proficiency; the information function peaked at 1.0 to 1.5, and the population distribution mean was 0. When scored dichotomously, information in polytomous items sharply decreased, but they still provided more expected information than did the other response formats. For reference, a derivation of the information function for the generalized partial credit model is included.
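For orientation, the generalized partial credit model referenced above gives the probability that an examinee with proficiency θ responds in category k of item j (notation follows the common convention; the empty sum for k = 0 is defined as zero):

```latex
P_{jk}(\theta) \;=\;
\frac{\exp\!\left[\sum_{v=0}^{k} a_j\,(\theta - b_{jv})\right]}
     {\sum_{c=0}^{m_j} \exp\!\left[\sum_{v=0}^{c} a_j\,(\theta - b_{jv})\right]},
\qquad k = 0, 1, \dots, m_j,
```

where a_j is the item slope and the b_{jv} are step (category threshold) parameters. The item information function the study derives follows from the variance of the category scores under these probabilities.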

6.
This article reports two studies to illustrate methodologies for conducting a conditional covariance-based nonparametric dimensionality assessment using data from two forms of the Test of English as a Foreign Language (TOEFL). Study 1 illustrates how to assess overall dimensionality of the TOEFL including all three subtests. Study 2 is aimed at illustrating how to conduct dimensionality analyses for a testlet-based test by focusing on the Reading Comprehension (RC) section in combination with item content analyses and hypothesis testing. The results of Study 1 indicated that both TOEFL forms involve two dominant dimensions corresponding to the Listening Comprehension section and the combination of the Reading Comprehension section and Structure and Written Expression section. The extensive RC analyses from Study 2 revealed strong evidence that a significant amount of the RC multidimensionality came from testlet effects. Confirmatory analyses coupled with exploratory cluster analyses and substantive item content analyses further identified dimensionality structure having to do with reading subskills.

7.
Data from the North Carolina End-of-Grade test of eighth-grade mathematics are used to estimate the achievement results on the scale of the National Assessment of Educational Progress (NAEP) Trial State Assessment. Linear regression models are used to develop projection equations to predict state NAEP results in the future, and the results of such predictions are compared with those obtained in the 1996 administration of NAEP. Standard errors of the parameter estimates are obtained using a bootstrap resampling technique.
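The projection-plus-bootstrap approach described above can be sketched in a few lines: fit a linear regression of NAEP-scale scores on state-test scores, then resample cases with replacement to estimate the standard error of the slope. All data and parameter values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
state_scores = rng.normal(160, 10, size=n)                    # hypothetical state-test scores
naep_scores = 50 + 1.2 * state_scores + rng.normal(0, 8, n)   # hypothetical NAEP-scale scores

# Projection equation: ordinary least squares fit.
slope, intercept = np.polyfit(state_scores, naep_scores, 1)

# Bootstrap resampling for the standard error of the slope estimate.
boot_slopes = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)          # resample cases with replacement
    s, _ = np.polyfit(state_scores[idx], naep_scores[idx], 1)
    boot_slopes.append(s)
slope_se = np.std(boot_slopes, ddof=1)
print(round(slope, 2), round(slope_se, 3))
```

The bootstrap standard error here reflects only sampling of cases; the study's actual design would also need to respect NAEP's complex sampling and measurement error.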

8.
9.
This article provides an overview of the consensus processes for the development of the frameworks underlying the NAEP assessments, with emphasis on those for the 1990 and 1992 assessments of mathematics, the 1992 assessment of reading, and the 1994 assessment of science. In addition, innovative assessment techniques included in the 1992 assessments of mathematics, reading, and writing are described, including use of mathematics tools, oral interviews, and portfolio assessment.

10.
DIMTEST is a widely used and studied method for testing the hypothesis of test unidimensionality as represented by local item independence. However, DIMTEST does not report the amount of multidimensionality that exists in data when rejecting its null. To provide more information regarding the degree to which data depart from unidimensionality, a DIMTEST-based Effect Size Measure (DESM) was formulated. In addition to detailing the development of the DESM estimate, the current study describes the theoretical formulation of a DESM parameter. To evaluate the efficacy of the DESM estimator according to test length, sample size, and correlations between dimensions, Monte Carlo simulations were conducted. The results of the simulation study indicated that the DESM estimator converged to its parameter as test length increased, and, as desired, its expected value did not increase with sample size (unlike the DIMTEST statistic in the case of multidimensionality). Also as desired, the standard error of DESM decreased as sample size increased.

11.
Bock, Muraki, and Pfeiffenberger (1988) proposed a dichotomous item response theory (IRT) model for the detection of differential item functioning (DIF), and they estimated the IRT parameters and the means and standard deviations of the multiple latent trait distributions. This IRT DIF detection method is extended to the partial credit model (Masters, 1982; Muraki, 1993) and presented as one of the multiple-group IRT models. Uniform and non-uniform DIF items and heterogeneous latent trait distributions were used to generate polytomous responses of multiple groups. The DIF method was applied to this simulated data using a stepwise procedure. The standardized DIF measures for slope and item location parameters successfully detected the non-uniform and uniform DIF items as well as recovered the means and standard deviations of the latent trait distributions. This stepwise DIF analysis based on the multiple-group partial credit model was then applied to the National Assessment of Educational Progress (NAEP) writing trend data.

12.
In low-stakes assessments, some students may not reach the end of the test and leave some items unanswered due to various reasons (e.g., lack of test-taking motivation, poor time management, and test speededness). Not-reached items are often treated as incorrect or not-administered in the scoring process. However, when the proportion of not-reached items is high, these traditional approaches may yield biased scores and thereby threaten the validity of test results. In this study, we propose a polytomous scoring approach for handling not-reached items and compare its performance with those of the traditional scoring approaches. Real data from a low-stakes math assessment administered to second and third graders were used. The assessment consisted of 40 short-answer items focusing on addition and subtraction. The students were instructed to answer as many items as possible within 5 minutes. Using the traditional scoring approaches, students' responses for not-reached items were treated as either not-administered or incorrect in the scoring process. With the proposed scoring approach, students' nonmissing responses were scored polytomously based on how accurately and rapidly they responded to the items to reduce the impact of not-reached items on ability estimation. The traditional and polytomous scoring approaches were compared based on several evaluation criteria, such as model fit indices, test information function, and bias. The results indicated that the polytomous scoring approaches outperformed the traditional approaches. The complete case simulation corroborated our empirical findings that the scoring approach in which nonmissing items were scored polytomously and not-reached items were considered not-administered performed the best. Implications of the polytomous scoring approach for low-stakes assessments were discussed.
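The idea behind the proposed scoring rule can be sketched as follows. This is a hypothetical illustration, not the study's actual rubric: a nonmissing response is placed on an ordered scale combining accuracy and speed, while a not-reached item is treated as not administered (here, `None`); the speed cutoff is an invented parameter.

```python
def score_response(correct, response_time, reached, fast_cutoff=10.0):
    """Hypothetical polytomous rule: 2 = fast correct, 1 = slow correct,
    0 = incorrect; None = not reached (treated as not administered)."""
    if not reached:
        return None
    if not correct:
        return 0
    return 2 if response_time <= fast_cutoff else 1

# Hypothetical student: answered three items, did not reach the fourth.
# Tuples are (correct, response_time_seconds, reached).
item_data = [(True, 6.0, True), (True, 14.0, True),
             (False, 9.0, True), (False, 0.0, False)]
scores = [score_response(c, t, r) for c, t, r in item_data]
print(scores)  # [2, 1, 0, None]
```

Treating the fourth item as `None` rather than 0 is what distinguishes this approach from the traditional "not reached = incorrect" scoring the study compares against.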

13.
14.
Abstract

The National Assessment of Educational Progress (NAEP) requires reading comprehension processes that may be enhanced by students' amount of engaged reading, parental education, and gender, along with balanced reading instruction and opportunity to read. To examine the effects of those variables on reading achievement and engagement, the authors analyzed the 1994 Grade 4 Maryland NAEP with hierarchical linear modeling to construct both between-school and between-teacher models. Amount of engaged reading significantly predicted reading achievement on the NAEP, after parental education was statistically controlled. Balanced reading instruction significantly predicted reading achievement after accounting for students' engaged reading and parental education. Findings confirmed expectations from the proposed theoretical perspective on reading engagement. Policy implications included an emphasis on some instructional variables in the reading engagement model.

15.
This study compares the Rasch item fit approach for detecting multidimensionality in response data with principal component analysis without rotation using simulated data. The data in this study were simulated to represent varying degrees of multidimensionality and varying proportions of items representing each dimension. Because the requirement of unidimensionality is necessary to preserve the desirable measurement properties of Rasch models, useful ways of testing this requirement must be developed. The results of the analyses indicate that both the principal component approach and the Rasch item fit approach work in a variety of multidimensional data structures. However, each technique is unable to detect multidimensionality in certain combinations of the level of correlation between the two variables and the proportion of items loading on the two factors. In cases where the intention is to create a unidimensional structure, one would expect few items to load on the second factor and the correlation between the factors to be high. The Rasch item fit approach detects dimensionality more accurately in these situations.
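The unrotated principal component check referred to above can be sketched directly: simulate two-dimensional dichotomous data, then inspect the eigenvalues of the inter-item correlation matrix. The structure below (two five-item clusters, correlated dimensions) is a hypothetical example, not the study's design.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Two correlated latent dimensions; items 0-4 load on the first,
# items 5-9 on the second.
theta = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n)
loadings = np.zeros((10, 2))
loadings[:5, 0] = 1.0
loadings[5:, 1] = 1.0

probs = 1 / (1 + np.exp(-(theta @ loadings.T)))        # logistic link
data = (rng.random((n, 10)) < probs).astype(float)

# Unrotated PCA = eigendecomposition of the correlation matrix.
corr = np.corrcoef(data, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigvals[:3], 2))
```

With a high correlation between the dimensions, the second eigenvalue shrinks toward the noise eigenvalues, which is exactly the regime where the abstract notes PCA struggles to flag multidimensionality.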

16.
The U.S. Department of Education measures student achievement through the National Assessment of Educational Progress (NAEP). NAEP estimates of population proficiency quantiles are based on a Bayesian multiple-imputation procedure. This article shows (a) that the resulting estimates depend directly on the mix of item difficulties on the test, and (b) the difficulty of items on the NAEP mathematics exam has increased over time. Does the increasing difficulty of the exam lead to observable changes in student performance over time? This study compared the simulated performance of 1990 examinees on the easier 1990 exam and the more difficult 1996 exam. No significant differences were found. While our results instill confidence that these changes have not impacted the NAEP trend line, our findings are both data-specific and limited in scope, and NAEP should carefully evaluate future adjustments to the test in this manner.

17.
The present study conducted a systematic review of the item response theory (IRT) literature in language assessment to investigate the conceptualization and operationalization of the dimensionality of language ability. Sixty-two IRT-based studies published between 1985 and 2020 in language assessment and educational measurement journals were first classified into two categories based on a unidimensional and multidimensional research framework, and then reviewed to examine language dimensionality from technical and substantive perspectives. It was found that 12 quantitative techniques were adopted to assess language dimensionality. Exploratory factor analysis was the primary method of dimensionality analysis in papers that had applied unidimensional IRT models, whereas the comparison modeling approach was dominant in the multidimensional framework. In addition, there was converging evidence within the two streams of research supporting the role of a number of factors such as testlets, language skills, subskills, and linguistic elements as sources of multidimensionality, while mixed findings were reported for the role of item formats across research streams. The assessment of reading, listening, speaking, and writing skills was grounded within both unidimensional and multidimensional frameworks. By contrast, vocabulary and grammar knowledge was mainly conceptualized as unidimensional. Directions for continued inquiry and application of IRT in language assessment are provided.

18.
The development of the DETECT procedure marked an important advancement in nonparametric dimensionality analysis. DETECT is the first nonparametric technique to estimate the number of dimensions in a data set, estimate an effect size for multidimensionality, and identify which dimension is predominantly measured by each item. The efficacy of DETECT critically depends on accurate, minimally biased estimation of the expected conditional covariances of all the item pairs. However, the amount of bias in the DETECT estimator has been studied only in a few simulated unidimensional data sets. This is because the value of the DETECT population parameter is known to be zero for this case and has been unknown for cases when multidimensionality is present. In this article, integral formulas for the DETECT population parameter are derived for the most commonly used parametric multidimensional item response theory model, the Reckase and McKinley model. These formulas are then used to evaluate the bias in DETECT by positing a multidimensional model, simulating data from the model using a very large sample size (to eliminate random error), calculating the large-sample DETECT statistic, and finally calculating the DETECT population parameter to compare with the large-sample statistic. A wide variety of two- and three-dimensional models, including both simple structure and approximate simple structure, were investigated. The results indicated that DETECT does exhibit statistical bias in the large-sample estimation of the item-pair conditional covariances; but, for the simulated tests that had 20 or more items, the bias was small enough to result in the large-sample DETECT almost always correctly partitioning the items and the DETECT effect size estimator exhibiting negligible bias.
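The quantity DETECT builds on can be sketched for a single item pair: the covariance of the two items conditional on the rest score (the total score on all remaining items), averaged across rest-score groups. This is a rough illustration with hypothetical unidimensional Rasch-type data, where the conditional covariances should be near zero; it is not the DETECT estimator itself, which involves further corrections and aggregation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 20
theta = rng.normal(size=n)
b = rng.normal(size=k)
# Unidimensional dichotomous data (one-parameter logistic model).
data = (rng.random((n, k)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

def conditional_cov(data, i, j):
    """Weighted average covariance of items i, j within rest-score groups."""
    mask = np.ones(data.shape[1], dtype=bool)
    mask[[i, j]] = False
    rest = data[:, mask].sum(axis=1)          # rest score excludes items i, j
    covs, weights = [], []
    for s in np.unique(rest):
        grp = data[rest == s]
        if len(grp) > 1:
            covs.append(np.cov(grp[:, i], grp[:, j])[0, 1])
            weights.append(len(grp))
    return np.average(covs, weights=weights)

print(round(conditional_cov(data, 0, 1), 4))  # near zero for 1-D data
```

Under multidimensionality, item pairs measuring the same secondary dimension would show positive conditional covariances, and pairs on different dimensions negative ones; DETECT exploits that sign pattern to partition the items.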

19.
This study used an analysis of variance (ANOVA)-like approach to predict reading proficiency with student-, teacher-, and school-level predictors based on a 3-level hierarchical generalized linear model (HGLM) analysis. National Assessment of Educational Progress (NAEP) 2000 reading data for 4th graders sampled from 46 U.S. states were used. The study found that both rich and poor minority students in a rich school benefited the most in reading performance, whereas non-minority students of rich and average socioeconomic status in a rich school who were taught in non-crowded classrooms achieved considerably high reading proficiency. Based on the 3-level HGLM analysis, the ANOVA-like approach enabled the researchers to predict reading proficiency and interpret predictors' effects in a simple fashion.

20.
The purpose of this study was to examine the behavior of 8 measures of fit used to evaluate confirmatory factor analysis models. This study employed Monte Carlo simulation to determine to what extent sample size, model size, estimation procedure, and level of nonnormality affected fit when polytomous data were analyzed. The 3 indexes least affected by the design conditions were the comparative fit index, incremental fit index, and nonnormed fit index, which were affected only by level of nonnormality. The measure of centrality was most affected by the design variables, with values of η² > .10 for sample size, model size, and level of nonnormality and interaction effects for Model Size × Level of Nonnormality and Estimation × Level of Nonnormality. Findings from this study should alert applied researchers to exercise caution when evaluating model fit with nonnormal, polytomous data.
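As a reference point, the comparative fit index named above is computed from the chi-square statistics and degrees of freedom of the fitted model and the baseline (independence) model; the sketch below uses the standard formula with hypothetical chi-square values.

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """CFI = 1 - max(chi2_M - df_M, 0) / max(chi2_B - df_B, chi2_M - df_M, 0)."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model, 0.0)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0

# Hypothetical fitted model vs. independence baseline.
print(round(cfi(120.0, 100, 1500.0, 120), 3))  # 0.986
```

Values near 1 indicate that the model recovers most of the covariance structure missed by the baseline; the simulation's point is that such indexes can still shift systematically under nonnormal polytomous data.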
