Similar Documents
20 similar documents found.
1.
Educational Assessment, 2013, 18(2): 127-143
This study investigated factors related to score differences on computerized and paper-and-pencil versions of a series of primary K–3 reading tests. Factors studied included item and student characteristics. The results suggest that the score differences were more related to student than item characteristics. These student characteristics include response style variables, especially omitting, and socioeconomic status as measured by free lunch eligibility. In addition, response style and socioeconomic status appear to be relatively independent factors in the score differences. Variables studied but not found to be related to the format score differences included association of items with a reading passage, item difficulty, and teacher versus computer administration of items. However, because this study is the 1st to study the factors behind these score differences below Grade 3, and because a number of states are increasing computer testing at the primary grades, additional studies are needed to verify the importance of these 2 factors.

2.
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats—the figural response (FR) and constructed response (CR) formats used in a K–12 computerized science test. The item response theory (IRT) information function and confirmatory factor analysis (CFA) were employed to address the research questions. It was found that the FR items were similar to the multiple-choice (MC) items in providing information and efficiency, whereas the CR items provided noticeably more information than the MC items but tended to provide less information per minute. The CFA suggested that the innovative formats and the MC format measure similar constructs. Innovations in computerized item formats are reviewed, and the merits as well as challenges of implementing the innovative formats are discussed.
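As a concrete illustration of the IRT information comparisons referenced above, the following Python sketch computes the three-parameter logistic (3PL) item information function. The item parameters and the MC/CR labels are hypothetical values chosen for illustration, not estimates from the study.

```python
# A minimal sketch (not the authors' code) of the 3PL item information
# function used to compare how much measurement precision different
# item formats provide at each ability level.
import numpy as np

def item_information_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta:
    I(theta) = a^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))
    q = 1.0 - p
    return a**2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

# Hypothetical parameters for a multiple-choice and a constructed-response item.
theta = np.linspace(-3, 3, 121)
info_mc = item_information_3pl(theta, a=1.0, b=0.0, c=0.20)  # MC item with guessing
info_cr = item_information_3pl(theta, a=1.3, b=0.2, c=0.0)   # CR item, no guessing

print("Peak information, MC:", info_mc.max().round(3))
print("Peak information, CR:", info_cr.max().round(3))
```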

3.
Applied Measurement in Education, 2013, 26(3): 257-275
The purpose of this study was to investigate the technical properties of stem-equivalent mathematics items differing only with respect to response format. Using socioeconomic factors to define the strata, a proportional stratified random sample of 1,366 Connecticut sixth-grade students was administered one of three forms. Classical item analysis, dimensionality assessment, item response theory goodness-of-fit, and an item bias analysis were conducted. Analysis of variance and confirmatory factor analysis were used to examine the functioning of the items presented in the three different formats. It was found that, after equating forms, the constructed-response formats were somewhat more difficult than the multiple-choice format. However, there was no significant difference across formats with respect to item discrimination. A differential item functioning (DIF) analysis was conducted using both the Mantel-Haenszel procedure and the comparison of the item characteristic curves. The DIF analysis indicated that the presence of bias was not greatly affected by item format; that is, items biased in one format tended to be biased in a similar manner when presented in a different format, and unbiased items tended to remain so regardless of format.
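For readers unfamiliar with the Mantel-Haenszel procedure used in the DIF analysis, here is a minimal Python sketch of the statistic: examinees are stratified by total score and the common odds ratio of a correct response is pooled over strata. The function name and the data layout are assumptions for illustration; only the formula and the ETS delta-scale constant (-2.35) are standard.

```python
# A rough sketch, not the study's implementation, of the Mantel-Haenszel
# DIF statistic for a single studied item.
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """item: 0/1 responses; total: matching (total) scores; group: 0 = reference, 1 = focal."""
    item, total, group = map(np.asarray, (item, total, group))
    num, den = 0.0, 0.0
    for k in np.unique(total):
        s = total == k
        a = np.sum(s & (group == 0) & (item == 1))  # reference group, correct
        b = np.sum(s & (group == 0) & (item == 0))  # reference group, incorrect
        c = np.sum(s & (group == 1) & (item == 1))  # focal group, correct
        d = np.sum(s & (group == 1) & (item == 0))  # focal group, incorrect
        t = a + b + c + d
        if t == 0:
            continue
        num += a * d / t
        den += b * c / t
    alpha_mh = num / den            # common odds ratio pooled over score strata
    return -2.35 * np.log(alpha_mh)  # ETS delta-scale MH D-DIF
```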

4.
Linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article presents findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to the subject area, student proficiency levels in the test content, factors to look for in determining a "linguistic complexity effect," and areas for further research in test item development and deaf students.

5.
The purpose of this study was to further examine the factor structure of the Huber Inventory of Trainee Self‐Efficacy (HITS), a measure of school psychology trainee self‐efficacy. The extant data set of Lockwood et al. (2017, Psychol. Sch., Vol. 54, pp. 655–670), collected from 520 school psychology trainees, was utilized. Four measurement models were examined for model fit and factor loadings. Of the four models, a bifactor model with a single latent general self‐efficacy (GSE) factor and latent domain‐specific factors (i.e., Multidimensional Assessment Skills, Counseling Skills, Professional Interpersonal Skills, and Research Skills) was the most parsimonious. However, standardized loadings indicated that all practice‐related items loaded more strongly onto GSE than onto their domain‐specific factors, indicating the utility of GSE for practice‐related skills. Of note, the Research Skills factor displayed greater domain‐specific loadings than general loadings. These findings suggest that GSE may be the best indicator of trainee self‐efficacy, though a two‐factor model that represents practical skills versus research skills may also be appropriate. Additionally, reliability scores indicate that subscale interpretation may also be reasonable. Limitations, implications for trainers of school psychologists, and future research directions are discussed.

6.
Item stem formats can alter the cognitive complexity as well as the type of abilities required for solving mathematics items. Consequently, it is possible that item stem formats can affect the dimensional structure of mathematics assessments. This empirical study investigated the relationship between item stem format and the dimensionality of mathematics assessments. A sample of 671 sixth-grade students was given two forms of a mathematics assessment in which mathematical expression (ME) items and word problems (WP) were used to measure the same content. The effects of mathematical language and reading abilities in responding to ME and WP items were explored using unidimensional and multidimensional item response theory models. The results showed that WP and ME items appear to differ with regard to the underlying abilities required to answer these items. Hence, the multidimensional model fit the response data better than the unidimensional model. For the accurate assessment of mathematics achievement, students’ reading and mathematical language abilities should also be considered when implementing mathematics assessments with ME and WP items.

7.
The PISA assessment focuses on students' lifelong development, and its test-construction philosophy has brought profound changes to educational evaluation in many countries. Building on the theory and framework of the PISA reading assessment, this study developed a PISA-style Chinese reading test. The test contains three reading passages and 18 items in total. Item difficulty, discrimination, reliability, and validity were examined, and dimensionality was evaluated with a full-information bifactor model. The results show that the PISA-style Chinese reading test is of moderate difficulty, discriminates well, and has basically acceptable reliability and validity. It also largely meets PISA's requirements for the ability structure of a reading assessment, measuring students' general reading comprehension ability as well as three subdimensions: information retrieval, text interpretation, and reflection and evaluation.
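As a rough illustration of the difficulty and discrimination checks mentioned in the abstract, the sketch below computes classical item statistics from dichotomously scored responses. The function name and data layout are assumptions for illustration, not the study's code.

```python
# A minimal sketch of classical item analysis: item difficulty as the
# proportion correct and item discrimination as the corrected
# item-total (item-rest) correlation.
import numpy as np

def classical_item_analysis(responses):
    """responses: (n_examinees, n_items) array of 0/1 scores."""
    responses = np.asarray(responses, dtype=float)
    difficulty = responses.mean(axis=0)  # proportion of examinees answering correctly
    total = responses.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]  # item-rest correlation
        for j in range(responses.shape[1])
    ])
    return difficulty, discrimination
```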

8.
Studies that have investigated differences in examinee performance on items administered in paper-and-pencil form or on a computer screen have produced equivocal results. Certain item administration procedures were hypothesized to be among the most important variables causing differences in item performance and ultimately in test scores obtained from these different administration media. A study where these item administration procedures were made as identical as possible for each presentation medium is described. In addition, a methodology is presented for studying the difficulty and discrimination of items under each presentation medium as a post hoc procedure.

9.
This empirical study was designed to determine the impact of computerized adaptive test (CAT) administration formats on student performance. Students in medical technology programs took a paper-and-pencil and an individualized, computerized adaptive test. Students were randomly assigned to adaptive test administration formats to ascertain the effect on student performance of altering: (a) the difficulty of the first item, (b) the targeted level of test difficulty, (c) minimum test length, and (d) the opportunity to control the test. Computerized adaptive test data were analyzed with ANCOVA. The paper-and-pencil test was used as a covariate to equalize ability variance among cells. The only significant main effect was for opportunity to control the test. There were no significant interactions among test administration formats. This study provides evidence concerning adjusting traditional computerized adaptive testing to more familiar testing modalities.

10.
The psychometric literature provides little empirical evaluation of examinee test data to assess essential psychometric properties of innovative items. In this study, examinee responses to conventional (e.g., multiple choice) and innovative item formats in a computer-based testing program were analyzed for IRT information with the three-parameter and graded response models. The innovative item types considered in this study provided more information across all levels of ability than multiple-choice items. In addition, accurate timing data captured via computer administration were analyzed to consider the relative efficiency of the multiple choice and innovative item types. As with previous research, multiple-choice items provide more information per unit time. Implications for balancing policy, psychometric, and pragmatic factors in selecting item formats are also discussed.
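The relative-efficiency comparison described above amounts to dividing IRT information by testing time. The snippet below illustrates the arithmetic with invented numbers; none of the values come from the study.

```python
# A small illustrative sketch of information-per-minute efficiency,
# using hypothetical peak information and mean response times per item type.
item_types = {
    "multiple_choice": {"information": 0.45, "minutes": 1.0},
    "innovative":      {"information": 0.90, "minutes": 3.5},
}

for name, stats in item_types.items():
    efficiency = stats["information"] / stats["minutes"]  # information per minute
    print(f"{name}: {efficiency:.2f} information units per minute")
```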

11.
Although the recent identification of the five critical components of early literacy has been a catalyst for modifications to the content of materials used to provide reading instruction and the tools used to examine students' acquisition of early literacy skills, these skills have not received equal attention from test developers and publishers. In particular, a review of available early literacy measures for screening and monitoring students reveals a dearth of tools for examining different facets of reading comprehension. The purposes of this study were twofold: (a) to examine the relative difficulty of items written to assess literal, inferential, and evaluative comprehension, and (b) to compare single factor and bifactor models of reading comprehension to determine if items written to assess students' literal, inferential, and evaluative comprehension abilities comprise unique measurement factors. Data from approximately 2,400 fifth grade students collected in the fall, winter, and spring of fifth grade were used to examine these questions. Findings indicated that (a) the relative difficulty of item types may be curvilinear, with literal items being significantly less challenging than inferential and evaluative items, and (b) literal, inferential, and evaluative comprehension measurement factors explained unique portions of variance in addition to a general reading comprehension factor. Instructional implications of the findings are discussed.

12.
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and classification consistency and accuracy under three item response theory (IRT) frameworks: unidimensional IRT (UIRT), simple structure multidimensional IRT (SS-MIRT), and bifactor multidimensional IRT (BF-MIRT) models. Illustrative examples are presented using data from three mixed-format exams with various levels of format effects. In general, the two MIRT models produced similar results, while the UIRT model resulted in consistently lower estimates of reliability and classification consistency/accuracy indices compared to the MIRT models.
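One of the quantities evaluated above, the conditional standard error of measurement (CSEM), is the reciprocal square root of the IRT test information function. A minimal sketch follows, assuming a 2PL calibration and hypothetical item parameters rather than those of the exams analyzed in the article.

```python
# A brief sketch of CSEM(theta) = 1 / sqrt(I(theta)), with test information
# summed over 2PL items; parameters below are invented for illustration.
import numpy as np

def test_information_2pl(theta, a, b):
    """Test information at each theta for 2PL items with slopes a and locations b."""
    theta = np.atleast_1d(theta)[:, None]
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return np.sum(a**2 * p * (1.0 - p), axis=1)

a = np.array([1.0, 1.2, 0.8, 1.5, 1.7])   # hypothetical discriminations
b = np.array([-1.0, -0.3, 0.0, 0.5, 1.2])  # hypothetical difficulties

theta = np.linspace(-3, 3, 7)
csem = 1.0 / np.sqrt(test_information_2pl(theta, a, b))
for t, s in zip(theta, csem):
    print(f"theta = {t:+.1f}  CSEM = {s:.3f}")
```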

13.
For many years, reading comprehension in the Programme for International Student Assessment (PISA) was measured via paper‐based assessment (PBA). In the 2015 cycle, computer‐based assessment (CBA) was introduced, raising the question of whether central equivalence criteria required for a valid interpretation of the results are fulfilled. As an extension of the PISA 2012 main study in Germany, a random subsample of two intact PISA reading clusters, either computerized or paper‐based, was assessed using a random group design with an additional within‐subject variation. The results are in line with the hypothesis of construct equivalence. That is, the latent cross‐mode correlation of PISA reading comprehension was not significantly different from the expected correlation between the two clusters. Significant mode effects on item difficulties were observed for a small number of items only. Interindividual differences found in mode effects were negatively correlated with reading comprehension, but were not predicted by basic computer skills or gender. Further differences between modes were found with respect to the number of missing values.

14.
As access and reliance on technology continue to increase, so does the use of computerized testing for admissions, licensure/certification, and accountability exams. Nonetheless, full computer‐based test (CBT) implementation can be difficult due to limited resources. As a result, some testing programs offer both CBT and paper‐based test (PBT) administration formats. In such situations, evidence that scores obtained from different formats are comparable must be gathered. In this study, we illustrate how contemporary statistical methods can be used to provide evidence regarding the comparability of CBT and PBT scores at the total test score and item levels. Specifically, we looked at the invariance of test structure and item functioning across test administration mode across subgroups of students defined by SES and sex. Multiple replications of both confirmatory factor analysis and Rasch differential item functioning analyses were used to assess invariance at the factorial and item levels. Results revealed a unidimensional construct with moderate statistical support for strong factorial‐level invariance across SES subgroups, and moderate support of invariance across sex. Issues involved in applying these analyses to future evaluations of the comparability of scores from different versions of a test are discussed.

15.
The presence of nuisance dimensionality is a potential threat to the accuracy of results for tests calibrated using a measurement model such as a factor analytic model or an item response theory model. This article describes a mixture group bifactor model to account for the nuisance dimensionality due to a testlet structure as well as the dimensionality due to differences in patterns of responses. The model can be used for testing whether or not an item functions differently across latent groups in addition to investigating the differential effect of local dependency among items within a testlet. An example is presented comparing test speededness results from a conventional factor mixture model, which ignores the testlet structure, with results from the mixture group bifactor model. Results suggested the 2 models treated the data somewhat differently. Analysis of the item response patterns indicated that the 2-class mixture bifactor model tended to categorize omissions as indicating speededness. With the mixture group bifactor model, more local dependency was present in the speeded than in the nonspeeded class. Evidence from a simulation study indicated the Bayesian estimation method used in this study for the mixture group bifactor model can successfully recover generated model parameters for 1- to 3-group models for tests containing testlets.

16.
The reading data from the 1983–84 National Assessment of Educational Progress survey were scaled using a unidimensional item response theory model. To determine whether the responses to the reading items were consistent with unidimensionality, the full-information factor analysis method developed by Bock and associates (1985) and Rosenbaum's (1984) test of unidimensionality, conditional (local) independence, and monotonicity were applied. Full-information factor analysis involves the assumption of a particular item response function; the number of latent variables required to obtain a reasonable fit to the data is then determined. The Rosenbaum method provides a test of the more general hypothesis that the data can be represented by a model characterized by unidimensionality, conditional independence, and monotonicity. Results of both methods indicated that the reading items could be regarded as measures of a single dimension. Simulation studies were conducted to investigate the impact of balanced incomplete block (BIB) spiraling, used in NAEP to assign items to students, on methods of dimensionality assessment. In general, conclusions about dimensionality were the same for BIB-spiraled data as for complete data.

17.
A thorough search of the literature was conducted to locate empirical studies investigating the trait or construct equivalence of multiple-choice (MC) and constructed-response (CR) items. Of the 67 studies identified, 29 studies included 56 correlations between items in both formats. These 56 correlations were corrected for attenuation and synthesized to establish evidence for a common estimate of correlation (true-score correlations). The 56 disattenuated correlations were highly heterogeneous. A search for moderators to explain this variation uncovered the role of the design characteristics of test items used in the studies. When items are constructed in both formats using the same stem (stem equivalent), the mean correlation between the two formats approaches unity and is significantly higher than when using non-stem-equivalent items (particularly when using essay-type items). Construct equivalence, in part, appears to be a function of the item design method or the item writer's intent.
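The true-score (disattenuated) correlations synthesized in this review come from Spearman's correction for attenuation. A minimal sketch with hypothetical reliability and correlation values:

```python
# A minimal sketch of the Spearman correction for attenuation used to obtain
# disattenuated (true-score) correlations; the inputs below are invented.
def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Observed correlation r_xy corrected for unreliability in both measures."""
    return r_xy / (rel_x * rel_y) ** 0.5

# Hypothetical observed MC-CR correlation and the two score reliabilities.
print(disattenuated_correlation(r_xy=0.75, rel_x=0.88, rel_y=0.80))  # about 0.894
```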

18.
Language comprehension is crucial to reading. However, theoretical models and recent research raise questions about what constitutes this multifaceted domain. We present two related studies examining the dimensionality of language comprehension and relations to reading comprehension in the upper elementary grades. Studies 1 (Grade 6; N = 148) and 2 (Grades 3–5; N = 311) contrasted factor models of language comprehension using item level indicators of morphological awareness and vocabulary (Studies 1 and 2) and syntactic awareness (Study 2). In both studies, a bifactor model—including general language comprehension and specific factors for each language component—best fit the data, and general language comprehension was the strongest predictor of reading comprehension. In Study 2, the morphology-specific factor also uniquely predicted reading comprehension above and beyond general language comprehension. Results suggest the value of modeling the common proficiency underlying performance on tasks designed to tap theoretically distinct language comprehension skills.

19.
The Schmid‐Leiman decomposition of a hierarchical factor model converts the model to a constrained case of a bifactor model with orthogonal common factors that is equivalent to the hierarchical model. This article discusses the equivalence and near‐equivalence of the hierarchical and bifactor models and the implications of the difficulty of distinguishing between these models because of low power in samples commonly found in academic research.
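A small Python sketch of the Schmid-Leiman transformation discussed above: first-order loadings are split into orthogonal general-factor loadings and residualized group-factor loadings. The loading values are invented for illustration, and the function is not tied to any particular package or to the article's notation.

```python
# A rough sketch, assuming a simple higher-order solution, of the
# Schmid-Leiman transformation to an equivalent (constrained) bifactor form.
import numpy as np

def schmid_leiman(first_order, second_order):
    """first_order: (items x group factors) loadings;
    second_order: loadings of each group factor on the general factor."""
    second_order = np.asarray(second_order)
    general = first_order @ second_order          # general-factor loadings per item
    residual_sd = np.sqrt(1.0 - second_order**2)  # disturbance standard deviations
    group = first_order * residual_sd             # residualized group-factor loadings
    return general, group

# Hypothetical 6 items loading on 2 correlated first-order factors.
F = np.array([[0.7, 0.0], [0.6, 0.0], [0.8, 0.0],
              [0.0, 0.7], [0.0, 0.6], [0.0, 0.5]])
g = np.array([0.8, 0.6])
general, group = schmid_leiman(F, g)
print(np.round(general, 3))
print(np.round(group, 3))
```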

20.
This study investigated whether scores obtained from the online and paper-and-pencil administrations of the statewide end-of-course English test were equivalent for students with and without disabilities. Score comparability was evaluated by examining equivalence of factor structure (measurement invariance) and differential item and bundle functioning analyses for the online and paper groups. Results supported measurement invariance between the online and paper groups, suggesting that it is meaningful to compare scores across administration modes. When the data were analyzed at both the item and item bundle (content area) levels, similar performance appeared between the online and paper groups.

