Similar literature
1.
In this research, the author addresses whether the application of unidimensional item response models provides valid interpretation of test results when administering items sensitive to multiple latent dimensions. Overall, the present study found that unidimensional models are quite robust to the violation of the unidimensionality assumption due to secondary dimensions from sensitive items. When secondary dimensions are highly correlated with the main construct, unidimensional models generally fit and the accuracy of ability estimation is comparable to that of strictly unidimensional tests. In addition, longer tests are more robust to the violation of the essential unidimensionality assumption than shorter ones. The author also shows that unidimensional item response theory models estimate the item difficulty parameter better than the item discrimination parameter in tests with secondary dimensions.

2.
We compare the accuracy of confidence intervals (CIs) and tests of close fit based on the root mean square error of approximation (RMSEA) with those based on the standardized root mean square residual (SRMR). Investigations used normal and nonnormal data with models ranging from p = 10 to 60 observed variables. CIs and tests of close fit based on the SRMR are generally accurate across all conditions (even at p = 60 with nonnormal data). In contrast, CIs and tests of close fit based on the RMSEA are only accurate in small models. In larger models (p ≥ 30), they incorrectly suggest that models do not fit closely, particularly if sample size is less than 500.

3.
Two Lagrange multiplier (LM) methods may be used in specification searches for adding parameters to models: one based on univariate LM tests and respecification of the model (LM‐respecified method) and the other based on a partitioning of multivariate LM tests (LM‐incremental method). These methods may result in extraneous parameters being included in models due to either sampling error or the model being misspecified. A 2‐stage specification search may be used to reduce errors due to misspecification. In the 1st stage, parameters are added to models based on LM tests to maximize fit. In the 2nd stage, parameters added in the 1st stage are deleted if they are no longer necessary to maintain model fit. Illustrations are presented to demonstrate that errors due to misspecification occur with the LM‐respecified method and are even more likely with the LM‐incremental approach. These illustrations also show how the deletion stage can help eliminate some of these errors.

4.
An Extension of Four IRT Linking Methods for Mixed-Format Tests
Under item response theory (IRT), linking proficiency scales from separate calibrations of multiple forms of a test to achieve a common scale is required in many applications. Four IRT linking methods including the mean/mean, mean/sigma, Haebara, and Stocking-Lord methods have been presented for use with single-format tests. This study extends the four linking methods to a mixture of unidimensional IRT models for mixed-format tests. Each linking method extended is intended to handle mixed-format tests using any mixture of the following five IRT models: the three-parameter logistic, graded response, generalized partial credit, nominal response (NR), and multiple-choice (MC) models. A simulation study is conducted to investigate the performance of the four linking methods extended to mixed-format tests. Overall, the Haebara and Stocking-Lord methods yield more accurate linking results than the mean/mean and mean/sigma methods. When the NR model or the MC model is used to analyze data from mixed-format tests, limitations of the mean/mean, mean/sigma, and Stocking-Lord methods are described.
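As a rough illustration of the two moment-based methods named in this abstract, the sketch below computes mean/sigma and mean/mean linking coefficients from common-item parameters of dichotomous items. It is not the mixed-format extension proposed in the study. The function names, the use of population standard deviations, and the sample parameter values are assumptions made only for illustration.

```python
from statistics import mean, pstdev

def mean_sigma(b_from, b_to):
    """Mean/sigma coefficients (A, B), assuming theta_to = A * theta_from + B.

    b_from, b_to: difficulties of the same common items on the two scales.
    """
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B

def mean_mean(a_from, a_to, b_from, b_to):
    """Mean/mean coefficients: the slope comes from the mean discriminations."""
    A = mean(a_from) / mean(a_to)
    B = mean(b_to) - A * mean(b_from)
    return A, B

def rescale(a, b, A, B):
    """Put item parameters from the source scale onto the target scale."""
    return [ai / A for ai in a], [A * bi + B for bi in b]

# Hypothetical common-item parameters (not data from the study).
a_from, b_from = [1.0, 2.0, 1.5], [-1.0, 0.0, 1.0]
a_to, b_to = [0.5, 1.0, 0.75], [-1.5, 0.5, 2.5]

A_ms, B_ms = mean_sigma(b_from, b_to)
A_mm, B_mm = mean_mean(a_from, a_to, b_from, b_to)
a_new, b_new = rescale(a_from, b_from, A_ms, B_ms)
print(A_ms, B_ms, A_mm, B_mm)
```

The Haebara and Stocking-Lord methods instead minimize a loss defined on item or test characteristic curves, so they need a numerical optimizer and are not shown here.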

5.
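Based on an analysis of a large number of recent Chinese language test papers, this article summarizes the general trends in Chinese language examinations: subjective items make up a growing share while objective items are greatly reduced; open-ended items are gradually increasing, with emphasis on assessing students' thinking ability, especially creative thinking, and on promoting inquiry and encouraging creativity; in-class and out-of-class learning are combined, with emphasis on the accumulation of language and culture; comprehensive knowledge and abilities are assessed, along with integration across subjects; items are designed along three dimensions (emotions, attitudes, and values; process and method; knowledge and ability), and test papers increasingly reflect humanistic care for students; listening, speaking, reading, and writing are all assessed; essay tests keep pace with the times and confront real life, aiming to guide students to attend to real life and write with genuine feeling; tests are grounded in everyday life, attend to current social issues, and stress applying what is learned; item design is increasingly novel, and examination formats are diversifying.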

6.
C‐tests are a specific variant of cloze tests that are considered time‐efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C‐tests and compared the changes in item difficulties, reliability estimates, and person parameter estimates for different modeling approaches: (a) Rasch, (b) testlet, (c) partial credit, and (d) copula models. The results are complemented with findings of a simulation study in which sample size, number of testlets, and strength of residual correlations between items were systematically manipulated. Results are discussed with regard to the pivotal question of whether residual dependencies between items are an artifact or part of the construct.

7.
Standard 3.9 of the Standards for Educational and Psychological Testing (1999) demands evidence of model fit when item response theory (IRT) models are applied to data from tests. Hambleton and Han (2005) and Sinharay (2005) recommended the assessment of practical significance of misfit of IRT models, but few examples of such assessment can be found in the literature concerning IRT model fit. In this article, practical significance of misfit of IRT models was assessed using data from several tests that employ IRT models to report scores. The IRT model did not fit any data set considered in this article. However, the extent of practical significance of misfit varied over the data sets.

8.
This paper presents a critical review of literature investigating assessment of mathematical modelling. Written tests, projects, hands-on tests, portfolios, and contests are modes of modelling assessment identified in this study. The written tests found in the reviewed papers draw on an atomistic view of modelling competencies, whereas projects are described as assessing a more holistic modelling competence, but obstacles regarding the reliability of assessing projects are identified. The outcome of this investigation also indicates that the criteria used in frameworks or modes of assessment are seldom derived from a theoretical analysis, but are more often based on ad hoc constructions, experience from assessment situations, or empirical studies of students’ work. Finally, this study suggests that an elaborated view of the meaning of quality of mathematical models is needed in order to assess the quality of students’ work with mathematical models.

9.
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and classification consistency and accuracy under three item response theory (IRT) frameworks: unidimensional IRT (UIRT), simple structure multidimensional IRT (SS-MIRT), and bifactor multidimensional IRT (BF-MIRT) models. Illustrative examples are presented using data from three mixed-format exams with various levels of format effects. In general, the two MIRT models produced similar results, while the UIRT model resulted in consistently lower estimates of reliability and classification consistency/accuracy indices compared to the MIRT models.

10.
《教育实用测度》, 2013, 26(4): 393-406
Two models are presented in this article for estimating the proportion of students who would pass all of three or more content area tests given that none have actually been tested in more than two of the content areas. The first model allows one to estimate the proportion of students who would pass all of three or more content area tests from the test results of a study in which no student took more than two of the tests; the second model (which requires an outside estimate of the correlations between the different content area tests) allows one to estimate the proportion of students who would pass all of three or more content area tests from the test results of a study (or field test results) in which students took only one content area test. The models were tested on the Texas End-of-Course test battery (which consists of four content area tests) results of students who took all four content area tests prior to or in the spring of 2001, with at least one of the end-of-course content area tests taken in the spring of 2001. The model test results may have particular application to state assessment programs that must perform standard setting on high-stakes exams before the first live administration of the exams.

11.
Loglinear latent class models are used to detect differential item functioning (DIF). These models are formulated in such a manner that the attribute to be assessed may be continuous, as in a Rasch model, or categorical, as in Latent Class Mastery models. Further, an item may exhibit DIF with respect to a manifest grouping variable, a latent grouping variable, or both. Likelihood-ratio tests for assessing the presence of various types of DIF are described, and these methods are illustrated through the analysis of a "real world" data set.

12.
The traditional laboratory models for the hydroelasticity and seakeeping performance of ships are tested in calm water and in uni-directional, artificially generated waves. A new alternative to the tank model measurement methodology is to conduct experiments using large-scale models in actual sea conditions. To implement the tests, a large-scale segmented self-propelling model and testing system were designed and assembled. A buoy wave meter was adopted to record the coastal waves that the model encountered during the tests. The analysis of the results of waves in sheltered waters by the spectral method shows good agreement with ISSC spectra. To investigate the difference between this new methodology and the traditional towing tank tests, a small-scale model, whose type and configuration are the same as those of the large-scale model ship, was used and tests were conducted in a towing tank. Comparison of the two experimental results shows that there is a remarkable difference in the response characteristics between the large-scale model at sea and the small-scale model in the tank. Numerical simulations of the responses of the ship under equivalent sea states were also carried out. The influence of directional spreading functions on the results was analyzed by a numerical approach. The classical model tests under long-crested waves in the towing tank over-estimate the motion and wave load responses; however, large-scale model tests carried out at sea are more reasonable for ship design and scientific research.

13.
The purpose of this study is to identify perceptual differences between hierarchical levels in organizations in general and in university departments in particular, and to analyze their consequences on the relationships between the need for change, the implementation of change, and the assessment of the success of change. Three different models are developed and tested. The first model examines the amount of change in the various aspects of change at different types of departments. The second model examines the factor structure of the various actors in the system. The third model tests separately for each perceiver the magnitude of relationship between the different aspects of change and the success of change. The implications of the models and their empirical tests for future studies of organizational change are discussed and elaborated.

14.
《教育实用测度》, 2013, 26(3): 223-231
Emerging areas of research, related to computer-based testing, are identified. Computer-based adaptive testing (CAT) poses many problems, including calibrating adaptive tests with their conventional counterparts, content-balancing in item selection, and accommodating multidimensional items. Using the computer to administer tests provides freedom to use many new item types, including tests of short-term memory, spatial memory, perceptual speed and accuracy, and movement judgment. Information processing theory offers a new way of conceptualizing abilities that is not easily reconciled with traditional measurement models.

15.
Structured means analysis is a very useful approach for testing hypotheses about population means on latent constructs. In such models, a z test is most commonly used for testing the statistical significance of the relevant parameter estimates or of the differences between parameter estimates, where a z value is computed based on the asymptotic standard error estimate associated with the parameter of interest. In the current article, a series of population analyses demonstrate that the z tests for latent mean structure parameters or, more directly, the standard error estimates upon which those z tests are based, are not invariant to how factors are scaled. As such, circumstances exist in which latent mean inference is compromised solely as a result of scaling decisions. This problem is illustrated in the context of between-subjects (i.e., multisample) latent means models and within-subjects latent means models. Recommendations for practice are also offered.

16.
The purpose of this paper is to define and evaluate the categories of cognitive models underlying at least three types of educational tests. We argue that while all educational tests may be based, explicitly or implicitly, on a cognitive model, the categories of cognitive models underlying tests often range in their development and in the psychological evidence gathered to support their value. For researchers and practitioners, awareness of different cognitive models may facilitate the evaluation of educational measures for the purpose of generating diagnostic inferences, especially about examinees' thinking processes, including misconceptions, strengths, and/or abilities. We think a discussion of the types of cognitive models underlying educational measures is useful not only for taxonomic ends, but also for becoming increasingly aware of evidentiary claims in educational assessment and for promoting the explicit identification of cognitive models in test development. We begin our discussion by defining the term cognitive model in educational measurement. Next, we review and evaluate three categories of cognitive models that have been identified for educational testing purposes using examples from the literature. Finally, we highlight the practical implications of "blending" models for the purpose of improving educational measures.

17.
《教育实用测度》, 2013, 26(1): 99-107
Teacher testing has become well established as a tool of educational policy. Nearly all of the states now require passage of tests for teacher licensure, and tests also figure prominently in strategies to reform the teaching profession. Several models have been developed for assessing the teaching function, including objective paper-and-pencil tests and structured classroom observations. Both are used for state licensure, although evidence of their criterion-related validity is virtually nonexistent. New forms of performance assessments are now being developed, which should reflect the knowledge and skills of teaching more faithfully than any existing instruments. It is hoped that these tests will not only provide fairer and more accurate measurements of teaching ability, but will also stimulate changes in teacher preparation programs, and ultimately, improve classroom practice.

18.
This study tested four theoretical models in terms of their fit with demands placed on our cognitive system by traditional tests of cognitive ability. We did so by administering seven tests of cognitive ability known to require varying types of processing demands to a large group of college undergraduates (N = 193). We compared the models using confirmatory factor analyses, including those based upon a unitary factor, speed and capacity, crystallized and fluid intelligence, and verbal and spatial ability. The crystallized/fluid model provided the best fit with the data. This finding is consistent with previous research. Implications for education and future research are discussed.

19.
The current widespread availability of software packages with estimation features for testing structural equation models with binary indicators makes it possible to investigate many hypotheses about differences in proportions over time that are typically only tested with conventional categorical data analyses for matched pairs or repeated measures, such as McNemar’s chi-square. The connection between these conventional tests and simple longitudinal structural equation models is described. The equivalence of several conventional analyses and structural equation models reveals some foundational concepts underlying common longitudinal modeling strategies and brings to light a number of possible modeling extensions that will allow investigators to pursue more complex research questions involving multiple repeated proportion contrasts, mixed between-subjects × within-subjects interactions, and comparisons of estimated membership proportions using latent class factors with multiple indicators. Several models are illustrated, and the implications for using structural equation models for comparing binary repeated measures or matched pairs are discussed.
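For readers unfamiliar with the conventional test mentioned in this abstract, the following is a minimal sketch of McNemar's chi-square for a 2 × 2 table of matched pairs, without continuity correction. It illustrates only the conventional statistic, not the structural equation models the article describes. The function name and the counts are hypothetical.

```python
def mcnemar_chi2(b, c):
    """McNemar's chi-square (1 df, no continuity correction).

    b: pairs scored 1 at time 1 and 0 at time 2
    c: pairs scored 0 at time 1 and 1 at time 2
    Concordant pairs do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    return (b - c) ** 2 / (b + c)

# Hypothetical discordant counts, for illustration only.
stat = mcnemar_chi2(10, 5)
print(stat)
```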

20.
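This paper presents a method for studying dynamic and transient voltage characteristics using simulation techniques with accurate load and synchronous machine models. The induction motor model is described in detail. Accurate models of voltage support equipment, including OLTC transformers, are derived. The trapezoidal rule is used to convert the system differential equations based on these models into linear algebraic equations, which are then put in difference form to facilitate program implementation. For various contingencies, a numerical iterative technique is used to simulate the system voltage and power-angle characteristics. Various contingencies were tested on 9-bus and 22-bus systems, and the results demonstrate the practicality of the software package.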
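To make the trapezoidal-rule step concrete, here is a minimal sketch for a single scalar linear equation dx/dt = a·x + b·u(t). It is not the power-system model from this paper. The function names, the test equation, and the step size are assumptions chosen for illustration.

```python
import math

def trapezoidal_step(x, u_now, u_next, a, b, h):
    """Advance dx/dt = a*x + b*u by one step of size h.

    The trapezoidal rule gives the algebraic equation
        x_next = x + h/2 * (a*x + b*u_now + a*x_next + b*u_next),
    which is linear in x_next and is solved here in closed form.
    """
    return ((1 + 0.5 * h * a) * x + 0.5 * h * b * (u_now + u_next)) / (1 - 0.5 * h * a)

def simulate(x0, a, b, u, h, steps):
    """Apply the difference equation repeatedly; u is a function of time."""
    x = x0
    for n in range(steps):
        x = trapezoidal_step(x, u(n * h), u((n + 1) * h), a, b, h)
    return x

# Unforced decay dx/dt = -x from x(0) = 1, integrated to t = 1.
x_end = simulate(1.0, -1.0, 0.0, lambda t: 0.0, 0.01, 100)
print(x_end, math.exp(-1.0))  # numerical result next to the exact solution
```

For a system of equations the same step becomes a linear solve with the matrix (I − h/2·A) rather than a scalar division.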
