Similar Literature
Found 20 similar documents; search took 375 ms
1.
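A Review of the Evolution of the Conceptual Framework of “Test Linking”   Total citations: 1 (self-citations: 0, citations by others: 1)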
程乾 《考试研究》2013,(2):71-79
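Test linking is an important area of research in psychological and educational measurement. It uses statistical methods to express the scores of one test in the score units of another, or to place the scores of two tests on a common score scale. Although test linking has a long research history, different scholars have classified it differently. Some classification terms are identical across schemes, yet their definitions differ greatly, which has caused considerable confusion among researchers and practitioners. It is therefore necessary to trace the conceptual framework of linking and its changes from a historical perspective, so that test linking can be better understood and applied.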

2.
How does the fact that two tests should not be equated manifest itself? This paper addresses this question through the study of the degree to which equating functions fail to exhibit population invariance across subpopulations. Equating functions are supposed to be population invariant by definition. But, when two tests are not equatable, it is possible that the linking functions, used to connect the scores of one to the scores of the other, are not invariant across different populations of examinees. While no acceptable equating function is ever completely population invariant, in the situations where equating is usually performed we believe that the dependence of the equating function on the population used to compute it is usually small enough to be ignored. We introduce two root‐mean‐square difference measures of the degree to which the functions used to link two tests computed on different subpopulations differ from the linking function computed for the whole population. We also introduce the system of “parallel‐linear” linking functions for multiple subpopulations and show that, for this system, our measure of population invariance can be computed easily from the standardized mean differences between the scores of the subpopulations on the two tests. For the parallel‐linear case, we develop a correlation‐based upper bound on our measure that holds for all systems of subpopulations. We illustrate these ideas using data from the SAT I and from a concordance study of several combinations of ACT and SAT I scores. In the appendices, we give some theoretical results bearing on the other equating “requirements” of “same construct,” “same reliability” and one aspect of Lord's concept of equity.
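As a rough illustration of the root‐mean‐square difference idea in this abstract, the sketch below compares linear linking functions fitted on each subpopulation with the one fitted on the pooled population. The exact form used here (subpopulation weights proportional to group size, division by the pooled standard deviation of the target test) is an assumption about one common way to write such a measure, not a reproduction of the paper's definitions. The names `linear_link` and `rmsd` and the data layout are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def linear_link(xs, ys):
    """Linear linking function x -> mu_Y + (sd_Y / sd_X) * (x - mu_X)."""
    mu_x, mu_y = mean(xs), mean(ys)
    slope = pstdev(ys) / pstdev(xs)
    return lambda x: mu_y + slope * (x - mu_x)


def rmsd(groups, x):
    """Weighted RMS difference at score x between subgroup and pooled links.

    groups: list of (xs, ys) score lists, one pair per subpopulation
            (hypothetical layout; each group took both tests).
    The result is expressed in pooled standard deviation units of test Y
    (an assumed standardization).
    """
    all_x = [v for xs, _ in groups for v in xs]
    all_y = [v for _, ys in groups for v in ys]
    pooled = linear_link(all_x, all_y)
    n = len(all_x)
    weighted_sq = sum(
        (len(xs) / n) * (linear_link(xs, ys)(x) - pooled(x)) ** 2
        for xs, ys in groups
    )
    return sqrt(weighted_sq) / pstdev(all_y)
```

A value near zero at every score point would suggest that the subgroup linking functions agree with the pooled one; larger values indicate dependence on the population used.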

3.
What problems arise in translating a test to other languages? How can performance be compared for students who take different language versions of a test? What designs can be used for linking studies?

4.
There is a tendency in the literature to characterize linking as equating done somewhat less rigorously. The ambiguity of this conception can lead to confusion amongst policy‐makers and members of the public and can result in the proliferation of comparability myths. As the constructs assessed by two tests decrease in similarity, so the difference between equating and linking becomes one of kind rather than degree. To help make sense of linking in different contexts, a general model is proposed, based upon the idea of a ‘linking construct’. This general model is used to define the limits of linking and to clarify what users and stakeholders need to know about linking and linked scores. Finally, a distinction is drawn between judgemental linking as a method (e.g., social moderation) and judgemental linking as a theory (i.e., the value judgement theory of linking). The latter presents a challenge to the general model, which is defended.

5.
An Extension of Four IRT Linking Methods for Mixed-Format Tests   Total citations: 1 (self-citations: 0, citations by others: 1)
Under item response theory (IRT), linking proficiency scales from separate calibrations of multiple forms of a test to achieve a common scale is required in many applications. Four IRT linking methods including the mean/mean, mean/sigma, Haebara, and Stocking-Lord methods have been presented for use with single-format tests. This study extends the four linking methods to a mixture of unidimensional IRT models for mixed-format tests. Each extended linking method is intended to handle mixed-format tests using any mixture of the following five IRT models: the three-parameter logistic, graded response, generalized partial credit, nominal response (NR), and multiple-choice (MC) models. A simulation study is conducted to investigate the performance of the four linking methods extended to mixed-format tests. Overall, the Haebara and Stocking-Lord methods yield more accurate linking results than the mean/mean and mean/sigma methods. When the NR model or the MC model is used to analyze data from mixed-format tests, limitations of the mean/mean, mean/sigma, and Stocking-Lord methods are described.
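For orientation, the two moment-based methods named in this abstract can be sketched for the simplest single-format case: dichotomous common items with discrimination `a` and difficulty `b` estimated on two scales. The sketch assumes the usual linear scale transformation θ_to = A·θ_from + B, under which b_to = A·b_from + B and a_to = a_from / A. It does not reproduce the study's extension to mixed-format tests, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma(b_from, b_to):
    """Mean/sigma: slope from the ratio of difficulty standard deviations."""
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def mean_mean(a_from, a_to, b_from, b_to):
    """Mean/mean: slope from the ratio of mean discriminations."""
    A = mean(a_from) / mean(a_to)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def rescale(a, b, A, B):
    """Place one item's (a, b) from the 'from' scale onto the 'to' scale."""
    return a / A, A * b + B
```

With error-free common-item parameters both methods recover the same A and B; with estimated parameters they generally differ, which is one reason the characteristic curve methods (Haebara, Stocking-Lord) are studied alongside them.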

6.
Published discussions of the year-to-year linking of tests composed of polytomous items appear to suggest that the linking logic traditionally used for multiple-choice items is also appropriate for polytomous items. It is argued and illustrated that a modification of the traditional linking is necessary when tests consist of constructed-response items judged by raters and there is a possibility of year-to-year variation in the rating discrimination and severity.

7.
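The article discusses the use of Authorware Script to develop an intelligent single-choice self-test system, covering question bank creation, database connection, random question selection, random ordering of options, automatic tracking of the correct answer, main window interaction, countdown display and time control, and automatic scoring. The system can serve as the single-choice module of various intelligent self-test and examination systems, or stand as a system of its own.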

8.
The aim of this study is to link the science scale of the German National Educational Panel Study (NEPS) with the science scale of the Programme for International Student Assessment (PISA). One requirement for a strong linking of test scores from different studies is a sufficient similarity of the tests regarding their constructs. The present study aims to assess the similarity of the operationalized constructs of the NEPS and PISA scientific literacy tests with the aim of linking the scales of the two tests. A linking study was carried out for this purpose in which 1079 students worked on the tasks of both studies. The results of the comparison between NEPS and PISA indicated a high overlap regarding their constructs. However, both studies deal with missing responses differently. The linking via equipercentile equating showed a high classification consistency, which was highest when missing responses were ignored in both studies.
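The equipercentile idea mentioned in this abstract maps a score on one test to the score on the other test that has the same percentile rank. The sketch below is a deliberately simplified version for a single group that took both tests: it uses mid-percentile ranks and a piecewise-linear inverse, with no presmoothing. It is an assumption-laden illustration, not the procedure used in the NEPS and PISA linking study, and all names are hypothetical.

```python
def percentile_rank(scores, x):
    """Mid-percentile rank of x: share below x plus half the share equal to x."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return (below + 0.5 * equal) / len(scores)


def quantile(sorted_scores, p):
    """Piecewise-linear inverse of the mid-percentile rank (exact without ties)."""
    n = len(sorted_scores)
    # The i-th sorted score (0-based) has mid-percentile rank (i + 0.5) / n.
    pos = max(0.0, min(n - 1.0, p * n - 0.5))
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_scores[lo] + frac * (sorted_scores[hi] - sorted_scores[lo])


def equipercentile(x, xs, ys):
    """Score on test Y with the same percentile rank that x has on test X."""
    return quantile(sorted(ys), percentile_rank(xs, x))
```

In operational work the score distributions are usually smoothed first and percentile ranks are defined for discrete score points, so this sketch should be read only as a statement of the core mapping.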

9.
How does a testing component function in an integrated learning system? How can you customize computerized tests to meet local specifications? How are computerized tests implemented and evaluated? Is the pay-off of computerized testing justified?

10.
The error associated with a proposed linking method for tests consisting of both constructed response and multiple choice items was investigated in a simulation study. Study factors that were varied included the relative proportion of constructed response items in the test, the size of the year-to-year change in the ability metric, the number of anchor items, the number of linking papers to be reassessed, and the presence of guessing. The results supported the use of the proposed linking method. In addition, simulations were used to illustrate possible linking bias resulting from (a) the use of the traditional linking method and (b) the use of only multiple choice anchor items in the presence of test multidimensionality.

11.
A goal for any linking or equating of two or more tests is that the linking function be invariant to the population used in conducting the linking or equating. Violations of population invariance in linking and equating jeopardize the fairness and validity of test scores, and pose particular problems for test‐based accountability programs that require schools, districts, and states to report annual progress on academic indicators disaggregated by demographic group membership. This instructional module provides a comprehensive overview of population invariance in linking and equating and the relevant methodology developed for evaluating violations of invariance. A numeric example is used to illustrate the comparative properties of available methods, and important considerations for evaluating population invariance in linking and equating are presented.

12.
Many of us are frustrated with the overuse of intelligence tests. But intelligence tests have become so entrenched in our society that it is hard to imagine how they realistically could be replaced. Schools would be without a well-established screening device, and intelligence research would be without an external measure of validity. But what if we started over and imagined thinking about intelligence without the benefit (some would say hindrance) of Binet? What would theories and tests of intelligence look like? The articles in this special issue address this topic; here, I discuss the articles. The discussion is divided into three sections. The first section deals with definitional issues: How can intelligence be operationally defined, and can a single definition capture cognitive abilities of individuals at all ages? The second section briefly summarizes and evaluates each of the seven theories: How intelligent are these theories of intelligence? The final section focuses on the implications of the theories and theory-based tests reported in this issue: How can future research and educational practices benefit from the views presented here?

13.
The purpose of this study was to investigate whether simulated differential motivation between the stakes for operational tests and anchor items produces an invalid linking result if the Rasch model is used to link the operational tests. This was done for an external anchor design and a variation of a pretest design. The study also investigated whether a constrained mixture Rasch model could identify latent classes in such a way that one latent class represented high‐stakes responding while the other represented low‐stakes responding. The results indicated that for an external anchor design, the Rasch linking result was only biased when the motivation level differed between the subpopulations to which the anchor items were administered. However, the mixture Rasch model did not identify the classes representing low‐stakes and high‐stakes responding. When a pretest design was used to link the operational tests by means of a Rasch model, the linking result was found to be biased in each condition. Bias increased as the percentage of students showing low‐stakes responding to the anchor items increased. The mixture Rasch model only identified the classes representing low‐stakes and high‐stakes responding under a limited number of conditions.

14.
Information Technology has the potential to provide virtually any educational requirement which the human mind can imagine. Before ideas can become applications, however, there have to be a suitable infrastructure and detailed procedures, otherwise they merely become historical footnotes. Inside Information, a scheme developed by the B.B.C. and the City and Guilds of London Institute, uses Information Technology in some novel ways and in some new contexts. The linking of it to a short-course programme sponsored by the Department of Education and Science opens up a much wider range of possibilities. The article explains the background to the development of Inside Information and its linking to the Department's short-course programme. It shows how such linking is necessary to enable the potential of the scheme to be realised. Inside Information, as it is now being developed, makes distinctions between formal, nonformal and informal education meaningless for most purposes. Medium- and long-term scenarios are suggested. Exciting as they are, these scenarios cannot become reality until the necessary infrastructure and procedures have been created.

15.
In many educational tests, both multiple‐choice (MC) and constructed‐response (CR) sections are used to measure different constructs. In many common cases, security concerns lead to the use of form‐specific CR items that cannot be used for equating test scores, along with MC sections that can be linked to previous test forms via common items. In such cases, adjustment by minimum discriminant information may be used to link CR section scores and composite scores based on both MC and CR sections. This approach is an innovative extension that addresses the long‐standing issue of linking CR test scores across test forms in the absence of common items in educational measurement. It is applied to a series of administrations from an international language assessment with MC sections for receptive skills and CR sections for productive skills. To assess the linking results, harmonic regression is applied to examine the effects of the proposed linking method on score stability, among several analyses for evaluation.

16.
The Commission on the Future of Higher Education in the USA emphasises accountability in higher education as one of its key areas of interest. A programme, called the Voluntary System of Accountability, was developed to evaluate the effectiveness of general public college education. This study examines how students' progress in college, indicated by the performance difference between freshmen and seniors after controlling for admission scores, can be measured using the Measure of Academic Proficiency and Progress (MAPP) test. A total of 6196 students from 23 institutions was included in this study. Results indicated that MAPP was able to differentiate between the performance of freshmen and seniors after controlling for SAT®/ACT scores. The institutions were classified into 10 groups on the basis of the difference in the actual versus expected MAPP performance. The assumptions and implications of linking student learning growth to institutional effectiveness are discussed. Methodological issues on value‐added calculation and student motivation in taking standardised tests are also noted.

17.
Numerous assessments contain a mixture of multiple choice (MC) and constructed response (CR) item types and many have been found to measure more than one trait. Thus, there is a need for multidimensional dichotomous and polytomous item response theory (IRT) modeling solutions, including multidimensional linking software. For example, multidimensional item response theory (MIRT) may have a promising future in subscale score proficiency estimation, leading toward a more diagnostic orientation, which requires the linking of these subscale scores across different forms and populations. Several multidimensional linking studies can be found in the literature; however, none have used a combination of MC and CR item types. Thus, this research explores multidimensional linking accuracy for tests composed of both MC and CR items using a matching test characteristic/response function approach. The two-dimensional simulation study presented here used real data-derived parameters from a large-scale statewide assessment with two subscale scores for diagnostic profiling purposes, under varying conditions of anchor set lengths (6, 8, 16, 32, 60), across 10 population distributions, with a mixture of simple versus complex structured items, using a sample size of 3,000. It was found that for a well-chosen anchor set, the parameters recovered well after equating across all populations, even for anchor sets composed of as few as six items.

18.
Where were you in 1963, when this article was first published? What are the two kinds of information you can get from achievement tests? What is a “criterion level”?

19.
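What are the basic questions in research on the Sinicization of Marxism? How are these basic questions related? Where are the new perspectives, or new points of focus, for future research on the Sinicization of Marxism? What are the concrete methods for linking theory with practice? Where does the driving force for theoretical innovation in the Sinicization of Marxism lie? In what ways are the innovativeness and international significance of the path and development model of socialism with Chinese characteristics manifested? The noted Marxism scholar 辛向阳 (Xin Xiangyang) was interviewed on these questions, and his answers reflect the insights of an expert in Marxist research.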

20.
This article focuses on how program design affects program performance, linking participant motivation to performance. The focus of the article is a study that took place in a nonprofit organization and addressed how to engage volunteers such that they find meaning in the work they do and satisfy the needs of those they serve. Moreover, the findings suggest causal relationships among the variables of program design, leadership, and participant motivation. The article highlights some implications of human motivation on performance improvement. The following questions are addressed: How does a leader evaluate program performance and participant motivation? How does a leader redesign a program so that it maximizes participant performance and elevates intrinsic motivation? It is noted that motivation translates into energy. Energy is what one expends to accomplish a task. That task, once accomplished, can be measured against an expectation. Did what was produced meet or exceed expectations?

