Similar articles
20 similar articles found (search time: 31 ms)
1.
In response to an increasingly "test driven" approach to curriculum and instruction, a growing body of research is documenting the negative effects of standardized testing on instruction, pupils, and teachers. The curriculum has been narrowed to fit the tests, valuable instructional time is spent on test preparation, students are stressed by the preparation and the test-taking, and teachers are pressured to use unacceptable testing procedures which may even include blatant cheating. Students are subjected to instructional methods that provide them with test taking skills but no genuine understanding of the subject matter. Research supports that there is an urgent need for a change in testing policies. Massive standardized testing of young children must be eliminated. More authentic and performance based evaluation such as the development of student portfolios should be instituted. Attention to the negative effects of standardized testing and the elimination of such testing of young children is especially urgent in light of the current discussions centering on increasing the scope of national testing beginning with the administration of a school readiness instrument prior to school entry.

3.

Too often tests are used with clients for whom the validity of the test has not been established. As a case in point we studied the use of the Human Figure Drawing (HFD) test with children living in Curaçao, a small island in the Caribbean. In this community no time and money are available for developing tests and establishing their validity and norms. We suggest that borrowing such information can be a relatively good, inexpensive alternative, provided that clinicians make the best of choices. This paper formulates three requirements, which should be met by the group of clients a clinician is working with. As an example we explored to what extent the requirements are being satisfied by 96 Curaçaoan Grade 4 school children. With regard to these children we conclude that clinicians using the HFD test can best use US representative frequency tables for scoring.

4.
It is suggested that for criterion-referenced tests to have any educational value, they must be linked to the categories of learning that have been demonstrated in learning theory. These categories form the basis of the test domains. The nature of the two main categories, concepts and rules, is reviewed and it is suggested that the errors produced by pupils that indicate faulty concept learning or rules application should form the basis for the production of tests. Examples of such tests are also discussed. As this approach to testing is markedly different to the current psychometric approach to criterion-referenced testing, it is suggested that the form of testing described here be called concept-referenced testing to distinguish it from other forms of criterion-referenced measures.

5.
The advantages of a rule assessment approach to the interpretation of achievement test results have been demonstrated using an S-P chart with coded error types. The problems of similar total test scores resulting from completely different misapprehensions, as well as correct answers resulting from incorrect rules of operation, were addressed using a simulated data-set. Although the overall quality of the test used here as measured by conventional psychometric indices proved satisfactory, it was shown that the traditional interpretation, which refers to total test scores, can be misleading, especially when adaptive remediation is sought. It is well known in medical sciences that a disease has several symptoms yet several diseases can share the same symptoms (e.g., high fever). Consequently, no responsible physician would prescribe the same medicine for two patients suffering from different diseases just because they both share high fever as one of their symptoms. Similarly, when two students with different misapprehensions get the same total test score, should the teacher prescribe the same remediation for correcting their misapprehension? Although the method for diagnostic test construction was out of the scope of this paper, it should be noted that test design is a crucial matter which eventually determines the quality of the diagnosis. One has to, therefore, carefully choose the items for the diagnosis in order to maximize the information about the rules of operation underlying the students' responses. A task specification chart (Birenbaum & Shaw, 1985) may serve as a useful tool in the process of test construction.
As was illustrated in the chart, when an item yields the same results as a result of various “bugs”, its contribution to rule assessment is in question. Although in reality test results are contaminated by noise resulting from careless errors or strategy changes during the test, the overall identification rate achieved by diagnostic tests ranges between 70%–80% (Tatsuoka, 1984). Similarly, current AI diagnostic systems such as DEBUGGY and DPF are reported as being capable of identifying 80%–90% of student errors (VanLehn, 1981; Ohlsson & Langley, 1985). It seems that such a rate justifies the tedious work involved in constructing a diagnostic tool.

6.
Decisions on admissions to university and placement into university courses are usually based on the results of achievement tests (as in secondary school exams) and/or aptitude tests (as in intelligence-type tests and the SAT). This paper argues that in a situation where educational provision at secondary school level is highly unequal, a third approach to testing offers an alternative which is preferable both on grounds of theory of cognitive psychology and because it yields much better discrimination. The Alternative Admissions Research Project at the University of Cape Town has developed a mathematics test according to the dynamic testing approach as advocated by Miller (1990) for admission of African students from grossly under-resourced schools, as well as for placing these and other students into a diversifying first year curriculum. This approach aims to assess the ability of a candidate to learn from authentic academic material within the test. This paper focuses on the reasons for the development of the mathematics test and the process by which the test questions were developed and piloted. The reliability of the test and correlations of this test with subsequent mathematical performance data are discussed. Following the encouraging data for the test as an admission mechanism, the value of the dynamic testing approach for furnishing additional information for placement into an increasingly varied curriculum at first year level was investigated. This enabled the piloting of more topics and more comprehensive validation of this type of testing. The paper concerns itself with the reliability and predictive value of each of the topics in this placement test for a range of core courses in various faculties and the extent to which these tests can identify potentially at risk students who should be placed onto an appropriate curriculum.

8.
Applied Measurement in Education, 2013, 26(3): 265-283
When tests are used for high-stakes decisions, there is a strong possibility that individuals for whom an unfavorable decision is made will bring a legal suit against the developer and/or user of the test. In this article, we address the general issue of how to determine whether a test has been developed in a legally defensible manner. We discuss general legal issues, specific cases that bear on different types of test use, and some evaluative dimensions and evidence of test quality. Existing case law is based on statutory and constitutional requirements. The 1964 and 1991 Civil Rights Acts prohibit discrimination in employment. Both disparate treatment and disparate impact are issues. Most case law is based on disparate impact, which does not require evidence of discriminatory intent. If there is a showing of disparate impact, a test can still be used if it is shown to be job related and professionally developed. The U.S. Constitution's 14th Amendment requires equal treatment and due process. In testing, these requirements include a legitimate relationship between a requirement and the purpose, sufficient advance notice of a testing requirement, and opportunities for fair hearing/appeals. An examination of court cases suggests that, if tests are constructed and used according to existing standards, they should withstand legal scrutiny. Builders and users of high-stakes tests must attend to contractual arrangements and oversight of contractors, validity evidence, reliability evidence, opportunity-to-learn evidence, avoidance-of-bias evidence, and setting legally defensible cut scores.

9.
Equivalent circuit model-based state-of-charge (SOC) estimation has been widely studied for power lithium-ion batteries. An appropriate relaxation period to measure the open-circuit voltage (OCV) should be investigated to both ensure good SOC estimation accuracy and improve OCV test efficiency. Based on a battery circuit model, an SOC estimator in the combination of recursive least squares (RLS) and the extended Kalman filter is used to mitigate the error voltage between the measurement and real values of the battery OCV. To reduce the iterative computation complexity, a two-stage RLS approach is developed to identify the model parameters, the battery circuit of which is divided into two simple circuits. Then, the measurement values of the OCV at varying relaxation periods and three temperatures are sampled to establish the relationships between SOC and OCV for the developed SOC estimator. Lastly, dynamic stress test and federal test procedure drive cycles are used to validate the model-based SOC estimation method. Results show that the relationships between SOC and OCV at a short relaxation time, such as 5 min, can also drive the SOC estimator to produce a good performance.

10.
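Standardized testing, which originated in the United States, has been adopted on a large scale around the world, yet it has arguably reached its present position amid constant criticism. Why do people continue to use it even as they criticize it so strongly? How should standardized testing be evaluated? After analyzing the current state and problems of standardized testing in the United States, this paper examines standardized testing from educational and cultural perspectives. It concludes that standardized testing has a rational basis for its existence, but that the design of test items, the use of test scores, and the weight given to scores among educational evaluation indicators should be scientific and objective.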

11.
Assessment practitioners are often encouraged to adopt an “intelligent” approach to the interpretation of intelligence tests. A fundamental assumption of the “intelligent testing” philosophy is that psychometric test information (e.g., subtest g loadings) should be considered during the interpretive process. The relevant psychometric information is provided in the form of sample-based estimates. Unfortunately, the accuracy of these estimates, and the subsequent qualitative classification of intelligence subtests (e.g., good, fair, poor), are influenced to an unknown degree by sampling error. The current study demonstrated how data smoothing procedures, procedures commonly used in the development of continuous test norms, can be used to provide better estimates of the reliability, uniqueness, and general factor characteristics for the WISC-III subtests. © 1997 John Wiley & Sons, Inc.

12.
General conditions which should be met in the development of the idea of science processes and the potential benefits which would result are suggested. An approach to the definition of science processes based on variables and variable handling is outlined. It is argued that this should occur specifically in the context of educational practice. The processes should be elucidated within a taxonomy of tasks, the ultimate components of which should be interpretable and transferable skills. The development and utilization of such a taxonomy can most obviously occur in the context of assessment. The techniques of item banking and domain definition in domain‐referenced tests are then discussed in relation to the above issues. Multi‐dimensional labelling of an item bank is considered to be appropriate, and suggestive of a research programme for investigating the requirements of transferability and interpretability.

The labelling scheme resulting from an attempt to identify the significant dimensions of the age 15 Assessment of Performance Unit (APU) science item bank is reported. The paper indicates the effectiveness of the scheme and discusses the value of labelling those items within the assessment programme which draw heavily on the conceptual framework of taught science. A discussion of how the labelling scheme/item bank can be used in an attempt to identify the skills required in undertaking tasks based on science processes, and thus to develop tests of such identifiable skills for diagnostic and placement purposes in schools, concludes this paper.

13.
There is a current need for reliable and valid test instruments in different countries in order to monitor deaf children's sign language acquisition. However, very few tests are commercially available that offer strong evidence for their psychometric properties. A German Sign Language (DGS) test focusing on linguistic structures that are acquired in preschool- and school-aged children (4-8 years old) is urgently needed. Using the British Sign Language Receptive Skills Test, which has been standardized and has sound psychometric properties, as a template for adaptation thus provides a starting point for tests of a sign language that is less documented, such as DGS. This article makes a novel contribution to the field by examining linguistic, cultural, and methodological issues in the process of adapting a test from the source language to the target language. The adapted DGS test has sound psychometric properties and provides the basis for revision prior to standardization.

14.
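University examinations in China currently suffer from several problems: assessment that is misaligned with its objectives; low standards and lax management of internal examinations; examination functions that are not fully realized; a narrow range of examination formats; and widespread unfairness in examinations. In view of this, reform of university examinations in China should proceed on five fronts: changing the prevailing view of examinations; strengthening the management of internal examinations while making full use of external examinations and handling the relationship between the two properly; actively exploring new forms of examination management; correctly understanding the relationship between examinations and quality-oriented education so that examination and evaluation are integrated; and promoting education that students enjoy while gradually de-emphasizing examinations.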

15.
In optimal assembly of tests from item banks, linear programming (LP) models have proved to be very useful. Assembly by hand has become nearly impossible, but these LP techniques are able to find the best solutions, given the demands and needs of the test to be assembled and the specifics of the item bank from which it is assembled. However, sometimes even LP techniques do not offer an acceptable solution to the test assembler. Infeasibility occurs when the demands are contradictory. These contradictions may be rather complex, especially when stated in terms of LP models. Techniques are described that can solve these infeasibility problems in different manners. The objectives are twofold. First, the assembler is given a helping hand to identify the bottlenecks in the specifications of the LP model. Second, a solution is forced, such that the test assembler is always presented a test as close as possible to the original specifications. These objectives should be realizable both automatically and interactively with the test assembler.

16.
Speededness refers to the extent to which time limits affect examinees' test performance, and it is often measured by calculating the proportion of examinees who do not reach a certain percentage of test items. However, when tests are number-right scored (i.e., no points are subtracted for incorrect responses), examinees are likely to rapidly guess on items rather than leave them blank. Therefore, this traditional measure of speededness probably underestimates the true amount of speededness on such tests. A more accurate assessment of speededness should also reflect the tendency of examinees to rapidly guess on items as time expires. This rapid-guessing component of speededness can be estimated by modeling response times with a two-state mixture model, as demonstrated with data from a computer-administered reasoning test. Taking into account the combined effect of unreached items and rapid guessing provides a more complete measure of speededness than has previously been available.
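The traditional index described in this abstract, the proportion of examinees who do not reach a given percentage of items, can be illustrated with a minimal sketch. The data layout (one list per examinee, with `None` marking an item left without a response), the function names, and the toy data are all assumptions made for illustration; they are not taken from the article, and the sketch does not cover the rapid-guessing mixture model.

```python
def items_reached(responses):
    """Number of items up to and including the last one answered.

    `responses` is one examinee's item scores in test order;
    None marks an item with no response (hypothetical encoding).
    """
    for i in range(len(responses) - 1, -1, -1):
        if responses[i] is not None:
            return i + 1
    return 0


def proportion_not_reaching(data, fraction):
    """Share of examinees who reached fewer than `fraction` of the items."""
    n_items = len(data[0])
    short = sum(1 for row in data if items_reached(row) < fraction * n_items)
    return short / len(data)


# Toy data: 4 examinees, 5 items (1 = correct, 0 = incorrect, None = no response).
toy_data = [
    [1, 0, 1, 1, 0],           # reached all 5 items
    [1, 1, None, None, None],  # reached 2 items
    [0, 1, 1, 1, None],        # reached 4 items
    [1, None, 1, None, None],  # reached 3 items (the skipped item 2 still counts as reached)
]

print(proportion_not_reaching(toy_data, 0.75))
```

Because an examinee who guesses rapidly still produces a response, this index counts that examinee as having reached the item, which is the underestimation the abstract points to.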

17.
Conclusion: Increased emphasis on school based curriculum development and assessment, with stress placed on attitudinal aims, together with the policy of the N.S.W. Schools Board that attitudes should be included in school assessment programmes, has created a major dilemma for N.S.W. science teachers. The results of this study indicate that such a dilemma can be a very real one, particularly for young teachers just out of training. Out of 23 sets of results obtained from 12 cognitive achievement tests, set by a class of 19 Diploma in Education students, only two produced a coefficient alpha reliability of the order of 0.80 and none had an alpha of 0.85 or above. In their first year of teaching, these students will be participating in test construction exercises for internal assessment purposes, where their results will be expected to discriminate between individual students. The attitude instrument developed by the class was promising for an early stage of instrument development, producing an alpha of 0.65, with item analysis indications that it has the potential for further refinement to produce a useful instrument. However, correlations between the attitude scales and those achievement tests which had reliabilities sufficiently high to allow reasonable interpretation of results were very low, indicating very little relationship between the attitude as measured by the scale and science achievement as measured by the cognitive tests. Obtaining a set of student results by adding scores from this instrument to results of achievement tests would be of very doubtful validity. In addition, there is the whole complex issue of the unknown degree to which respondents give socially desirable answers when it is known that the results of such a test will be used for assessment purposes, influencing crucial decisions about their future.
Analysis of the results of the attitude test by grade level showed a predictable and statistically significant upward shift in scores with increasing grade level, except for grade 8, which had the lowest mean. However, the increase in the mean between junior and senior grades was only of moderate magnitude, tentatively suggesting that the influence of five years of high school science is not a major one in developing a belief in the value of conservation of the natural environment.
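The coefficient alpha reliabilities quoted in this abstract follow the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below shows that calculation on made-up data; the function name and the score matrices are hypothetical, and the study's own data are not reproduced here.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for a respondents-by-items score matrix.

    `scores` is a list of rows, one per respondent, each holding that
    respondent's score on every item. Sample variances are used throughout.
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that rank respondents identically.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]

# Hypothetical data: two items that are unrelated to each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(coefficient_alpha(consistent))
print(coefficient_alpha(unrelated))
```

Items that order respondents identically give an alpha at the top of the scale, while unrelated items give an alpha near zero, which is why values below about 0.80 are read in the abstract as limiting how well the tests can discriminate between individual students.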

18.
Many state and federal governments have mandated in such documents as the National Science Education Standards that inquiry strategies should be the focus of the teaching of science within school classrooms. The difficult part for success is changing teacher practices from perceived traditional ways of teaching to more inquiry‐based approaches. Arguments are often made about the effectiveness of these traditional strategies. The purpose of this study was to compare the effectiveness of the inquiry‐based approach known as the Science Writing Heuristic approach as a treatment to traditional teaching practices on students' post‐test scores in relation to students' achievement level and teacher's implementation of the approach. A mixed‐method research approach was used to analyze the teacher observational data and students' test results. The major findings of this study are that the quality of the implementation does have an impact on student performance on post‐test scores and that high‐quality implementation of the Science Writing Heuristic approach has significant advantages in closing the achievement gap within science classrooms.

19.
This article addresses the issue of language-related construct-irrelevant variance on content area tests from the perspective of systemic functional linguistics. We propose that the construct relevance of language used in content area assessments, and consequent claims of construct-irrelevant variance and bias, should be determined according to the degree of correspondence between language use in the assessment and language use in the educational contexts in which the content is learned and used. This can be accomplished by matching the linguistic features of an assessment and the linguistic features of the domain in which the assessment is measuring achievement. This represents a departure from previous work on the assessment of English language learners’ content knowledge that has assumed complex linguistic features are a source of construct irrelevant variance by virtue of their complexity.

20.
Structural equation models are typically evaluated on the basis of goodness-of-fit indexes. Despite the popularity of these indexes, the values they should attain before a researcher can confidently decide between accepting and rejecting a model have been greatly debated. A recently proposed approach by means of equivalence testing has been recommended as a superior way to evaluate the goodness of fit of models. The approach has also been proposed as providing a necessary vehicle that can be used to advance the inferential nature of structural equation modeling as a confirmatory tool. The purpose of this article is to introduce readers to key ideas in equivalence testing and illustrate its use for conducting model–data fit assessments. Two confirmatory factor analysis models, in which a priori specified latent variable models with known structure are tested against data, are used as examples. It is advocated that whenever the goodness of fit of a model is to be assessed, researchers should always examine the resulting values obtained via the equivalence testing approach.
