Similar literature
Found 20 similar documents; search took 593 ms.
1.
Examining committees often need to reach a compromise between absolute and relative standards. Unfortunately, the way in which the compromise is achieved is usually unclear. This paper proposes a systematic method for reaching a compromise. In this method, the estimated passing score (level of minimum knowledge) is assumed to be related to the expected pass rate (percentage of successful candidates) through a simple linear function. The examination results define a second function, which gives the percentage of candidates who would pass at any specified passing score. The intersection of the two functions gives the required compromise.
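The intersection described in this abstract can be illustrated with a minimal sketch. It assumes integer scores and a committee line of the form `expected_rate = intercept + slope * cut`; the function names, the parameter values, and the choice to take the nearest point on the stepped empirical curve are all illustrative assumptions, not the paper's procedure.

```python
def empirical_pass_rate(scores, cut):
    """Percentage of candidates whose score is at or above the cut."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


def compromise_cut(scores, intercept, slope):
    """Return the integer cut score where the committee's line meets the data.

    The committee's line is expected_rate = intercept + slope * cut.
    Both parameters are hypothetical inputs chosen by the committee.
    """
    candidates = range(min(scores), max(scores) + 1)
    # The empirical curve is a step function, so pick the cut with the
    # smallest gap between the two functions instead of an exact crossing.
    return min(
        candidates,
        key=lambda c: abs(empirical_pass_rate(scores, c) - (intercept + slope * c)),
    )


# Hypothetical examination results and committee line.
scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
cut = compromise_cut(scores, intercept=-10.0, slope=1.0)
```

With these made-up numbers the empirical pass rate falls as the cut rises while the committee's line rises, so the two functions cross at a single point.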

2.
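Setting examination standards is a systematic undertaking. This paper describes six steps for choosing a cut point and four commonly used methods. The six steps are: deciding on the type of standard, deciding on the standard-setting method, selecting the judges, holding the standard-setting meeting, calculating the standard, and deciding what needs to be done afterwards. The four commonly used methods are relative methods, absolute methods based on judgments about test items, absolute methods based on judgments about individual test takers, and compromise methods.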

3.
This paper revisits the use of effect sizes in the analysis of experimental and similar results, and reminds readers of the relative advantages of the mean absolute deviation as a measure of variation, as opposed to the more complex standard deviation. The mean absolute deviation is easier to use and understand, and more tolerant of extreme values. The paper then proposes the use of an easy to comprehend effect size based on the mean difference between treatment groups, divided by the mean absolute deviation of all scores. Using a simulation based on 1656 randomised controlled trials each with 100 cases, and a before and after design, the paper shows that the substantive findings from any such trial would be the same whether raw-score differences, a traditional effect size like Cohen's d, or the mean absolute deviation effect size is used. The same would be true for any comparison, whether for a trial or a simpler cross-sectional design. It seems that there is a clear choice over which effect size to use. The main advantage in using raw scores as an outcome measure is that they are easy to comprehend. However, they might be misleading and so perhaps require more judgement to interpret than traditional ‘effect’ sizes. Among the advantages of using the mean absolute deviation effect size are its relative simplicity, everyday meaning, and the lack of distortion of extreme scores caused by the squaring involved in computing the standard deviation. Given that working with absolute values is no longer the barrier to computation that it apparently was before the advent of digital calculators, there is a clear place for the mean absolute deviation effect size (termed ‘A’).
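The effect size 'A' described in this abstract (mean difference between groups divided by the mean absolute deviation of all scores) might be computed as in the sketch below. Taking the deviations around the mean of the pooled scores is an assumption, since the abstract does not say how "all scores" are centred; the function names and the sample data are hypothetical.

```python
from statistics import mean


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean of the values."""
    centre = mean(values)
    return mean(abs(v - centre) for v in values)


def effect_size_a(treatment, control):
    """Mean difference between groups divided by the MAD of all scores.

    Assumption: the MAD is taken over the pooled scores of both groups.
    """
    pooled = list(treatment) + list(control)
    return (mean(treatment) - mean(control)) / mean_absolute_deviation(pooled)


# Hypothetical scores for two small groups.
a = effect_size_a([6, 8, 10], [2, 4, 6])
```

No squaring is involved at any step, which is the property the abstract credits for the measure's tolerance of extreme scores.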

4.
This paper proposes a framework for assessing the quality of education, based on the outcomes defined in educational standards. The author takes the view that educational standards reflect the mission that schools must fulfill. He explains that the classroom curriculum, derived from educational standards, must customize the learning process to respond to the teaching–learning environment. Defining quality as the extent to which the delivery of the school curriculum realises the learning outcomes defined in the educational standards, the author proposes that quality in education should be evaluated using two approaches: relative achievement assessment and absolute achievement assessment. In elaborating on the two dimensions of assessment on which the model is based, the author highlights the need for greater attention to be paid to values and attitudes in assessing quality of education.

5.
In this article we address the issue of consistency in standard setting in the context of an augmented state testing program. Information gained from the external NRT scores is used to help make an informed decision on the determination of cut scores on the state test. The consistency of cut scores on the CRT across grades is maintained by forcing a consistency model based on the NRT scores and translating that information back to the CRT scores. The inconsistency of standards and the application of this model are illustrated using data from the Maryland MSA large state testing program involving cut points for basic, proficient and advanced in mathematics and reading across years and across grades. The model is discussed in some detail and shown to be a promising approach, although not without assumptions that must be made and issues that might be raised.

6.
Cronbach made the point that for validity arguments to be convincing to diverse stakeholders, they need to be based on assumptions that are credible to these stakeholders. The interpretations and uses of high-stakes test scores rely on a number of policy assumptions about what should be taught in schools, and more specifically, about the content standards and performance standards that should be applied to students and schools. For example, a high-school graduation test can be developed as a measure of readiness for the world of work, for college, or for citizenship and the activities of daily life. The assumptions built into the assessment need to be subjected to scrutiny and criticism if a strong case is to be made for the validity of the proposed interpretation and use.

7.
There is a need for assessment of teachers' competencies fostered by a growing attention given to accountability and quality improvement. Important questions are how good the demonstrated competencies of teachers should be for a satisfying assessment and how the different competencies should be weighted. Using a policy capturing method, in two rounds, nine stakeholders developed performance standards (or cut-off scores) for teacher assessment on eight criteria (or content standards) that resulted from an earlier study. Between the rounds, the panellists held a structured group discussion. Policy capturing proved to be a clear and useful method generating consistent judgements that can be described according to both a compensatory model and a conjunctive model. From the first to the second round, the consistency increased. However, while the panellists agreed to a substantial degree on the performance standards, they disagreed on the weights to be assigned to the criteria.
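The two decision models named in this abstract differ in how criterion scores combine into a pass/fail judgement. The sketch below illustrates the general distinction only; the function names, scores, weights and cut-offs are hypothetical and do not come from the study.

```python
def compensatory_pass(scores, weights, overall_cut):
    """Pass if the weighted mean clears one overall cut-off score.

    A strong score on one criterion can offset a weak score on another.
    """
    weighted = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return weighted >= overall_cut


def conjunctive_pass(scores, criterion_cuts):
    """Pass only if every criterion clears its own cut-off score."""
    return all(s >= c for s, c in zip(scores, criterion_cuts))


# Hypothetical teacher rated on three criteria (the study used eight).
ratings = [4, 2, 5]
passes_compensatory = compensatory_pass(ratings, weights=[1, 1, 1], overall_cut=3.5)
passes_conjunctive = conjunctive_pass(ratings, criterion_cuts=[3, 3, 3])
```

In this made-up case the same teacher passes under the compensatory model and fails under the conjunctive one, which shows why the weights the panellists disagreed on matter for the first model and not for the second.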

8.
This paper assesses the value that can be put on the reading (National Curriculum En 2) Standard Assessment Task scores as indicators of what children are achieving in reading and whether reading standards are rising. The results of a cross-sectional study of a sample of all the Year 2 children (171 in 1991; 171 in 1992) from five randomly selected primary schools within one Local Education Authority (LEA) are presented. Pupils’ scores on The Primary Reading Test (PRT) (France, 1981) and the reading Standard Assessment Task score elicited by them in the previous half term are compared. Results show an improvement in the attainment level of children in 1992 compared to those in 1991 on Standard Assessment Tasks with a higher percentage achieving Level 3 and fewer on Level 1. However, examination of the means for each year group indicates that the mean PRT score for each Standard Assessment Task level is significantly lower in 1992 than 1991. Conclusions, based on such a small study, are tentative. However, it would seem that there is a need to view apparently rising standards, as measured solely by the Standard Assessment Task results, with a degree of caution.

9.
A random sample of 55 WRAT-R protocols, completed by nine practitioners for a metropolitan school district in the South, was analyzed for examiner errors. All nine practitioners made errors, which occurred on 95% of the protocols and averaged 3.0 errors per protocol. The most frequent errors included failures to obtain a correct ceiling or basal, and failures to record examinees' responses. Correction of the examiner errors resulted in changes in 11 standard scores, and 3 additional changes in grade equivalent scores. These results indicate that WRAT-R administration and scoring are not as objective as assumed by the test developers, and that examiner errors on this test can adversely affect diagnostic decisions.

10.
Many education policies require estimating whether students in different grades are on track for achieving certain educational standards. One approach for constructing these cut scores is to estimate the values on tests that predict reaching targets on subsequent tests. Whether a student is deemed on target can affect the student’s course counselling and aggregate statistics can affect school closures and funding and teacher employment. Seven different regression procedures for estimating cut scores are compared with 15 different data scenarios. In some situations, all the methods provided fairly accurate estimates, but in other situations, some estimates were poor. The choice of which regression procedure to use can make a difference. Overall, a method based on a loess regression performed well.
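The idea of an "on track" cut score can be sketched with the simplest possible case: fit a straight line predicting the later score from the earlier one, then solve for the earlier score whose prediction equals the target. This uses ordinary least squares purely for illustration; it is not one of the paper's seven procedures as described, and it is not the loess method the abstract favours. All names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def on_track_cut(earlier, later, target):
    """Earlier-test score whose predicted later score equals the target.

    Assumes a non-zero slope, i.e. the earlier test predicts the later one.
    """
    intercept, slope = fit_line(earlier, later)
    return (target - intercept) / slope


# Hypothetical paired scores for four students on two tests.
earlier_scores = [10, 20, 30, 40]
later_scores = [25, 45, 65, 85]
cut = on_track_cut(earlier_scores, later_scores, target=65)
```

A straight line is only adequate when the relationship between the two tests is close to linear, which is one reason the choice of regression procedure can change the estimated cut score.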

11.
Scores on state standards‐based assessments are readily available and may be an appropriate alternative to traditional placement tests for assigning or accepting students into particular courses. Many community colleges do not require test scores for admissions purposes but do require some kind of placement scores for first‐year English and math courses. In this study, we examine the efficacy of using the reading and math portions of the Kansas State Assessment (KSA) for predicting the success of high school students taking College Algebra and College English I at a Kansas community college. Results showed that in this sample KSA scores predicted as well as or better than more traditional placement tests, at no extra cost to the institution.

12.
The aim of the National Numeracy Strategy is to raise standards in numeracy. Strong evidence for its success has, however, been lacking: most of the available data come from performance on National Test assessments administered in schools or from Ofsted reports, and are vulnerable to suggestions of bias. An opportunistic analysis of data from a population cohort study extending over three school years compares school‐based scores at school entry and at age 7–8 with clinic‐based scores on similar tests. The results show a small but statistically significant rise between 1998 and 1999 and between 1998 and 2000 in scores on both KS1 arithmetic SATs taken in schools and the arithmetic component of the WISC test taken in an independent research clinic. This is evidence for a real rise in generalised arithmetic ability over this period which may be attributable to the children's experience of the National Numeracy Strategy.

13.
Eighty male rats were tested in an open field. Correlation coefficients between aggregated test days were larger than those between nonaggregated test days, indicating that aggregation across days can enhance the reliability of scores in the open-field test. Also, absolute values of correlation coefficients among the seven open-field test measures based on the aggregated data were generally larger than those based on nonaggregated data, indicating that the correlation among measures may be closer than previously assumed on the basis of nonaggregated data. Issues concerning appropriate aggregation and limitations of aggregation are discussed. The technique of aggregation is recommended as a routine procedure in the analysis of open-field test results, because of the enhanced reliability obtained.

14.
Validity evidence based on test content is critical to meaningful interpretation of test scores. Within high-stakes testing and accountability frameworks, content-related validity evidence is typically gathered via alignment studies, with panels of experts providing qualitative judgments on the degree to which test items align with the representative content standards. Various summary statistics are then calculated (e.g., categorical concurrence, balance of representation) to aid in decision-making. In this paper, we propose an alternative approach for gathering content-related validity evidence that capitalizes on the overlap in vocabulary used in test items and the corresponding content standards, which we define as textual congruence. We use a text-based, machine learning model, specifically topic modeling, to identify clusters of related content within the standards. This model then serves as the basis from which items are evaluated. We illustrate our method by building a model from the Next Generation Science Standards, with textual congruence evaluated against items within the Oregon statewide alternate assessment. We discuss the utility of this approach as a source of triangulating and diagnostic information and show how visualizations can be used to evaluate the overall coverage of the content standards across the test items.

15.
16.
OBJECTIVE: The aims of the study were to: determine the attitudes of parents, pediatric residents, and medical students from a Turkish population toward childhood disciplinary methods; ascertain the association of participants' abusive childhood history with their attitudes toward discipline; and assess their attitudes about disciplinary actions, which should be reported as abuse. METHOD: A cross-sectional survey was conducted in Ankara University School of Medicine, Department of Social Pediatrics. Sixty-five parents, 39 pediatric residents, and 106 medical students completed a questionnaire (Survey of Standards of Discipline). This questionnaire was designed to measure sociodemographic characteristics, attitudes toward childhood disciplinary practices, and abusive childhood experiences. There were 43 different disciplinary acts in this questionnaire. The participants were expected to give responses to these acts in three categories: (a) acceptable as discipline; (b) unacceptable as discipline; and (c) unacceptable as discipline, would report to authorities as child abuse. Based on the responses to this questionnaire, we developed the Severity Scale. Using this scale, physical severity scores, verbal severity scores, and total severity scores were measured for each participant. RESULTS: None of the participants accepted life-threatening practices as discipline, but some declared certain abusive disciplinary practices as acceptable. Some forceful disciplinary methods were not considered as reportable by participants. All severity scores of both residents and students were found to be higher than those of the parents (for verbal severity scores p=.042). Also, both verbal and physical severity scores of parents with one child were higher than those of parents with two children (for verbal severity scores p=.044). Ninety-one participants (43.3%) indicated that beating was an acceptable form of discipline. Of parents, 66.9% reported abusive childhood history by their own criteria. Of medical students with an abusive childhood experience, 56.5% accepted beating as appropriate (p=.001). Both verbal and physical severity scores were found to be higher in participants with abusive childhood history. CONCLUSIONS: Abusive childhood history and lack of education regarding appropriate discipline techniques are linked to the acceptance of certain physical discipline practices. Turkey's cultural and traditional norms may be associated with the use of physical punishment, and in some cases, physical abuse. The lack of awareness of abusive discipline methods among physicians constitutes problems for child protection and must be addressed. Thus, educational programs on child disciplinary practices are required to provide an increased awareness of child abuse among health professional trainees and parents in Turkey.

17.
We developed a criterion-referenced student rating of instruction (SRI) to facilitate formative assessment of teaching. It involves four dimensions of teaching quality that are grounded in current instructional design principles: Organization and structure, Assessment and feedback, Personal interactions, and Academic rigor. Using item response theory and Wright mapping methods, we describe teaching characteristics at various points along the latent continuum for each scale. These maps enable criterion-referenced score interpretation by making an explicit connection between test performance and the theoretical framework. We explain the way our Wright maps can be used to enhance an instructor’s ability to interpret scores and identify ways to refine teaching. Although our work is aimed at improving score interpretation, a criterion-referenced test is not immune to factors that may bias test scores. The literature on SRIs is filled with research on factors unrelated to teaching that may bias scores. Therefore, we also used multilevel models to evaluate the extent to which student and course characteristics may affect scores and compromise score interpretation. Results indicated that student anger and the interaction between student gender and instructor gender are significant effects that account for a small amount of variance in SRI scores. All things considered, our criterion-referenced approach to SRIs is a viable way to describe teaching quality and help instructors refine pedagogy and facilitate course development.

18.
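A mathematics achievement test was administered to 774 valid participants from different types of schools, and the results were analyzed with IRT parametric models. Four conclusions were drawn: (1) test scores and optimal scores are negatively skewed; (2) the test information function is shifted in the negative direction and is roughly bimodal; (3) subjective (constructed-response) items fit the logistic model poorly; (4) mathematics achievement differs significantly among students from different types of schools.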

19.
This study investigates empirically the mechanisms behind the increasing grade point averages in Swedish upper secondary schools. Four hypotheses are presented as plausible explanations: improved student achievements, student selection effects, strategic behaviour in course choices, and lowering of grading standards. The analysis is based on extensive data, and focuses on grades and test scores from upper secondary school graduates over a 6‐year period. The results show that the increase in grade point averages cannot be explained by better achievements, selection effects or course choices, which means that standards have been lowered, which is interpreted here as grade inflation. The grade inflation is most likely an effect of the leniency in the grading system in combination with pressure for high grading, related to the upper secondary school grades’ function as an instrument for selection to higher education.

20.
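Intelligence can be divided into academic intelligence and practical intelligence. Academic intelligence relates to solving academic problems and is expressed mainly through non-social cognitive operations; practical intelligence relates to solving everyday or occupational problems and is expressed more through social cognitive operations. Indicators for assessing intelligence fall into two groups: the total score on cognitive operation tests, and cognitive style. The total score is a maximal-performance indicator of intelligence, reflecting the absolute level at which an individual processes different cognitive materials using different cognitive operations. Cognitive style is a preference indicator of intelligence, reflecting the individual's relative strengths when processing different cognitive materials using different cognitive operations.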

