1.

Training Counselors to Work with Disabled Clients: Cognitive and Affective Components

DOUGLAS C. STROHMER DONALD A. BIGGS RICHARD F. HAASE MICHAEL J. PURCELL 《Counselor Education & Supervision》1983,23(2):132-141

This study examines the relationship of cognitive complexity, counselor anxiety, and client disability condition to accurate empathy on the part of students in training. A sample (n = 28) of students in a graduate counseling program observed a series of eight vignettes of counseling interviews (four clients with disabilities and four without disabilities) and reported a verbal counseling response to a client statement. A significant main effect was found for the cognitive complexity variable only (p < .05). A significant interaction among cognitive complexity, anxiety, and client disability condition (p < .01) indicated that all three factors interact to influence empathy. Implications for research and the training of counselors are discussed.

2.

Moral Development and Empathy in Counseling

JAMES T. BOWMAN T. GLEN REEVES 《Counselor Education & Supervision》1987,26(4):293-298

Of interest to counselor educators are variables associated with helper empathy. The authors investigated the relationship between empathy and moral development. Students enrolled in a facilitative skills development course completed a measure of moral development before making their first counseling audiotape. After approximately 12 weeks of skills training, they were rated for their demonstration of empathic understanding of client statements on an analog videotape and on a counseling audiotape made for their course evaluation. Moral development scores correlated .61 (p < .001) with empathy ratings of their responses to the analog videotape and .35 (p < .05) with supervisory empathy ratings of their final audiotape. Implications for counselor education are discussed.

3.

ESTIMATING THE RELIABILITY, VALIDITY, AND INVALIDITY OF ESSAY RATINGS

H. BLOK 《Journal of Educational Measurement》1985,22(1):41-52

In an essay rating study multiple ratings may be obtained by having different raters judge essays or by having the same rater(s) repeat the judging of essays. An important question in the analysis of essay ratings is whether multiple ratings, however obtained, may be assumed to represent the same true scores. When different raters judge the same essays only once, it is impossible to answer this question. In this study 16 raters judged 105 essays on two occasions; hence, it was possible to test assumptions about true scores within the framework of linear structural equation models. It emerged that the ratings of a given rater on the two occasions represented the same true scores. However, the ratings of different raters did not represent the same true scores. The estimated intercorrelations of the true scores of different raters ranged from .415 to .910. Parameters of the best fitting model were used to compute coefficients of reliability, validity, and invalidity. The implications of these coefficients are discussed.

4.

Item Response Models for Local Dependence Among Multiple Ratings

Wen‐Chung Wang Chi‐Ming Su Xue‐Lan Qiu 《Journal of Educational Measurement》2014,51(3):260-280

Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning fixed‐effect parameters to describe response patterns in the bundle. Unfortunately, this model becomes difficult to manage when a polytomous item is graded by more than two raters. In this study, by adding random‐effect parameters to the facets model, we propose a class of generalized rater models to account for the local dependence among multiple ratings and intrarater variation in severity. A series of simulations was conducted with the freeware WinBUGS to evaluate parameter recovery of the new models and consequences of ignoring the local dependence or intrarater variation in severity. The results revealed a good parameter recovery when the data‐generating models were fit, and a poor estimation of parameters and test reliability when the local dependence or intrarater variation in severity was ignored. An empirical example is provided.

5.

The Effect of Scoring Method on Rater Severity in College English Writing Assessment: A Comparative Analysis of Holistic and Analytic Scoring

贺满足 《湖南第一师范学报》2006,6(4):59-61,66

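Scoring criteria are very important in writing tests, and the scoring method used affects raters' scoring behavior. The study shows that although both the holistic and the analytic methods of scoring English writing are reliable, rater severity and examinees' writing scores differ considerably between the two. Overall, under holistic scoring, raters' severity tends to converge and approaches the ideal value; under analytic scoring, examinees' writing scores are higher, and rater severity also differs significantly. Holistic scoring is therefore preferred in high-stakes examinations that decide examinees' futures.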

6.

Conceptualizing Rater Judgments and Rating Processes for Rater‐Mediated Assessments

Jue Wang George Engelhard 《Journal of Educational Measurement》2019,56(3):582-609

Rater‐mediated assessments exhibit scoring challenges due to the involvement of human raters. The quality of human ratings largely determines the reliability, validity, and fairness of the assessment process. Our research recommends that the evaluation of ratings should be based on two aspects: a theoretical model of human judgment and an appropriate measurement model for evaluating these judgments. In rater‐mediated assessments, the underlying constructs and response processes may require the use of different rater judgment models and the application of different measurement models. We describe the use of Brunswik's lens model as an organizing theme for conceptualizing human judgments in rater‐mediated assessments. The constructs vary depending on which distal variables are identified in the lens models for the underlying rater‐mediated assessment. For example, one lens model can be developed to emphasize the measurement of student proficiency, while another lens model can stress the evaluation of rater accuracy. Next, we describe two measurement models that reflect different response processes (cumulative and unfolding) from raters: Rasch and hyperbolic cosine models. Future directions for the development and evaluation of rater‐mediated assessments are suggested.

7.

Validating human and automated scoring of essays against “True” scores

Yoav Cohen Effi Levi Anat Ben-Simon 《Applied Measurement in Education》2018,31(3):241-250

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay’s true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By eliminating one, two, or three raters at a time, and by calculating an estimate of the true scores using the remaining raters, an independent criterion against which to judge the validity of the human raters and that of the AES system, as well as the interrater reliability was produced. The results of the study indicated that the automated scores correlate with human scores to the same degree as human raters correlate with each other. However, the findings regarding the validity of the ratings support a claim that the reliability and validity of AES diverge: although the AES scoring is, naturally, more consistent than the human ratings, it is less valid.

8.

Evaluating Rater Accuracy in Performance Assessments

George Engelhard Jr. 《Journal of Educational Measurement》1996,33(1):56-70

A new method for evaluating rater accuracy within the context of performance assessments is described. Accuracy is defined as the match between ratings obtained from operational raters and those obtained from an expert panel on a set of benchmark, exemplar, or anchor performances. An extended Rasch measurement model called the FACETS model is presented for examining rater accuracy. The FACETS model is illustrated with 373 benchmark papers rated by 20 operational raters and an expert panel. The data are from the 1993 field test of the High School Graduation Writing Test in Georgia. The data suggest that there are statistically significant differences in rater accuracy; the data also suggest that it is easier to be accurate on some benchmark papers than on others. A small example is presented to illustrate how the accuracy ordering of raters may not be invariant over different subsets of benchmarks used to evaluate accuracy.

9.

Validating Automated Essay Scoring: A (Modest) Refinement of the “Gold Standard”

Donald E. Powers David S. Escoffery Matthew P. Duchnowski 《Applied Measurement in Education》2015,28(2):130-142

By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the “gold standard.” Our objective was to refine this model and apply it to data from a major testing program and one system of automated essay scoring. The refinement capitalizes on the fact that essay raters differ in numerous ways (e.g., training and experience), any of which may affect the quality of ratings. We found that automated scores exhibited different correlations with scores awarded by experienced raters (a more compelling criterion) than with those awarded by untrained raters (a less compelling criterion). The results suggest potential for a refined machine-human agreement model that differentiates raters with respect to experience, expertise, and possibly even more salient characteristics.

10.

Examining Differential Rater Functioning Using a Between‐Subgroup Outfit Approach

Stefanie A. Wind Stefanie S. Sebok‐Syer 《Journal of Educational Measurement》2019,56(2):217-250

When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of standardized residuals). One can create plots of the standardized residuals, isolating those that resulted from raters’ ratings of particular subgroups. Practitioners can then examine the plots to identify raters who did not maintain a uniform level of severity when they assessed various subgroups (i.e., exhibited evidence of differential rater functioning). In this study, we analyzed simulated and real data to explore the utility of this between‐subgroup fit approach. We used standardized between‐subgroup outfit statistics to identify misfitting raters and the corresponding plots of their standardized residuals to determine whether there were any identifiable patterns in each rater's misfitting ratings related to subgroups.

11.

Rating Teaching in the USA: probing the qualifications of student raters and novice teachers

John F. Newport 《Assessment & Evaluation in Higher Education》1996,21(1):17-21

In the USA, student ratings of their instructors are routinely used by administrators in higher education in making decisions regarding instructors' salary adjustments, tenure and promotion. However, when the rating qualifications of amateur student raters and novice public school teachers who have received training that should have enabled them to become qualified raters are examined closely, there are good reasons for believing that both groups of raters are not qualified to give reliable ratings on most high‐inference questionnaire items.

12.

CLIENT 1: A Computer Simulation for Use in Counselor Education and Research

JAMES W. LICHTENBERG THOMAS J. HUMMEL WARREN F. SHAFFER 《Counselor Education & Supervision》1984,24(2):155-167

CLIENT 1 is an interactive program that was designed to simulate client behavior in an initial interview and to provide a standardized environment for training and research on counselor problem-solving strategies. Through interaction with the computerized client, counselors attempt to facilitate client movement toward the goal of verbalizing a specific problem statement. Client movement is a function of the appropriateness and accuracy of counselor statements, the threat value associated with both client and counselor statements, the strength of the relationship between the counselor and client, and an index of counselor expertise. The uses of the simulation in counselor education and research are discussed.

13.

A Many-Facet Rasch Analysis of the Effects of Rater Training for the PETS Oral Test

李英 关丹丹 《外语教学理论与实践》2016,153(3):43-48

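This study took prospective oral examiners for PETS Level 1 as its subjects and examined the effect of rater training on their scoring. Many-facet Rasch analysis was used to compare the examiners' scoring before and after training. The results show that, after training, the proportion of examiners' ratings in exact agreement with expert ratings increased; examiners who had rated too severely adjusted their application of the scoring criteria appropriately; and all examiners' fit values fell within the acceptable range. Overall, the training was fairly effective and improved scoring accuracy. Many-facet Rasch analysis helps identify examiners who rate too leniently, too severely, or with poor fit, as well as anomalous ratings, and so provides a reliable basis for targeted training.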

14.

ALTERNATIVES IN AURAL REHABILITATION: PROVIDER TRAINING OF NONAUDIOLOGISTS IN THE DELIVERY OF HEARING‐AID SUPPORTIVE SERVICES TO OLDER PERSONS WITH HEARING LOSS

Karen Patterson Jess Dancer 《Educational gerontology》2013,39(6):487-495

Recently, Patterson and Dancer (1987) suggested a model wherein persons who normally come in contact with older hearing‐impaired persons can be trained to assist the older hearing‐aid user in adjustment to amplification. Their four‐phase educational model offers an alternative to traditional aural rehabilitation programs by using personnel from senior centers, nursing homes, and state and local agencies as program providers.

The present article elaborates more fully on the training that protocol providers will receive from audiologists certified by the American Speech‐Language‐Hearing Association. Providers will be carried through five stages: empathy, effective communication skills, knowledge of the interaction of aging and hearing loss, the phases outlined in the Patterson and Dancer model, and guidelines for referrals. Objective‐based provider and client response criteria are outlined for moving the client from the initial receipt of the hearing aid to its ultimate acceptance and use on a daily basis.

15.

Raters' Understanding and Use of Rating Criteria in Oral Assessment: An Analysis of Verbal Reports on Paired Oral Test Rating

史天化 唐国平 《鸡西大学学报》2012,12(6):33-34,44

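Using think-aloud protocols supplemented by stimulated recall, this study collected verbal report data from four raters of differing personality orientations as they rated a paired oral test. Qualitative analysis shows that, in actual rating, raters differ considerably in how they understand and use the rating scale. Specifically: (1) extroverted raters are more lenient than introverted raters during rating; (2) introverted raters attend more to the specific indicators and criteria in the rating scale, whereas extroverted raters emphasize task completion and the comparison, communication, and interaction between candidates; (3) extroverted raters rely less on the rating scale than introverted raters and make more use of non-linguistic features. The findings have implications for revising rating criteria and for rater training.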

16.

The Stability of Rater Severity in Large-Scale Assessment Programs

Peter J. Congdon Joy McQueen 《Journal of Educational Measurement》2000,37(2):163-178

The purpose of this study was to investigate the stability of rater severity over an extended rating period. Multifaceted Rasch analysis was applied to ratings of 16 raters on writing performances of 8,285 elementary school students. Each performance was rated by two trained raters over a period of seven rating days. Performances rated on the first day were re-rated at the end of the rating period. Statistically significant differences between raters were found within each day and in all days combined. Daily estimates of the relative severity of individual raters were found to differ significantly from single, on-average estimates for the whole rating period. For 10 raters, severity estimates on the last day were significantly different from estimates on the first day. These findings cast doubt on the practice of using a single calibration of rater severity as the basis for adjustment of person measures.

17.

What play therapists do within the therapeutic relationship of humanistic/non-directive play therapy

Sally Robinson 《Pastoral Care in Education》2013,31(3):207-220

Play therapists are increasingly being employed in schools, yet there is confusion among many health, education and social care practitioners about the role of play therapists. This paper explains how play therapists position themselves and what they do through an examination of the therapeutic relationship between the therapist and child. It discusses the core conditions of congruence, acceptance and empathy with reference to recent research. Play therapists vary their practice in terms of verbal or non-verbal interaction, the tools in their playroom and how they physically place themselves. This paper argues for placing an emphasis on the non-verbal mirroring of the child, the incorporation of expressive media such as paint, clay and sand into the play room and the positioning of the therapist within the play space.

18.

Exploring the Influence of Range Restrictions on Connectivity in Sparse Assessment Networks: An Illustration and Exploration Within the Context of Classroom Observations

Stefanie A. Wind Eli Jones 《Journal of Educational Measurement》2018,55(2):217-242

Range restrictions, or raters’ tendency to limit their ratings to a subset of available rating scale categories, are well documented in large‐scale teacher evaluation systems based on principal observations. When these restrictions occur, the ratings observed during operational teacher evaluations are limited to a subset of the available categories. However, range restrictions are less common within teacher performances that are used to establish links (anchor ratings) in otherwise disconnected assessment systems. As a result, principals’ category use may be different between anchor ratings and operational ratings. The purpose of this study is to explore the consequences of discrepancies in rating scale category use across operational and anchor ratings within the context of teacher evaluation systems based on principal observations. First, we used real data to illustrate the presence of range restriction in operational ratings, and the effect of this restriction on connectivity. Then, we used simulated data to explore these effects using experimental manipulation. Results suggested that discrepancies in range restriction between anchor and operational ratings do not systematically impact the precision of teacher, principal, and teaching practice estimates. We discuss the implications of these results in terms of research and practice for teacher evaluation systems.

19.

A Study of Rater Reliability Based on CTT, GT, and IRT: The Case of a Women's Diving Final at an Olympic Games

钟晓玲 康春花 陈婧 《考试研究》2013,(5):41-52

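Taking the women's diving final at one Olympic Games as an example, this paper applies the three major measurement theories, CTT, GT, and IRT, to analyze rater reliability, revealing inter-rater and intra-rater differences from different perspectives. The results show that the CTT rater reliabilities were 0.981 and 078. The GT generalizability coefficient and dependability index were 0.8279 and 0.8271, respectively, and the design used in the competition, in which 7 judges each rated the divers' performances across 5 rounds, was a fairly appropriate decision design. In IRT, judge 5 was relatively the most severe of the 7 judges and judge 2 the most lenient, but the differences in severity among judges were not significant; judges 1 and 4 showed problems with self-consistency; and judges showed bias in rating different divers, dives of different degrees of difficulty, and different rounds, though not at a significant level. The analysis clarifies the characteristics and respective strengths of the three approaches to rater reliability and provides useful information for rater training and for improving rating reliability.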

20.

A Study of How CET-4 Essay Raters Use the Rating Criteria

徐鹰 《浙江教育学院学报》2014,(2):39-46,93

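This study uses a mixed-methods approach to analyze how CET-4 essay raters use the rating criteria. Twenty-six CET-4 essay raters scored 30 mock CET-4 essays and gave three reasons for each score, ranked by importance. The results show that: (1) although raters differed in severity, agreement among the 26 raters was fairly good, and most raters also showed good self-consistency; (2) some raters' scoring reasons tended to be one-dimensional; (3) 71.91% of the reasons given reflected the five textual features specified in the CET-4 essay rating criteria, indicating that most raters understood and applied the criteria fairly accurately.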