Similar literature
 20 similar articles found.
1.
A new probability-based standard setting technique, the Objective Borderline Method (OBM), was introduced recently. This was based on a mathematical model of how test scores relate to student ability. The present study refined the model and tested it using 2500 simulated data-sets. The OBM was feasible to use. On average, the OBM performed well with specificity .88, sensitivity .51, false positive rate 3.4% and false negative rate 26%. These indices were insensitive to the borderline score range. This probability-based standard setting may be a useful addition to the range of standard setting methods available.
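The accuracy indices named in this abstract can be computed from a two-by-two table of true versus assigned pass/fail status. The following is a minimal Python sketch, not the OBM itself. The counts are hypothetical, "positive" is assumed to mean a pass decision, and the conventional denominators are assumed; the abstract does not state which denominators the study used for its false positive and false negative rates.

```python
def classification_indices(tp, fp, tn, fn):
    """Return conventional accuracy indices for a pass/fail decision.

    Assumed meaning of the counts (not taken from the abstract):
    tp: truly competent examinees who passed
    fp: truly not competent examinees who passed
    tn: truly not competent examinees who failed
    fn: truly competent examinees who failed
    """
    return {
        "sensitivity": tp / (tp + fn),          # competent examinees correctly passed
        "specificity": tn / (tn + fp),          # not competent examinees correctly failed
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # 1 - sensitivity
    }


# Hypothetical counts, for illustration only.
indices = classification_indices(tp=80, fp=10, tn=90, fn=20)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Under these conventional definitions the false positive rate is the complement of specificity. The figures quoted in the abstract do not follow that pattern, which suggests the study may have used different denominators.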

2.
Performance assessments, scenario‐based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this violation on examinee ability estimates has been comparatively neglected. It is known that such item dependencies cause low‐ability examinees to have their scores overestimated and high‐ability examinees' scores underestimated. However, the impact of these biases on examinee classification decisions has been little examined. In addition, because the influence of these dependencies varies along the underlying ability continuum, whether or not the location of the cut‐point is important in regard to correct classifications remains unanswered. This simulation study demonstrates that the strength of item dependencies and the location of an examination system’s cut‐points both influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.

3.
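With the spread of user-generated content across online platforms, online reviews increasingly affect brands, consumers, and platforms, and sentiment analysis of online review text has become a research hotspot. At the same time, fake reviews written to boost one's own reputation or to discredit competitors have proliferated, harming consumers' purchasing decisions and products' online word of mouth. Using literature review and case analysis, and building on research into fake review detection methods, this paper explores the sentiment characteristics of fake review text, concludes that current approaches mostly identify fake reviews from the perspective of sentiment polarity, and looks ahead to the targeted construction of more sound and complete sentiment lexicons and to fake review detection on new media platforms.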

4.
A model linking 3 perceived support variables, namely, level of support, quality of support (unconditional or conditional), and hope about future support, to false self behavior (acting in ways that are not the "real me") was hypothesized. Both parent and peer support were examined. The best fitting model for the parent and peer data revealed that perceived quality and level of parent support predict hope about future parent support, which in turn predicts false self behavior. Adolescents' motives for engaging in false self behavior were also examined. Those whose reported motives were hypothesized to be the most clinically debilitating (devaluation of the self) reported the most negative outcomes (depressed affect, low self-worth, hopelessness, and less knowledge of the true self). In contrast, adolescents citing the developmentally normative motive of role experimentation reported the most positive affect, highest self-worth, greatest hopefulness, and most knowledge of true self. Those reporting that they engaged in false self behavior to please, impress, or win the approval of parents and peers had intermediate scores on the depression, self-worth, hope, and knowledge of true self measures. Discussion focused on the potential causes and consequences of false self behavior.

5.
Several studies have shown that the standard error of measurement (SEM) can be used as an additional “safety net” to reduce the frequency of false‐positive or false‐negative student grading classifications. Practical examinations in clinical anatomy are often used as diagnostic tests to admit students to course final examinations. The aim of this study was to explore the diagnostic value of SEM using the likelihood ratio (LR) in establishing decisions about students with practical examination scores at or below the pass/fail cutoff score in a clinical anatomy course. Two hundred sixty‐seven students took three clinical anatomy practical examinations in 2011. The students were asked to identify 40 anatomical structures in images and prosected specimens in the practical examination. Practical examination scores were then divided according to the following cutoff scores: 2, 1 SEM below, and 0, 1, 2 SEM above the pass score. The positive predictive value (+PV) and LR of passing the final examination were estimated for each category to explore the diagnostic value of practical examination scores. The +PV (LR) in the six categories defined by the SEM was 39.1% (0.08), 70.0% (0.30), 88.9% (1.04), 91.7% (1.43), 95.8% (3.00), and 97.8% (5.74), respectively. The LR of categories 2 SEM above/below the pass score generated a moderate/large shift in the pre‐ to post‐test probability of passing. The LR increased the usefulness and practical value of SEM by improving confidence in decisions about the progress of students with borderline scores 2 SEM above/below the pass score in practical examinations in clinical anatomy courses. Anat Sci Educ. © 2013 American Association of Anatomists.
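The shift from pre-test to post-test probability that this abstract describes follows from the odds form of Bayes' rule: post-test odds equal pre-test odds multiplied by the LR. The Python sketch below illustrates that arithmetic together with the classical formula for the SEM. The function names are hypothetical, and the pre-test probability of 0.886 is an assumed value chosen for illustration; the abstract does not report the pre-test probability of passing.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumed pre-test probability of passing the final examination
# (hypothetical; not reported in the abstract).
assumed_pre_test = 0.886

# Likelihood ratios for the six score categories, as quoted in the abstract.
for lr in (0.08, 0.30, 1.04, 1.43, 3.00, 5.74):
    probability = post_test_probability(assumed_pre_test, lr)
    print(f"LR {lr:.2f} -> post-test probability {probability:.3f}")
```

An LR close to 1 leaves the probability essentially unchanged, which is why categories near the pass score carry little diagnostic information, while the categories 2 SEM away from it shift the probability substantially.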

6.
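Information Distortion and Decision-Making Errors: A Re-examination of the “Great Leap Forward” Movement   Total citations: 1, self-citations: 0, citations by others: 1
An important cause of the “Great Leap Forward” tragedy was that the information on which decision makers relied at the time itself contained a large amount of false and one-sided material; judgments of the situation and decisions based on this material were bound to be wrong. Meanwhile, feedback that truthfully reflected how the decisions were being implemented was severely lacking, so the decision-making center remained overly optimistic even as the situation grew increasingly serious and did not revise or change the erroneous decisions in time, which led to tragic consequences. The flow of decision information and feedback information during the “Great Leap Forward” shows that information distortion was the principal cause of the decision-making errors of Mao Zedong and other leaders during the “Great Leap Forward”.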

7.
With the ultimate goal of providing safe, high-quality experiential educational opportunities, decision making on the part of the outdoor instructor has become a critical component in successful programming. Within the outdoor pursuits setting, decisions can be categorized by specific situations and by the person or group affected by the decisions. In addition, decisions can be classified according to the frequency and severity of the consequences of a wrong choice. Correct decision making can be hindered by a variety of situations, such as stress and adversity, which are often present in the outdoor setting. Despite these problems, a number of techniques, such as consensus decision making, can aid the outdoor instructor in making correct decisions.

8.
Evidence to support the credibility of standard setting procedures is a critical part of the validity argument for decisions made based on tests that are used for classification. One area in which there has been limited empirical study is the impact of standard setting judge selection on the resulting cut score. One important issue related to judge selection is whether the extent of judges’ content knowledge impacts their perceptions of the probability that a minimally proficient examinee will answer the item correctly. The present article reports on two studies conducted in the context of Angoff‐style standard setting for medical licensing examinations. In the first study, content experts answered and subsequently provided Angoff judgments for a set of test items. After accounting for perceived item difficulty and judge stringency, answering the item correctly accounted for a significant (and potentially important) impact on expert judgment. The second study examined whether providing the correct answer to the judges would result in a similar effect to that associated with knowing the correct answer. The results suggested that providing the correct answer did not impact judgments. These results have important implications for the validity of standard setting outcomes in general and on judge recruitment specifically.
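For readers unfamiliar with Angoff-style standard setting, the basic arithmetic can be sketched as follows: each judge estimates, for every item, the probability that a minimally proficient examinee answers correctly, and the cut score is the sum over items of the mean judged probability. This Python sketch shows the textbook procedure only, not the analysis used in the two studies; the function name and the ratings are hypothetical.

```python
def angoff_cut_score(ratings):
    """Compute a basic Angoff cut score.

    ratings[j][i] is judge j's estimated probability that a minimally
    proficient examinee answers item i correctly.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    # Average the judges' probabilities item by item.
    item_means = [
        sum(judge[i] for judge in ratings) / n_judges for i in range(n_items)
    ]
    # The expected raw score of a minimally proficient examinee is the cut score.
    return sum(item_means)


# Hypothetical ratings from three judges on four items.
example_ratings = [
    [0.60, 0.75, 0.40, 0.90],
    [0.55, 0.80, 0.50, 0.85],
    [0.65, 0.70, 0.45, 0.95],
]
print(f"Cut score: {angoff_cut_score(example_ratings):.2f} out of 4 items")
```

Because each judge's ratings enter the mean directly, a judge whose perception of item difficulty depends on whether they themselves answered the item correctly will move the cut score, which is the effect the first study examines.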

9.
This article introduces the Diagnostic Profiles (DP) standard setting method for setting a performance standard on a test developed from a cognitive diagnostic model (CDM), the outcome of which is a profile of mastered and not‐mastered skills or attributes rather than a single test score. In the DP method, the key judgment task for panelists is a decision on whether or not individual cognitive skill profiles meet the performance standard. A randomized experiment was carried out in which secondary mathematics teachers were randomly assigned to either the DP method or the modified Angoff method. The standard setting methods were applied to a test of student readiness to enter high school algebra (Algebra I). While the DP profile judgments were perceived to be more difficult than the Angoff item judgments, there was a high degree of agreement among the panelists for most of the profiles. In order to compare the methods, cut scores were generated from the DP method. The results of the DP group were comparable to the Angoff group, with less cut score variability in the DP group. The DP method shows promise for testing situations in which diagnostic information is needed about examinees and where that information needs to be linked to a performance standard.

10.
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard‐setting panels should have the proper qualifications to make the judgments asked of them; however, even qualified judges vary in expertise and in some cases, such as highly specialized areas or when members of the public are involved, it may be difficult to ensure that each member of a standard‐setting panel has the requisite expertise to make qualified judgments. Given the subjective nature of these types of judgments, and that a large part of the validity argument for an exam lies in the robustness of its passing standard, an examination of the influence of judge proficiency on the judgments is warranted. This study explores the use of the many‐facet Rasch model as a method for adjusting modified Angoff standard‐setting ratings based on judges’ proficiency levels. The results suggest differences in the severity and quality of standard‐setting judgments across levels of judge proficiency, such that judges who answered easy items incorrectly tended to perceive them as easier, but those who answered correctly tended to provide ratings within normal stochastic limits.

11.
This real‐data‐guided simulation study systematically evaluated the decision accuracy of complex decision rules combining multiple tests within different realistic curricula. Specifically, complex decision rules combining conjunctive aspects and compensatory aspects were evaluated. A conjunctive aspect requires a minimum level of performance, whereas a compensatory aspect requires an average level of performance. Simulations were performed to obtain students' true and observed score distributions and to manipulate several factors relevant to a higher education curriculum in practice. The results showed that the decision accuracy depends on the conjunctive (required minimum grade) and compensatory (required grade point average) aspects and their combination. Overall, within a complex compensatory decision rule the false negative rate is lower and the false positive rate higher compared to a conjunctive decision rule. For a conjunctive decision rule the reverse is true. Which rule is more accurate also depends on the average test reliability, average test correlation, and the number of reexaminations. This comparison highlights the importance of evaluating decision accuracy in high‐stake decisions, considering both the specific rule as well as the selected measures.
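The difference between conjunctive and compensatory aspects can be made concrete with a small Python sketch. The function names, the grades, and the thresholds are hypothetical and serve only to illustrate the definitions given in the abstract; the study's actual rules and grading scale are not stated there.

```python
def conjunctive_pass(grades, minimum_grade):
    """Conjunctive aspect: every test must reach the required minimum grade."""
    return all(grade >= minimum_grade for grade in grades)


def compensatory_pass(grades, required_average):
    """Compensatory aspect: the grade point average must reach the required level."""
    return sum(grades) / len(grades) >= required_average


def complex_pass(grades, minimum_grade, required_average):
    """Complex rule: both the conjunctive and the compensatory aspect must hold."""
    return conjunctive_pass(grades, minimum_grade) and compensatory_pass(
        grades, required_average
    )


# Hypothetical grades on an assumed 1-10 scale.
grades = [5.0, 7.5, 6.0]
print("conjunctive:", conjunctive_pass(grades, minimum_grade=5.5))
print("compensatory:", compensatory_pass(grades, required_average=6.0))
print("complex:", complex_pass(grades, minimum_grade=5.5, required_average=6.0))
```

In this example one weak grade fails the conjunctive aspect even though the average is sufficient, which illustrates why a purely compensatory rule tends to pass more students (fewer false negatives, more false positives) than a conjunctive one.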

12.
Standard setting is defined as the identification of certain points on a mark scale with particular performance standards, with the intention of enhancing the inferences that are warranted from the test scores. It is argued that the selection of both the points on the mark‐scales and the performance standards with which they are equated are arbitrary and are driven by a set of values (which are often implicit). In ‘high‐stakes’ settings, it is shown how the values implicit in the standard can come to dominate the values inherent in the domain they represent. The validation of standards must therefore include consideration of their consequences as well as their meanings. It is then argued that standards, where they exist, cannot be accounted for purely in terms of norm‐referenced or criterion‐referenced interpretations, but exist rather by virtue of a shared construct in a community of practice. These theoretical positions are then developed to classify standard‐setting methods along two dimensions, the first relating to the role of performance data in the setting of standards and the second relating to the extent to which the meanings or the consequences of the assessment are emphasised in the process.

13.
Treating suicidality is one of the most challenging situations managed by college and university counseling centers. The first edition of Bongar’s (1991) The Suicidal Patient: Clinical and Legal Standards of Care, a compendium of empirical knowledge and clinical research regarding standard of care in the treatment of suicidality, was soon considered a valuable resource in the field. The volume was pivotal in the arena of clinical practice because, in addition to compiling state of the art information, it also positioned itself as a benchmark for determining whether standard of care has been met in treatment of suicidal people. Now in its third edition (Bongar & Sullivan, 2013), this resource continues to inform clinical practice. This article examines several noteworthy changes across the three editions of this seminal work. What has changed, possible trends suggested by the changes, and resulting implications of these changes are examined. Of particular importance are changes that represent significant developments in the field or make relatively new assertions of what constitutes sound practice. These trends, changes, and assertions regarding standard of care are discussed in terms of their relevance to the counseling center setting. In the third edition there are several instances in which Bongar and Sullivan (2013) clearly extended beyond reporting observations about clinical practice and arguably moved into the realm of attempting to set norms for standard of care. Even though these new assertions may essentially be the authors’ opinions, they may be treated as fact by regulators, expert witnesses, attorneys, and others in determining whether standard of care has been met in specific cases.

14.
This paper aims to investigate students’ likes and dislikes of the teaching that they have experienced and its effects on students’ perceptions of the learning environment, student learning and academic performance. The study compares a lecture-based setting to a student-activating learning/teaching environment, considering both instructional and assessment practices. Data (N=578) were collected using the Course Experience Questionnaire (Ramsden, 1991) and by means of a standardised test. While lecture-taught students’ evaluations of the experienced teaching were generally focused and positive, students’ perceptions of the activating methods varied widely and both extremely positive and negative opinions were present. Also the consequences of these (dis)likes in instruction for student learning become clear. Moreover, a significant positive linear effect of students’ (dis)likes in instruction on students’ perceptions of the learning environment (except for appropriate assessment), their learning and their performance was found. This way, the results pinpoint the central role of teaching methods for students’ learning and caution against detrimental consequences of students’ negative appraisal of the teaching methods that they experience. A matching strategy between a student’s teaching tastes and the teacher’s instructional interventions provides the best educational prospects.

15.
As an alternative to adaptation, tests may also be developed simultaneously in multiple languages. Although the items on such tests could vary substantially, scores from these tests may be used to make the same types of decisions about different groups of examinees. The ability to make such decisions is contingent upon setting performance standards for each exam that allow for comparable interpretations of test results. This article describes a standard setting process used for a multilingual high school literacy assessment constructed under these conditions. This methodology was designed to address the specific challenges presented by this testing program including maintaining equivalent expectations for performance across different student populations. The validity evidence collected to support the methodology and results is discussed along with recommendations for future practice.

16.
Although it is arguably a fundamental democratic or human right of a child to feel safe at school, many children and adolescents have to face peer victimisation in schools on a daily basis, and occasionally through several levels of education. Long-term victimisation may have detrimental consequences for the victim, including a negative effect on educational attainment. This study provides an insight into the lives of five young people who have dropped out or are at risk of dropping out from Estonian vocational schools because of peer victimisation. The study is based on in-depth face-to-face personal interviews. Four superordinate themes with associated subthemes are addressed: ‘experience of victimisation’, ‘social context’, ‘lack of support’, and ‘quitting as a survival strategy’. The stories of the bullying victims reveal how the victimisation has shaped them and their educational pathways by compelling them to discontinue their vocational training.

17.
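The self-nonself recognition model in artificial immune systems has recognition limitations. Danger theory takes danger as the object of recognition and can effectively avoid problems that are hard to solve in self-nonself recognition. This paper treats the appearance of nonself as one kind of danger signal that, together with other danger signals, cooperatively triggers the immune response, replacing the traditional strategy in which nonself alone triggers the immune response, with the aim of reducing the false positive rate and the false negative rate.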

18.
Educational Assessment, 2013, 18(3): 129-145
Alternate approaches to standard setting cannot be evaluated in terms of their accuracy, because the standard does not exist until we set it. To set a standard is to establish a policy, and policies are evaluated in terms of their appropriateness, reasonableness, and consistency, rather than in terms of accuracy. Of the 2 general approaches to standard setting currently in use, the test-centered methods rely on judgments about test items, whereas the examinee-centered methods rely on judgments about examinees. This article examines criteria for choosing between these 2 approaches to standard setting in terms of empirical criteria and in terms of whether the method is consistent with (a) the model of achievement underlying test design and interpretation and (b) the assessment methods being used.

19.
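Online public opinion has many advantages and characteristics that traditional expressions of public opinion lack. While online public opinion plays a positive role in promoting the building of a harmonious society, it has also produced negative effects that cannot be ignored, such as false information, irrational online violence, incomplete representativeness, and misguided orientation. To build a healthy, well-regulated platform on which online public opinion can operate soundly, website and forum staff can take appropriate measures to regulate and guide it.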

20.
This research investigated 4- through 7-year-olds' and adults' (n = 64) concepts about the emotional consequences of desire fulfillment versus desire inhibition in situations where people's desires conflict with prohibitive rules. Results revealed developmental increases in attributing positive or mixed emotions to story characters that make willpower decisions and negative or mixed emotions to characters that transgress. These developmental changes in emotion predictions were accompanied by age-related differences in emotion explanations. Whereas 4- and 5-year-olds largely explained emotions in relation to the characters' goals, 7-year-olds and adults further explained how rules and future consequences influence emotions. Results are discussed in relation to connections among children's psychological, deontic, and future-oriented reasoning about emotions as well as the development of self-control.
