Similar Documents
1.
High-quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, it is costly and its results are difficult to reproduce. We therefore need an automatic method that simulates human evaluation if we are to improve our summarization systems efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on the 'corrected AIC' to avoid multicollinearity, and (2) voting by the selected models to alleviate overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17–51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
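To make the two ideas concrete, here is a minimal sketch of a voted-regression-style evaluator, assuming linear submodels fitted on subsets of automatic-score features, corrected-AIC (AICc) selection, and simple averaging as the "vote"; the feature data, subset size and number of voting models are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a voted-regression-style evaluator (assumptions: linear
# submodels over feature subsets, AICc-based selection, averaging as the "vote").
import itertools
import numpy as np

def aicc(y, y_hat, k):
    """Corrected AIC for a Gaussian linear model with k parameters."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    aic = n * np.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction

def fit_predict(X, y, X_new):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    return A @ coef, A_new @ coef

def voted_regression(X, y, X_new, max_features=2, top_m=5):
    """Select feature subsets by AICc, then average ('vote') their predictions."""
    scored = []
    for r in range(1, max_features + 1):
        for cols in itertools.combinations(range(X.shape[1]), r):
            fit, pred = fit_predict(X[:, cols], y, X_new[:, cols])
            scored.append((aicc(y, fit, k=r + 1), pred))
    scored.sort(key=lambda t: t[0])               # lower AICc is better
    votes = np.array([p for _, p in scored[:top_m]])
    return votes.mean(axis=0)                     # averaged vote over selected models

# Toy usage: features could be automatic scores (e.g. overlap-metric variants),
# y the human evaluation score to be predicted.
rng = np.random.default_rng(0)
X_train = rng.random((40, 4))
y_train = X_train @ np.array([0.5, 0.3, 0.1, 0.0]) + rng.normal(0, 0.05, 40)
X_test = rng.random((5, 4))
print(voted_regression(X_train, y_train, X_test))
```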

2.
The increasing volume of textual information on any topic requires its compression to allow humans to digest it. This implies detecting the most important information and condensing it. These challenges have led to new developments in Natural Language Processing (NLP) and Information Retrieval (IR), such as narrative summarization and evaluation methodologies for narrative extraction. Despite some progress in recent years with several solutions for information extraction and text summarization, the problems of generating consistent narrative summaries and evaluating them remain unresolved. With regard to evaluation, manual assessment is expensive, subjective and not applicable in real time or to large collections; moreover, it does not provide reusable benchmarks. At the same time, commonly used metrics for summary evaluation still imply substantial human effort, since they require a comparison of candidate summaries with a set of reference summaries. The contributions of this paper are three-fold. First, we provide a comprehensive overview of existing metrics for summary evaluation and discuss several limitations of existing evaluation frameworks. Second, we introduce an automatic framework for the evaluation of metrics that does not require any human annotation. Finally, we evaluate the existing assessment metrics on a Wikipedia dataset and a collection of scientific articles using this framework. Our findings show that the majority of existing metrics based on vocabulary overlap are not suitable for assessment based on comparison with a full text, and we discuss this outcome.
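For reference, the "vocabulary overlap" family of metrics discussed here can be illustrated with a minimal unigram-overlap (ROUGE-1-style recall) sketch; the whitespace tokenizer and the recall orientation are simplifying assumptions.

```python
# Minimal sketch of a vocabulary-overlap metric (ROUGE-1-style unigram recall);
# the simple whitespace tokenizer and recall orientation are assumptions.
from collections import Counter

def unigram_recall(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)   # clipped word matches
    return overlap / max(sum(ref.values()), 1)

print(unigram_recall("the model summarizes news", "the system summarizes news articles"))
```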

3.
Task-based evaluation of text summarization using Relevance Prediction (Total citations: 2; self-citations: 0; citations by others: 2)
This article introduces a new task-based evaluation measure called Relevance Prediction, a more intuitive measure of an individual's performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents with standard search tools: the user judges relevance based on a short summary, and then that same user (not an independent user) decides whether to open (and judge) the corresponding document. This measure is shown to be more reliable than LDC Agreement, a current gold-standard-based measure used in the summarization evaluation community. Our goal is to provide a stable framework within which developers of new automatic measures can make stronger statistical statements about the effectiveness of their measures in predicting summary usefulness. As a proof-of-concept methodology for automatic metric developers, we demonstrate that a current automatic evaluation measure correlates better with Relevance Prediction than with LDC Agreement, and that the significance level for detected differences is higher for the former than for the latter.
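A minimal sketch of how a Relevance-Prediction-style agreement rate could be computed from paired same-user judgments; the data layout below is a hypothetical illustration, not the article's actual protocol or tooling.

```python
# Hypothetical data layout: for each (user, document) pair we record the user's
# relevance judgment from the summary alone and from the full document.
judgments = [
    {"user": "u1", "doc": "d1", "summary_relevant": True,  "doc_relevant": True},
    {"user": "u1", "doc": "d2", "summary_relevant": True,  "doc_relevant": False},
    {"user": "u2", "doc": "d1", "summary_relevant": False, "doc_relevant": False},
]

# Relevance Prediction measures agreement of the *same* user with themselves:
# the fraction of summary-based judgments confirmed on the full document.
agree = sum(j["summary_relevant"] == j["doc_relevant"] for j in judgments)
print(agree / len(judgments))
```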

4.
Timeline generation systems are a class of algorithms that produce a sequence of time-ordered sentences or text snippets extracted in real time from high-volume streams of digital documents (e.g. news articles), focusing on retaining relevant and informative content for a particular information need (e.g. a topic or event). These systems have a range of uses, such as producing concise overviews of events for end-users (human or artificial agents). To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information-nugget and cluster-based ground-truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. sentences) to an explicit representation of what information a 'good' summary should contain. However, while these methodologies produce reusable ground-truth labels, prior work has reported cases where such evaluations fail to accurately estimate the performance of new timeline generation systems because of label incompleteness. In this paper, we first quantify the extent to which timeline summarization test collections fail to generalize to new summarization systems, and then propose, evaluate and analyze new automatic solutions to this issue. In particular, using a depooling methodology over 19 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when lower-effectiveness systems are held out, the test collections are robust (the likelihood of systems being mis-ranked is low), but that the risk of mis-ranking increases as the effectiveness of the systems held out from the pool increases. To reduce this risk, we also propose a range of automatic ground-truth label expansion techniques. Our results show that the proposed techniques can effectively increase the robustness of the TREC-TS test collections: they generate large numbers of missing matches with high accuracy, reducing the number of mis-rankings by up to 50%.
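A minimal sketch of a leave-one-system-out depooling check of the kind described above: remove the labels only the held-out system contributed, re-rank all systems, and compare the ranking to the original one. The toy systems, label pool, scoring function and Kendall's-tau comparison are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of a leave-one-system-out depooling check (data structures and
# the scoring function are hypothetical stand-ins for a real test collection).
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Simple Kendall's tau between two rankings of the same systems (no ties)."""
    pairs = list(combinations(rank_a, 2))
    concordant = sum(
        (rank_a.index(x) - rank_a.index(y)) * (rank_b.index(x) - rank_b.index(y)) > 0
        for x, y in pairs
    )
    return 2 * concordant / len(pairs) - 1

def score(system_output, labels):
    """Toy effectiveness: fraction of a system's sentences found in the label set."""
    return len(system_output & labels) / max(len(system_output), 1)

systems = {"A": {"s1", "s2", "s3"}, "B": {"s2", "s4"}, "C": {"s1", "s5"}}
pooled_labels = {"s1", "s2", "s4"}            # union of judged-relevant sentences

full_rank = sorted(systems, key=lambda s: -score(systems[s], pooled_labels))
for held_out in systems:
    # Remove the labels only the held-out system contributed, then re-rank.
    others = set().union(*(systems[s] for s in systems if s != held_out))
    depooled = pooled_labels - (systems[held_out] - others)
    rank = sorted(systems, key=lambda s: -score(systems[s], depooled))
    print(held_out, kendall_tau(full_rank, rank))
```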

5.
Research on the Normative System of Scientific Evaluation (Total citations: 1; self-citations: 0; citations by others: 0)
The normative system of scientific evaluation is the foundation of the management and supervision of scientific evaluation. Such management and supervision must be safeguarded by evaluation mechanisms, policies and institutions, laws and regulations, and professional norms; only a scientifically sound, healthy and complete normative system makes effective management and supervision of evaluation activities possible. The normative system can be divided into four levels: first, mechanisms and institutions; second, policies, laws and regulations; third, professional norms; and fourth, the management and supervision of scientific evaluation itself. However, countries differ in the maturity of their scientific evaluation and in the degree to which it is regulated, and so each has developed a normative system with its own characteristics.

6.
Innovative enterprises are the main actors in innovation, and how their innovation capability is evaluated affects both their development and the improvement of that capability, so a scientific evaluation method is essential. This paper first defines the concept of the innovative enterprise, analyzes the factors that influence its innovation capability, examines the capability evaluation system, and uses the fuzzy comprehensive evaluation method to construct an indicator system for evaluating the innovation capability of innovative enterprises.
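A minimal sketch of the fuzzy comprehensive evaluation step mentioned in this abstract: an overall grade vector is obtained as B = W · R from indicator weights W and a membership matrix R. All indicator names and numbers below are illustrative assumptions.

```python
# Minimal sketch of fuzzy comprehensive evaluation: B = W · R, where W holds
# indicator weights and R holds membership degrees of each indicator in the
# evaluation grades. All numbers are illustrative assumptions.
import numpy as np

weights = np.array([0.4, 0.35, 0.25])          # e.g. R&D input, talent, output
# Rows: indicators; columns: grades (excellent, good, fair, poor).
membership = np.array([
    [0.5, 0.3, 0.15, 0.05],
    [0.3, 0.4, 0.2,  0.1 ],
    [0.2, 0.3, 0.3,  0.2 ],
])

scores = weights @ membership                   # weighted-average operator
grades = ["excellent", "good", "fair", "poor"]
print(dict(zip(grades, scores.round(3))))
print("overall grade:", grades[int(scores.argmax())])   # maximum-membership rule
```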

7.
Building an evaluation system for the teaching ability of university teachers makes it possible to assess that ability scientifically, to identify and resolve problems in the teaching process, and ultimately to promote teachers' self-improvement, which is of great significance for students, teachers and the development of the university alike.

8.
By establishing a scientific and reasonable indicator system and applying standardized analysis and evaluation methods, the performance and efficiency reflected in the government procurement process can be analyzed, so as to comprehensively reflect and assess the extent to which procurement policy objectives and economic objectives are achieved. The paper analyzes the principles and current state of government procurement performance evaluation in China and puts forward corresponding countermeasures and suggestions.

9.
Research on the Evaluation System for Innovative Enterprises (Total citations: 7; self-citations: 0; citations by others: 7)
Innovative enterprises shoulder an important mission in strengthening national capacity for independent innovation and building an innovation-oriented country, and studying and refining their evaluation system is an important guarantee of their development. After reviewing and comparing current research progress and methods, this paper analyzes the concept of the innovative enterprise in depth and designs an evaluation system for it, using a BP (back-propagation) neural network evaluation method with first-, second- and third-level indicators, and verifies the soundness and practicality of the method. An empirical study of selected innovative enterprises in the automobile industry analyzes their current development direction and offers suggestions for improving the evaluation system.
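A minimal sketch of a BP (back-propagation) neural network evaluation model, using scikit-learn's MLPRegressor as a stand-in; the synthetic indicator data, network size and training settings are illustrative assumptions rather than the paper's configuration.

```python
# Minimal sketch of a BP neural network mapping indicator scores to an overall
# innovation-capability rating; data and layer sizes are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.random((60, 8))                       # 8 normalized third-level indicators
y = X @ rng.random(8)                         # synthetic "expert" overall scores
y = (y - y.min()) / (y.max() - y.min())

model = MLPRegressor(hidden_layer_sizes=(6,), max_iter=5000, random_state=1)
model.fit(X, y)
print(model.predict(X[:3]).round(3))          # predicted capability scores
```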

10.
Research on an Evaluation System for the Development of Innovative Cities (Total citations: 5; self-citations: 0; citations by others: 5)
朱凌  陈劲  王飞绒 《科学学研究》2008,26(1):215-222
Building on classic domestic and international literature on urban competitiveness, the innovation economy and innovation capability, this paper constructs an indicator system for measuring the development of innovative cities, aiming to provide a practical method for China's current drive to build an "innovation-oriented country" and to help cities effectively assess their own progress. The system comprises 23 indicators in three areas: the output efficiency of innovation activities, the level of innovation resource input, and the operation of the innovation system. It highlights the current priorities of innovative-city construction in China and offers a useful reference for the innovative and sustainable development of Chinese provinces and cities, with both theoretical depth and practical guidance value.

11.
To address the lack of comprehensive performance evaluation of simulated enterprises in current business simulation teaching systems, this paper applies the balanced scorecard perspective to analyze the indicators of a simulated enterprise, then uses the analytic hierarchy process (AHP) to build an indicator system and a comprehensive evaluation model of simulated enterprise performance, and derives suggestions for analyzing teaching effectiveness from the model as guidance for future teaching.
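A minimal sketch of the AHP weighting step: indicator weights are derived from a pairwise comparison matrix via its principal eigenvector, and a consistency ratio is checked. The comparison values below are illustrative assumptions.

```python
# Minimal sketch of AHP: weights from the principal eigenvector of a pairwise
# comparison matrix, plus a consistency check (CR < 0.1 is conventionally acceptable).
import numpy as np

A = np.array([[1,   3,   5 ],
              [1/3, 1,   2 ],
              [1/5, 1/2, 1 ]], dtype=float)    # e.g. finance vs. customer vs. learning

eigvals, eigvecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()

n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)           # consistency index
ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]            # random index (standard table)
print("weights:", weights.round(3), "CR:", round(ci / ri, 3))
```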

12.
李颖  吴洪波 《科技与管理》2006,8(5):144-146
This paper analyzes the current state of the teacher evaluation system and its problems, proposes a reform toward a developmental teacher evaluation system, and uses job behavior characteristic analysis to construct an indicator system for developmental teacher evaluation.

13.
After more than thirty years of development, China's national science and technology programmes are facing new management reforms, and strengthening programme monitoring and evaluation is an important direction of these reforms. Taking the evolution of the EU Framework Programme evaluation system as its main thread, this paper reviews the main types of evaluation activities, the evaluation methods and models, and the patterns of evolution of that system. Based on this review, it summarizes the characteristics and experience of the Framework Programmes in institution building, technical methods and evaluation capacity, and offers reference suggestions for designing and improving the evaluation system for China's national science and technology programmes.

14.
Starting from the concept and characteristics of information-based manufacturing technology, this paper sets out the principles for establishing an indicator system for evaluating the effect of implementing such technology in enterprises. Through an analysis of typical manufacturing enterprises it identifies the basic evaluation indicators, determines the upper-level indicator set by principal component analysis, and finally establishes a complete indicator system.
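A minimal sketch of using principal component analysis to condense a set of base indicators into a smaller upper-level indicator set; the synthetic data and the variance threshold are illustrative assumptions.

```python
# Minimal sketch: standardize the base indicators, then keep the principal
# components explaining most of the variance as upper-level composite indicators.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.random((30, 10))                          # 30 firms x 10 base indicators
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=0.85)                      # keep 85% of the variance (assumed threshold)
components = pca.fit_transform(X_std)
print(components.shape, pca.explained_variance_ratio_.round(2))
```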

15.
This paper reviews and analyzes existing research on corporate green growth, defines the concept, and examines the key factors influencing it from four perspectives. Drawing on prior work on green growth evaluation, corporate green development and corporate green growth assessment, it builds an indicator system and an evaluation method for corporate green growth, collects the 2016 corporate social responsibility reports of 80 state-owned and listed enterprises, and processes the data with cluster analysis and the entropy weight method. The results show that customer satisfaction, net profit growth rate, the year-on-year percentage reduction in comprehensive energy consumption per 10,000 yuan of output, and the profit margin on main business play a decisive role in achieving green growth. These findings are then applied to evaluate and analyze the green growth status of two enterprises in Shaanxi Province, together with targeted recommendations.
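A minimal sketch of the entropy weight method used to weight such indicators: indicators with more dispersion across firms carry more information and receive larger weights. The synthetic indicator matrix is an illustrative assumption.

```python
# Minimal sketch of the entropy weight method over a firms-by-indicators matrix;
# the synthetic data (already benefit-oriented and positive) is an assumption.
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((80, 4))                      # 80 firms x 4 indicators

P = X / X.sum(axis=0)                        # each firm's share under each indicator
P = np.where(P == 0, 1e-12, P)               # guard against log(0)
entropy = -(P * np.log(P)).sum(axis=0) / np.log(X.shape[0])
weights = (1 - entropy) / (1 - entropy).sum()
print(weights.round(3))
```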

16.
江新华 《科学学研究》2005,23(5):618-622
Misconduct in academic peer review occurs from time to time in China's academic community, and an important reason lies in the flaws of the academic review system. This paper defines the concept of the academic review system, analyzes its main flaws, and proposes the principal measures for reforming it.

17.
The Current State and Development Trends of China's Academic Journal Evaluation Systems (Total citations: 3; self-citations: 2; citations by others: 1)
[Objective] To survey China's existing academic journal evaluation systems and explore their development trends. [Methods] The current state of, and problems with, journal evaluation systems are analyzed in order to construct a scientific and reasonable evaluation system. [Results] China's journal evaluation systems consist mainly of quality-based "qualification evaluation" and "excellent journal evaluation" on the one hand and influence-based core journal evaluation on the other, and both have limitations. Evaluation agencies should integrate or cooperate, take the promotion of academic norms as an important duty, establish a unified and standardized evaluation system suited to China's conditions, continuously improve the indicator system and evaluation methods, and, when selecting indicators, pay more attention to their role in guiding journal quality. [Conclusions] Only when evaluation agencies and administrative departments reflect deeply on the problems encountered in practice can a truly scientific and reasonable journal evaluation system be constructed.

18.
Research on an Indicator System for Evaluating the National Innovation Environment: An Innovation System Perspective (Total citations: 1; self-citations: 0; citations by others: 1)
Based on a review of domestic and international research on the innovation environment, and drawing on innovation system theory, this paper explores an indicator system for evaluating the innovation environment at the macro level. The innovation environment is an important condition supporting innovation activities: it includes not only the network of relationships among actors, but also institutional conditions, infrastructure, policies and regulations, the education environment and the international environment, and can be divided into soft and hard environments. It plays an important role in the functioning of the innovation system, affecting both the relationships among innovation actors and the performance of the innovation process. The paper proposes a basic analytical framework for evaluating the innovation environment and, guided by the principles of systematicness, reasonableness and comprehensiveness, constructs a national indicator system across five dimensions: the talent environment, the funding environment, the market environment, the entrepreneurship environment, and the competition and cooperation environment, providing a methodological reference for comprehensive and scientific evaluation of the innovation environment.

19.
Patent-pledge lending has grown rapidly in recent years, and a thorny problem for commercial banks in this business is how to screen out high-quality patents to serve as pledge collateral. Taking the quality of pledged patents as its starting point, this paper constructs a quality evaluation system covering legal, technical and enterprise dimensions and uses a support vector machine (SVM) algorithm to improve evaluation efficiency. A scenario analysis method is introduced to obtain a two-scenario valuation of the pledged patent, and the quality evaluation result is used to adjust the appraised value. Finally, chip patents of the case enterprise RX Company are used as a case study: the valuation model is applied to assess their pledge value, verifying the model's applicability to pledged-patent valuation and offering a new approach to valuing pledged patents.
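A minimal sketch of an SVM-based screen for pledge-worthy patents over legal, technical and enterprise-level indicator scores; the features, labels and data are illustrative assumptions, not the paper's model.

```python
# Minimal sketch of an SVM classifier screening pledge-worthy patents from
# legal/technical/firm-level indicator scores; all data are illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.random((100, 6))                       # e.g. remaining life, claims, citations, firm ratios
y = (X[:, :3].mean(axis=1) > 0.5).astype(int)  # synthetic "high quality" label

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))                      # 1 = suitable as pledge collateral
```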
