Wikipedia connects its articles through manually defined semantic relations, collectively called the Wikipedia hyperlink (link) structure. Existing Wikipedia link-based models for computing semantic similarity (SS) and semantic relatedness (SR), such as the Wikipedia one-way link (WOLM) model and the Wikipedia two-way link (WTLM) model, do not assess the strength of the relationship between a candidate concept and its links (out-links or in-links). These models treat all links as equally important, even though some links are semantically more influential than others and should carry more weight, and this reduces their accuracy. This paper presents the Wikipedia bi-linear link (WBLM) model, which extends the previously proposed WOLM and WTLM models. The WBLM model treats the Wikipedia link structure as a semantic graph and identifies the strongly connected (bi-linear) links and the weakly connected links (out-links or in-links) of a candidate concept. It improves the link-based vector representations of concepts by weighting their connected links according to the strength of their semantic associations. The experimental results show that the proposed WBLM model significantly improves the SS and SR computation accuracy of the WOLM model (by 6.9%, 8%, 24%, 17.3%, 31.2%, 30.6%, 26.5%, and 35.4%) and of the WTLM model (by 1.2%, 3.9%, 7.1%, 9.9%, 11%, 6.3%, 12.7%, and 13%), measured as linear correlation with human judgments on the gold standard benchmarks MC30, RG65, WS203, SimLex, 353All, MTurk287, MTurk771, and MEN3000, respectively. The research also offers insight into the Wikipedia link structure and provides a basis for understanding it as a semantic graph.
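The link-weighting idea in this abstract can be illustrated with a minimal sketch. It assumes that a "bi-linear" link is one present in both directions (the concept links to the article and the article links back), and it uses placeholder weights of 2.0 for strong links and 1.0 for weak ones. The abstract does not give the actual weighting scheme, so the function names, the weights, and the use of cosine similarity are illustrative assumptions rather than the paper's method.

```python
import math


def link_vector(out_links, in_links, strong_weight=2.0, weak_weight=1.0):
    """Build a weighted link vector for one concept.

    Links seen in both directions get strong_weight; links seen in only
    one direction get weak_weight. Both weights are assumed values.
    """
    out_links, in_links = set(out_links), set(in_links)
    vector = {}
    for link in out_links | in_links:
        is_strong = link in out_links and link in in_links
        vector[link] = strong_weight if is_strong else weak_weight
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical link sets for two concepts.
car = link_vector(out_links={"Engine", "Road", "Wheel"},
                  in_links={"Engine", "Driver"})
truck = link_vector(out_links={"Engine", "Road", "Cargo"},
                    in_links={"Engine", "Road"})
relatedness = cosine(car, truck)
```

Setting both weights to the same value makes every link count equally, which is the behaviour the abstract attributes to the earlier models.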

Automatic text summarization attempts to provide an effective answer to today's unprecedented growth of textual data. This paper proposes a graph-based text summarization framework for generic single- and multi-document summarization. The summarizer draws on two well-established techniques for representing text semantics, Semantic Role Labelling (SRL) and Explicit Semantic Analysis (ESA), as well as the constantly evolving collective human knowledge in Wikipedia. SRL is used to parse sentences semantically, and the word tokens of each parsed sentence are represented as vectors of weighted Wikipedia concepts using the ESA method. The core of the framework is a concept graph whose vertices are semantic role-based multi-node (sub-sentence-level) units, on which summarization is performed. We evaluated the summarization system empirically on the standard, publicly available dataset from the Document Understanding Conference 2002 (DUC 2002). Experimental results indicate that the proposed summarizer outperforms all state-of-the-art comparators in single-document summarization on the ROUGE-1 and ROUGE-2 measures, and ranks second on the ROUGE-1 and ROUGE-SU4 scores for multi-document summarization. The testing also demonstrates the scalability of the system: varying the size of the evaluation data has little impact on summarizer performance, particularly for the single-document task. In short, the findings demonstrate the power of role-based and vectorial semantic representation when combined with the crowd-sourced knowledge base in Wikipedia.

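Technological similarity between firms is an important part of enterprise technology intelligence analysis. To give firms effective decision support when seeking technological competitors and partners worldwide, this paper proposes a patent-coupling-based method and workflow for analyzing inter-firm technological similarity. First, it compares the relevant methods in the current theoretical literature and argues that patent coupling analysis reflects inter-firm technological similarity relatively accurately and in a timely manner. Then, after explaining the basic principle of patent coupling analysis, it improves the calculation of inter-firm patent coupling strength so that differences in coupling strength among multiple pairs of coupled firms can be distinguished effectively. Next, it combines patent coupling analysis with correlation analysis and multidimensional scaling to build a framework for the visual analysis and application of inter-firm technological similarity. Finally, taking flat panel display technology as an example, it demonstrates the visual analysis workflow and its results, providing a reference for enterprise technology intelligence analysis in practice.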
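A minimal sketch of a patent coupling strength calculation follows. It assumes that two firms are coupled when their patents cite the same earlier patents, and it normalizes the shared count with a cosine-style (Salton) measure so that pairs with very different portfolio sizes stay comparable. The abstract does not state its improved formula, so this normalization is a stand-in, and the firm names and patent identifiers are hypothetical.

```python
import math
from itertools import combinations


def coupling_strength(cited_a, cited_b):
    """Shared cited patents, normalized by the geometric mean of set sizes.

    The normalization is an assumed choice, not the paper's formula.
    """
    cited_a, cited_b = set(cited_a), set(cited_b)
    shared = len(cited_a & cited_b)
    if shared == 0:
        return 0.0
    return shared / math.sqrt(len(cited_a) * len(cited_b))


def coupling_matrix(citations):
    """Map each unordered firm pair to its coupling strength.

    citations: dict of firm name -> set of cited patent identifiers.
    """
    return {
        (a, b): coupling_strength(citations[a], citations[b])
        for a, b in combinations(sorted(citations), 2)
    }


# Hypothetical citation sets.
citations = {
    "FirmA": {"P1", "P2", "P3", "P4"},
    "FirmB": {"P3", "P4"},
    "FirmC": {"P9"},
}
strengths = coupling_matrix(citations)
```

A pairwise table of this kind is the sort of input that correlation analysis or multidimensional scaling would then take to place firms on a map.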

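Identifying emerging research topics can point researchers toward promising directions and help them grasp the future prospects of a technology. Traditional keyword-based topic identification cannot accurately reflect the logical relationships among topic terms, so revealing research topics has to rely on expert judgment. This paper proposes identifying emerging research topics based on burst documents and SAO (subject-action-object) similarity: after determining which documents have emerging characteristics, it analyzes the semantic associations in their abstracts to reveal similarity in research content, and thereby extracts research topics more accurately. Finally, the proposed method is tested empirically using precise point positioning technology as an example.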

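After reviewing research in the strategic management literature on the definition and measurement of consistency, this paper proposes a method for measuring the consistency between factual topics and organizational strategy, built on word-sense similarity computed with HowNet (《知网》). The method first computes the word-sense similarity of the factual topic's feature words and of the organizational strategy's feature words to the feature words of the basic factor categories, and then measures the consistency between the factual topic and the organizational strategy using the cosine of the angle between the resulting vectors. In the empirical analysis, positive and negative examples illustrate and verify the effectiveness and feasibility of the method.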

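[Objective] To address reviewer refusals caused by factors such as out-of-date reviewer information and editors assigning manuscripts based on experience, this paper proposes a precise manuscript assignment method based on the Vector Space Model (VSM) and cosine similarity. [Methods] First, it analyzes the key reasons for refusals by combining a literature survey with the review assignment records of 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery). Second, it retrieves from 中国知网 (CNKI) all papers (1,805) published in the last five years by the journal's reviewers (155 people), and uses the Term Frequency-Inverse Document Frequency (TF-IDF) method to compute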
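The abstract above is cut off, but the two techniques it names, TF-IDF weighting and cosine similarity in a vector space model, can be sketched as follows. This is an illustration only: whitespace tokenization, the smoothed IDF term `log(N / df) + 1`, and the `rank_reviewers` helper are all assumptions, and the reviewer profiles are made-up strings standing in for each reviewer's published papers.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Assumed IDF variant: log(N / df) + 1, so shared terms keep weight.
            term: (count / total) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_reviewers(manuscript, reviewer_profiles):
    """Rank reviewers by similarity of their profile text to the manuscript."""
    names = list(reviewer_profiles)
    vectors = tfidf_vectors([manuscript] + [reviewer_profiles[n] for n in names])
    query, profiles = vectors[0], vectors[1:]
    scores = [(name, cosine(query, vec)) for name, vec in zip(names, profiles)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical reviewer profiles and manuscript text.
profiles = {
    "reviewer_a": "text mining topic model knowledge discovery",
    "reviewer_b": "patent citation network analysis",
}
ranking = rank_reviewers("topic model for text mining", profiles)
```

For Chinese-language papers, the whitespace split would need to be replaced by a proper word segmenter before the weights mean anything.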

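马荣康 (Ma Rongkang), 王艺棠 (Wang Yitang). 《科研管理》 (Science Research Management), 2021, 42(5): 153-160
With the rapid growth in China's invention patent applications, how to measure and identify breakthrough inventions of high technological and economic value using ex ante and ex post indicators has become a focal question for researchers. Because Chinese patents generally lack citation information, this paper uses patents' International Patent Classification (IPC) information to build a pairwise patent similarity indicator, and introduces a time dimension, comparing patent similarity across past, current, and future periods, to measure a patent's novelty, uniqueness, and impact and thereby construct a comprehensive scheme for identifying breakthrough inventions. Then, taking nanotechnology as an example, it tests the scheme empirically on invention patents granted by the United States Patent and Trademark Office (USPTO) from 1975 to 2015. The results show that: (1) similarity indicators based on four-digit and six-digit IPC classes identify 6.23% and 5.06% of nanotechnology patents, respectively, as breakthrough inventions; (2) breakthrough inventions identified by patent similarity are significantly and positively correlated with those identified by citation counts, but fewer than 0.5% of the total sample are identified by both methods, suggesting that relying solely on citation data to identify breakthrough inventions may be biased; (3) empirical tests of the source characteristics of breakthrough inventions show that inventor-level and organization-level source characteristics are broadly consistent between similarity-based and citation-based breakthrough inventions, whereas invention-level knowledge source characteristics are inconsistent, which further reflects the difference between the two identification schemes. The similarity-based identification scheme offers firms a reference for finding and exploiting high-value invention patents in practice, and is also important for helping future research on breakthrough inventions reach consistent conclusions.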
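The IPC-based similarity and its comparison across three time periods can be sketched as below. The abstract does not give the formulas, so several choices here are assumptions: similarity is the Jaccard overlap of four-character IPC subclasses, novelty is one minus the mean similarity to earlier patents, uniqueness is one minus the mean similarity to same-year patents, impact is the mean similarity to later patents, and the window is five years. The patent records are hypothetical.

```python
def ipc_subclasses(ipc_codes):
    """Reduce full IPC symbols to their four-character subclass, e.g. 'B82Y'."""
    return {code.replace(" ", "")[:4] for code in ipc_codes}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_similarity(focal, others):
    """Average IPC similarity between the focal patent and a list of patents."""
    if not others:
        return 0.0
    focal_classes = ipc_subclasses(focal["ipc"])
    total = sum(jaccard(focal_classes, ipc_subclasses(p["ipc"])) for p in others)
    return total / len(others)


def similarity_profile(focal, patents, window=5):
    """Novelty, uniqueness, and impact under the assumed definitions above."""
    year = focal["year"]
    past = [p for p in patents if year - window <= p["year"] < year]
    current = [p for p in patents if p["year"] == year and p is not focal]
    future = [p for p in patents if year < p["year"] <= year + window]
    return {
        "novelty": 1.0 - mean_similarity(focal, past),
        "uniqueness": 1.0 - mean_similarity(focal, current),
        "impact": mean_similarity(focal, future),
    }


# Hypothetical patent records.
patents = [
    {"id": "old", "year": 2000, "ipc": ["H01L 21/02"]},
    {"id": "focal", "year": 2003, "ipc": ["B82Y 30/00", "H01L 29/06"]},
    {"id": "peer", "year": 2003, "ipc": ["B82Y 40/00"]},
    {"id": "later", "year": 2006, "ipc": ["B82Y 30/00", "H01L 21/20"]},
]
scores = similarity_profile(patents[1], patents)
```

A finer classification level would require parsing the IPC symbol into its class, subclass, and group parts rather than taking a fixed-length prefix.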
