Found 8 similar documents (search time: 4 ms)
1.
《Information processing & management》2023,60(2):103202
Wikipedia links its articles by manually defined semantic relations called the Wikipedia hyperlink (link) structure. The existing Wikipedia link-based semantic similarity (SS) and semantic relatedness (SR) computation models, such as the Wikipedia one-way link (WOLM) model and the Wikipedia two-way link (WTLM) model, do not assess the strength of the relationship between a candidate concept and its links (out-links or in-links). These models treat all links as equally important, even though some links are semantically more influential than others and should be given more weight, which reduces their accuracy. This paper presents the Wikipedia bi-linear link (WBLM) model, which extends the previously proposed WOLM and WTLM models. The WBLM model explores the Wikipedia link structure as a semantic graph and identifies the strongly connected (bi-linear) and weakly connected (out-link or in-link only) links of a candidate concept. It improves the link-based vector representations of concepts by assigning weights to their connected links according to the strengths of their semantic associations. The experimental results demonstrate that the proposed WBLM model significantly improves the SS and SR computation accuracy of the WOLM model (6.9%, 8%, 24%, 17.3%, 31.2%, 30.6%, 26.5%, and 35.4%) and the WTLM model (1.2%, 3.9%, 7.1%, 9.9%, 11%, 6.3%, 12.7%, and 13%), in terms of linear correlations with human judgments on the gold standard benchmarks MC30, RG65, WS203, SimLex, 353All, MTurk287, MTurk771, and MEN3000, respectively. Moreover, this research offers deep insight into the Wikipedia link structure and provides an adequate basis for understanding it as a semantic graph.
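The idea of weighting links by connection strength can be illustrated with a minimal sketch. It assumes that a link present in both directions counts as strong and a one-directional link as weak. The weights `strong_w` and `weak_w`, the helper names, and the toy link sets are hypothetical and are not taken from the paper, whose actual weighting scheme is not given in the abstract.

```python
import math


def link_vector(out_links, in_links, strong_w=2.0, weak_w=1.0):
    """Build a weighted link vector for one concept.

    Links seen in both directions get strong_w, links seen in only one
    direction get weak_w. Both weights are illustrative assumptions.
    """
    vector = {link: weak_w for link in out_links | in_links}
    for link in out_links & in_links:
        vector[link] = strong_w
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy link sets (hypothetical, not real Wikipedia data).
car = link_vector({"Engine", "Wheel", "Road"}, {"Engine", "Vehicle"})
automobile = link_vector({"Engine", "Wheel"}, {"Engine", "Road"})
similarity = cosine(car, automobile)
```

With uniform weights (`strong_w == weak_w`) the sketch treats all links as equally important, which is the behaviour the abstract attributes to the earlier models.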
2.
Automatic text summarization attempts to provide an effective solution to today’s unprecedented growth of textual data. This paper proposes a graph-based text summarization framework for generic single- and multi-document summarization. The summarizer benefits from two well-established semantic text representation techniques, Semantic Role Labelling (SRL) and Explicit Semantic Analysis (ESA), as well as the constantly evolving collective human knowledge in Wikipedia. SRL is used for semantic parsing of sentences, whose word tokens are then represented as vectors of weighted Wikipedia concepts using the ESA method. The essence of the framework is to construct a concept graph whose vertices are semantic role-based multi-node units below the sentence level. We have empirically evaluated the summarization system on the standard, publicly available dataset from the Document Understanding Conference 2002 (DUC 2002). Experimental results indicate that the proposed summarizer outperforms all state-of-the-art related comparators in single-document summarization on the ROUGE-1 and ROUGE-2 measures, and ranks second on the ROUGE-1 and ROUGE-SU4 scores for multi-document summarization. The testing also demonstrates the scalability of the system: varying the evaluation data size has little impact on summarizer performance, particularly for the single-document summarization task. In a nutshell, the findings demonstrate the power of role-based and vectorial semantic representation when combined with the crowd-sourced knowledge base in Wikipedia.
3.
4.
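Technological similarity between firms is an important part of corporate technology intelligence analysis. To provide effective decision support for firms seeking technology competitors and partners worldwide, this paper proposes a method and workflow for analyzing inter-firm technological similarity based on patent coupling. First, it compares the relevant methods in current theoretical research and shows that patent coupling analysis reflects inter-firm technological similarity relatively accurately and in a timely manner. Then, after explaining the basic principle of patent coupling analysis, it improves the calculation of inter-firm patent coupling strength so that differences in coupling strength among multiple coupled pairs can be distinguished effectively. Next, it combines patent coupling analysis with correlation analysis and multidimensional scaling to build a framework for the visual analysis and application of inter-firm technological similarity. Finally, using the flat panel display field as an example, it demonstrates the visual analysis workflow and its results, providing a reference for corporate technology intelligence analysis in practice.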
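As a rough illustration of patent coupling strength, the sketch below counts the prior patents cited by both firms and normalizes the count by the geometric mean of the two citation-set sizes (a Salton-style cosine). This normalization is an assumption chosen for illustration. The abstract does not state the improved formula the paper proposes, and the firm names and patent identifiers are hypothetical.

```python
import math
from itertools import combinations


def coupling_strength(cited_a, cited_b):
    """Normalized coupling: shared cited patents / sqrt(|A| * |B|).

    This normalization is an illustrative assumption, not the
    paper's improved formula.
    """
    if not cited_a or not cited_b:
        return 0.0
    shared = len(cited_a & cited_b)
    return shared / math.sqrt(len(cited_a) * len(cited_b))


# Hypothetical firms and the prior patents their portfolios cite.
firm_citations = {
    "FirmA": {"P1", "P2", "P3", "P4"},
    "FirmB": {"P2", "P3", "P5", "P6"},
    "FirmC": {"P7"},
}

# Pairwise coupling strengths for every pair of firms.
pairwise = {
    (a, b): coupling_strength(firm_citations[a], firm_citations[b])
    for a, b in combinations(sorted(firm_citations), 2)
}
```

A matrix of such pairwise values is the kind of input that correlation analysis or multidimensional scaling could then take for visualization.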
5.
6.
7.
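[Objective] To address review refusals caused by factors such as out-of-date reviewer information and editors assigning manuscripts based on experience, this paper proposes a precise manuscript assignment method based on the Vector Space Model (VSM) and cosine similarity. [Methods] First, it analyzes the key reasons for review refusals by combining a literature survey with the manuscript assignment records of 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery); second, it retrieves from CNKI (中国知网) all papers (1,805) published in the past five years by the journal's reviewers (155), and uses the Term Frequency-Inverse Document Frequency (TF-IDF) method to compute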
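The general technique named here, TF-IDF vectors compared by cosine similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the smoothed IDF formula, the function names, and the toy token lists are all hypothetical, and real input would need tokenization of the reviewers' publications.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document name to a {term: TF-IDF weight} dictionary."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors[name] = {
            # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical token lists standing in for reviewer publications.
corpus = {
    "reviewer_a": ["knowledge", "graph", "entity", "linking"],
    "reviewer_b": ["patent", "citation", "analysis", "patent"],
    "manuscript": ["patent", "similarity", "analysis"],
}
vectors = tfidf_vectors(corpus)

# Rank reviewers by similarity to the manuscript, best match first.
ranking = sorted(
    ((name, cosine(vectors["manuscript"], vec))
     for name, vec in vectors.items() if name != "manuscript"),
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this toy corpus the manuscript shares terms only with `reviewer_b`, so that reviewer ranks first and `reviewer_a` scores zero.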
8.
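With the rapid growth in invention patent applications in China, how to measure and identify breakthrough inventions of high technological and economic value through ex ante and ex post indicators has become a focal question for researchers. Because Chinese patents generally lack citation information, this paper uses patents' International Patent Classification (IPC) information to construct a pairwise patent similarity indicator. It introduces a time dimension, comparing patent similarity across past, current, and future periods to measure a patent's novelty, uniqueness, and impact, and thereby builds a comprehensive scheme for identifying breakthrough inventions. It then tests the scheme empirically on nanotechnology, using invention patents granted by the United States Patent and Trademark Office (USPTO) from 1975 to 2015. The results show: (1) similarity indicators based on four-digit and six-digit IPC classifications identify 6.23% and 5.06% of nanotechnology patents, respectively, as breakthrough inventions; (2) breakthrough inventions identified by patent similarity are significantly and positively correlated with those identified by citation counts, yet the results of the two methods coincide for less than 0.5% of the total sample, which suggests that relying solely on citation data to identify breakthrough inventions may be biased; (3) empirical tests of the source characteristics of breakthrough inventions show that inventor and organization source characteristics are largely consistent between the similarity-based and citation-based results, whereas invention-level knowledge source characteristics are inconsistent, further reflecting the difference between the two identification schemes. The similarity-based identification scheme offers firms a reference for finding and exploiting high-value invention patents in practice, and is also important for helping future research on breakthrough inventions reach consistent conclusions.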
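A pairwise IPC-based similarity and a novelty score derived from it might look like the sketch below. The abstract does not specify the similarity measure, so Jaccard overlap of truncated IPC codes is an assumption here, as is defining novelty as one minus the highest similarity to any earlier patent. Truncating the code string to `digits` characters is a simplification of the four-digit and six-digit classification levels, and the sample codes are hypothetical.

```python
def ipc_similarity(codes_a, codes_b, digits=4):
    """Jaccard overlap of IPC codes truncated to `digits` characters.

    Jaccard is an illustrative assumption; the paper's indicator may differ.
    """
    a = {code[:digits] for code in codes_a}
    b = {code[:digits] for code in codes_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def novelty(patent_codes, earlier_patents, digits=4):
    """Assumed novelty: 1 minus the highest similarity to any earlier patent."""
    if not earlier_patents:
        return 1.0
    closest = max(ipc_similarity(patent_codes, p, digits) for p in earlier_patents)
    return 1.0 - closest


# Hypothetical IPC code lists for a focal patent and two earlier patents.
focal = ["B82Y30/00", "H01L21/02"]
earlier = [["B82Y30/00", "C01B32/15"], ["G01N21/00"]]
score = novelty(focal, earlier)
```

The same similarity function could be applied to patents from the same period or from later periods to approximate the uniqueness and impact measures the abstract mentions, again only as an assumption about how the comparison is done.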