
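Doc2vec-based Study on Mapping Between Patented and Industrial Categories
Cite this article: Ma Xiaomeng, Xu Feng, Liu Qingmin, Feng Ying. Doc2vec-based Study on Mapping Between Patented and Industrial Categories[J]. 情报探索 (Information Research), 2020(6): 67-74.
Authors: Ma Xiaomeng  Xu Feng  Liu Qingmin  Feng Ying
Affiliation: Strategic Research Center, Institute of Scientific and Technical Information of China, Beijing 100038 (all four authors)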
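Abstract: [Purpose/Significance] This paper uses Doc2vec, a deep-learning text vectorization method, to compute the similarity between patent categories and industry categories, aiming to offer a new method and line of thinking for computer-based category mapping. [Method/Process] The experiment covers the subclasses of the International Patent Classification (《国际专利分类表》) together with their subordinate main groups, and the subclasses of the National Industries Classification (《国民经济行业分类表》). Using Doc2vec text vectorization and cosine similarity, it obtains three sets of similarity values: patent subclass versus industry subclass; patent main group versus industry subclass; and, for each patent subclass, the average of the similarities between its main groups and the industry subclass. The agricultural categories serve as the worked example. [Result/Conclusion] Mapping by the average similarity between a patent subclass's main groups and the industry subclasses is the more reasonable approach.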

Keywords: Word2vec  Doc2vec  category mapping  cosine similarity

Doc2vec-based Study on Mapping Between Patented and Industrial Categories
Ma Xiaomeng, Xu Feng, Liu Qingmin, Feng Ying. Doc2vec-based Study on Mapping Between Patented and Industrial Categories[J]. Information Research, 2020(6): 67-74.
Authors: Ma Xiaomeng  Xu Feng  Liu Qingmin  Feng Ying
Institution: (Strategic Research Center, Institute of Scientific and Technical Information of China, Beijing 100038)
Abstract: [Purpose/significance] The paper uses Doc2vec, a deep-learning text vectorization method, to calculate similarities between patent categories and industry categories, so as to provide a new method for computer-based category mapping. [Method/process] The experiment is carried out on the subclasses of the International Patent Classification and their main groups, and on the subclasses of the National Industries Classification. Using Doc2vec text vectorization and cosine similarity, it calculates three sets of similarities: the first is the similarity between patent subclasses and industry subclasses; the second is the similarity between patent main groups and industry subclasses; and the third is the average, taken over the main groups under each patent subclass, of the similarity to the industry subclasses. The agricultural categories are used as an illustrative example. [Result/conclusion] The third set, based on the average over main groups, gives the most reasonable category mapping.
Keywords: text vectorization  Word2vec  Doc2vec  category mapping  cosine similarity
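
The three similarity sets described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors below are hypothetical stand-ins for Doc2vec embeddings of category descriptions, and the function names `cosine_similarity` and `mean_group_similarity` are chosen here for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def mean_group_similarity(group_vectors, industry_vector):
    """Average similarity of a subclass's main groups to one industry subclass."""
    sims = [cosine_similarity(g, industry_vector) for g in group_vectors]
    return sum(sims) / len(sims)


# Hypothetical toy vectors; in the paper these would come from Doc2vec.
patent_subclass_vec = [1.0, 0.0, 0.0]                  # one patent subclass
main_group_vecs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # its main groups
industry_vec = [1.0, 1.0, 0.0]                         # one industry subclass

# Set 1: patent subclass vs. industry subclass
subclass_score = cosine_similarity(patent_subclass_vec, industry_vec)
# Set 2: each patent main group vs. industry subclass
group_scores = [cosine_similarity(g, industry_vec) for g in main_group_vecs]
# Set 3: average over the main groups of the subclass
averaged_score = mean_group_similarity(main_group_vecs, industry_vec)

print(subclass_score, group_scores, averaged_score)
```

In this toy case the subclass-level score and the group-averaged score differ because one main group is unrelated to the industry subclass, which is the kind of detail the averaged measure takes into account.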
This article is indexed in 维普 (VIP), 万方数据 (Wanfang Data), and other databases.