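Name disambiguation method based on feature encoding and graph embedding
Cite this article: MA Yingying, WU Youlong, TANG Hua. Name disambiguation method based on feature encoding and graph embedding[J]. Journal of University of Chinese Academy of Sciences, 2022, 39(3): 360-368. DOI: 10.7523/j.ucas.2020.0019
Authors: MA Yingying  WU Youlong  TANG Hua
Affiliation: 1. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210; 2. Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050; 3. University of Chinese Academy of Sciences, Beijing 100049
Funding: Supported by the National Natural Science Foundation of China (61901267)
Abstract: To address author name ambiguity, this paper proposes an author name disambiguation method based on feature encoding and graph embedding. The method first uses a word2vec model to encode the attribute features of documents and build document representation vectors, then uses a graph auto-encoder to encode document relationships into the document vectors and clusters similar documents. To further improve clustering accuracy, a graph embedding method introduces the topological information of the document relationship network and the author relationship network into the document vectors, drawing related documents closer together. The method uses both document attribute features and information from multiple relationship networks, learns document representation vectors in an unsupervised manner, and achieves good name disambiguation performance. Tests on the real-world author dataset AMiner show that the method significantly outperforms several other current graph-network-based methods.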

Keywords: name disambiguation  graph neural network  clustering method  feature extraction  graph embedding
Received: 2020-02-17
Revised: 2020-04-03

Name disambiguation based on encoding attributes and graph topology
MA Yingying, WU Youlong, TANG Hua. Name disambiguation based on encoding attributes and graph topology[J]. Journal of University of Chinese Academy of Sciences, 2022, 39(3): 360-368. DOI: 10.7523/j.ucas.2020.0019
Authors: MA Yingying  WU Youlong  TANG Hua
Affiliation: 1. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China; 2. Shanghai Institute of Microsystem & Information Technology, Chinese Academy of Sciences, Shanghai 200050, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: To address author name ambiguity, we propose a name disambiguation method based on encoding attributes and graph topology. A word2vec model first encodes the attributes of documents to construct document representation vectors. A graph auto-encoder then encodes the relationships between documents into these vectors, and similar documents are clustered. To further improve the accuracy of the clustering results, a graph embedding model afterwards introduces the topology of the document-document network and the author-author network into the document vectors, so that related papers move closer together. The method uses document attributes and relationship networks at the same time, finds document representation vectors with an unsupervised model, and improves name disambiguation performance. Experimental results on the real-world author dataset AMiner show that our method outperforms several state-of-the-art graph-based solutions.
Keywords: name disambiguation  graph neural network  clustering method  feature extraction  graph embedding
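
The pipeline summarized in the abstract (encode document attributes, mix in document relationships, cluster) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. A hand-written embedding table stands in for a trained word2vec model, one round of neighbor averaging stands in for the graph auto-encoder, the rule "two documents are linked when they share a co-author" is an assumed way to build the document network, and the cosine threshold clustering is an assumed clustering step. The abstract does not specify any of these details.

```python
import math


def doc_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens (stand-in for word2vec encoding)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def build_doc_graph(coauthors):
    """Assumed rule: link two documents when they share a co-author."""
    n = len(coauthors)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if set(coauthors[i]) & set(coauthors[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def propagate(vectors, adj):
    """One round of mean aggregation over each node and its neighbors.

    A crude stand-in for a graph auto-encoder: it only smooths vectors
    along edges and learns nothing.
    """
    out = []
    for i in range(len(vectors)):
        group = [vectors[i]] + [vectors[j] for j in sorted(adj[i])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def cluster(vectors, threshold):
    """Single-link clustering: merge documents whose similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(vectors))]


# Toy data: four papers that all carry the same ambiguous author name.
EMBEDDINGS = {"graph": [1.0, 0.0], "network": [0.9, 0.1],
              "protein": [0.0, 1.0], "gene": [0.1, 0.9]}
DOCS = [
    {"tokens": ["graph", "network"], "coauthors": ["A", "B"]},
    {"tokens": ["graph"], "coauthors": ["B", "C"]},
    {"tokens": ["protein", "gene"], "coauthors": ["X"]},
    {"tokens": ["gene"], "coauthors": ["X", "Y"]},
]

vectors = [doc_vector(d["tokens"], EMBEDDINGS, dim=2) for d in DOCS]
graph = build_doc_graph([d["coauthors"] for d in DOCS])
labels = cluster(propagate(vectors, graph), threshold=0.8)
print(labels)  # prints [0, 0, 1, 1]
```

In this toy run the first two papers and the last two papers end up in separate clusters, that is, they are attributed to two different people with the same name. A real system would replace each stand-in with a trained model and would need a principled way to choose the number of clusters or the threshold.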