
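A Method for Constructing Topic Map in Professional Social Media: A Case Study of Automobile Forum
Cite this article: Lin Jie, Miao Runsheng. A Method for Constructing Topic Map in Professional Social Media: A Case Study of Automobile Forum[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(1): 68-80.
Authors: Lin Jie  Miao Runsheng
Affiliations: School of Economics and Management, Tongji University, Shanghai 200092; School of Economics and Management, Tongji University, Shanghai 200092
Funding: National Natural Science Foundation of China General Program project "Measurement Model of User Innovation Value and Interactive Innovation Management Methods in Social Media" (71672128); Tongji University Fundamental Research Funds project "Research on the Mechanism and Model of Social Network Propagation Based on Big Data" (1200219368)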
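Abstract: A topic map of professional social media comprises the topics in a forum and the relationships among them; it has important applications such as identifying directions for professional product innovation and building indexes of professional knowledge. Drawing on deep learning and text mining, this paper proposes a method for constructing topic maps in professional social media. First, a Skip-Gram model is trained on text from professional social media; the model's hidden-layer weights and its output predictions are used to obtain, respectively, the semantic similarity and the contextual relevance between words. Second, based on these two measures, an existing domain seed ontology vocabulary is expanded by adding words that are semantically similar or contextually adjacent, which provides a high-quality domain vocabulary for topic extraction. Then, using the expanded ontology vocabulary, an LDA topic model combined with the ontology vocabulary extracts topics and topic words from the professional social media text. Finally, a relevance weight is defined from semantic similarity and contextual relevance, and graph modeling and spectral clustering are used to obtain the associations and hierarchical structure among topics and topic words. A topic map generation experiment is conducted on an automobile forum corpus. The results show that the purity of the topic words obtained by the proposed method is 20.2% higher than that obtained using the LDA model alone, and that the method presents the relationships among topics clearly and reasonably.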

Keywords: professional social media  topic map  Skip-Gram model  LDA topic model  graph model
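
The first two steps of the abstract (two word-level measures from one Skip-Gram model, then seed vocabulary expansion) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: cosine similarity between rows of the input weight matrix stands in for semantic similarity, the softmax output probability stands in for contextual relevance, and the toy weights, the thresholds, and the function names are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(w_in):
    """Pairwise cosine similarity between rows of the hidden-layer (input) weights."""
    norms = np.linalg.norm(w_in, axis=1, keepdims=True)
    unit = w_in / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def context_probabilities(w_in, w_out):
    """Row i: softmax distribution over context words given centre word i."""
    logits = w_in @ w_out
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def expand_seed_vocabulary(vocab, seeds, sim, ctx, sim_threshold, ctx_threshold):
    """Add words similar to, or strongly predicted as context of, any seed word."""
    index = {word: i for i, word in enumerate(vocab)}
    expanded = set(seeds)
    for seed in seeds:
        s = index[seed]
        for j, word in enumerate(vocab):
            if sim[s, j] >= sim_threshold or ctx[s, j] >= ctx_threshold:
                expanded.add(word)
    return expanded

# Toy weights (hypothetical values, not taken from the paper).
vocab = ["engine", "motor", "brake", "weather"]
w_in = np.array([[1.0, 0.0],    # engine
                 [0.9, 0.1],    # motor: nearly parallel to "engine"
                 [0.0, 1.0],    # brake
                 [-1.0, 0.0]])  # weather: opposite direction
w_out = np.array([[0.0, 0.0, 3.0, 0.0],   # "brake" is a likely context of "engine"
                  [0.0, 0.0, 0.0, 0.0]])

sim = cosine_similarity_matrix(w_in)
ctx = context_probabilities(w_in, w_out)
expanded = expand_seed_vocabulary(vocab, ["engine"], sim, ctx,
                                  sim_threshold=0.9, ctx_threshold=0.5)
print(sorted(expanded))
```

In this toy setup "motor" enters the vocabulary through semantic similarity and "brake" through contextual relevance, while "weather" satisfies neither criterion. In practice the two weight matrices would come from a trained Skip-Gram model rather than being written by hand.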

A Method for Constructing Topic Map in Professional Social Media: A Case Study of Automobile Forum
Lin Jie, Miao Runsheng. A Method for Constructing Topic Map in Professional Social Media: A Case Study of Automobile Forum[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(1): 68-80.
Authors: Lin Jie  Miao Runsheng
Institution: (School of Economics and Management, Tongji University, Shanghai 200092)
Abstract: A topic map in professional social media comprises the topics discussed in forums and the relationships among them. Topic maps support applications such as identifying directions for professional product innovation and building professional knowledge indices. Based on deep learning and text mining, this paper proposes a method for constructing topic maps in professional social media. First, a Skip-Gram model is trained on professional social media text. The model's hidden-layer weights are used to obtain the semantic similarity between words, and the model's output predictions are used to obtain their contextual relevance. Second, the existing seed ontology vocabulary is expanded based on semantic similarity and contextual relevance; this step provides a high-quality domain vocabulary for the subsequent topic extraction. Topics are then extracted with an ontology-based latent Dirichlet allocation (LDA) topic model. Finally, a relevance weight between words is defined using semantic similarity and contextual relevance, and the relationships and hierarchical structure among topics or sub-topics are obtained through graph modeling and spectral clustering. A car forum corpus is used to build a topic map. The results show that the proposed method improves keyword purity by 20.2% compared with using the LDA model alone and displays the relationships among topics clearly and reasonably.
Keywords: professional social media  topic map  Skip-Gram  LDA topic model  graph model
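
The final step described in the abstract (a relevance weight defined from the two measures, followed by graph modeling and spectral clustering) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the weight definition, so the convex mix controlled by `alpha` is hypothetical, as are the toy matrices and the function names. The split uses the sign of the second eigenvector of the normalized graph Laplacian, which is one standard form of two-way spectral clustering and not necessarily the variant used in the paper.

```python
import numpy as np

def relevance_weights(sim, ctx, alpha=0.5):
    """Hypothetical edge weight: mix of similarity and symmetrised contextual relevance."""
    ctx_sym = (ctx + ctx.T) / 2.0
    weights = alpha * np.clip(sim, 0.0, None) + (1.0 - alpha) * ctx_sym
    np.fill_diagonal(weights, 0.0)  # no self-loops in the graph
    return weights

def spectral_bipartition(weights):
    """Split nodes into two groups by the sign of the Fiedler vector."""
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.clip(degree, 1e-12, None))
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(weights)) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigenvectors[:, 1]                 # second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy measures for four topic words (hypothetical values, not from the paper).
words = ["engine", "gearbox", "paint", "interior"]
sim = np.array([[1.0, 0.9, 0.1, 0.1],
                [0.9, 1.0, 0.1, 0.1],
                [0.1, 0.1, 1.0, 0.8],
                [0.1, 0.1, 0.8, 1.0]])
ctx = np.array([[0.0, 0.6, 0.2, 0.2],
                [0.6, 0.0, 0.2, 0.2],
                [0.2, 0.2, 0.0, 0.6],
                [0.2, 0.2, 0.6, 0.0]])

weights = relevance_weights(sim, ctx)
labels = spectral_bipartition(weights)
for word, label in zip(words, labels):
    print(word, label)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. Applying the same split recursively inside each group would be one way to obtain a hierarchy, though that is again an assumption about how the hierarchical structure could be built.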
This article is indexed in databases including VIP and Wanfang Data.