Hierarchical topic modeling with nested hierarchical Dirichlet process
Authors:Yi-qun Ding  Shan-ping Li  Zhen Zhang  Bin Shen
Institution:(1) School of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China;(2) State Street Hangzhou, Hangzhou, 310000, China
Abstract:This paper deals with the statistical modeling of latent topic hierarchies in text corpora. The height of the topic tree is assumed to be fixed, while the number of topics on each level is unknown a priori and is to be inferred from data. Taking a nonparametric Bayesian approach to this problem, we propose a new probabilistic generative model based on the nested hierarchical Dirichlet process (nHDP) and present a Markov chain Monte Carlo sampling algorithm for the inference of the topic tree structure as well as the word distribution of each topic and the topic distribution of each document. Our theoretical analysis and experimental results show that this model can produce a more compact hierarchical topic structure and capture more fine-grained topic relationships compared to the hierarchical latent Dirichlet allocation model.
Keywords:Topic modeling  Natural language processing  Chinese restaurant process  Hierarchical Dirichlet process  Markov chain Monte Carlo  Nonparametric Bayesian statistics
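
The abstract leaves the number of topics on each level to be inferred from data, and the keywords name the Chinese restaurant process (CRP) as one of the underlying building blocks. The following is a minimal sketch of a plain single-level CRP prior, not the paper's nHDP model or its sampler. The function name `crp_assignments`, the concentration parameter `alpha`, and the fixed seed are assumptions introduced only for illustration.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Draw table assignments from a Chinese restaurant process prior.

    Illustrative only: `alpha` (concentration, must be > 0) and `seed`
    are hypothetical parameters, not values taken from the paper.
    """
    rng = random.Random(seed)
    table_counts = []   # table_counts[k] = customers seated at table k
    assignments = []    # assignments[i] = table index of customer i

    for _ in range(n_customers):
        # An existing table k is chosen with weight table_counts[k];
        # a new table is opened with weight alpha.
        weights = table_counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_counts):
            table_counts.append(1)   # open a new table
        else:
            table_counts[k] += 1     # join an existing table
        assignments.append(k)

    return assignments, table_counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(n_customers=50, alpha=1.0)
    print("tables used:", len(counts))
    print("table sizes:", counts)
```

In this sketch the number of occupied tables is not fixed in advance: it grows with the number of customers and with `alpha`, which is the sense in which a nonparametric prior lets the number of topics be determined by the data.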
This article is indexed in VIP, Wanfang Data, SpringerLink, and other databases.