Topic analysis using a finite mixture model |
| |
Affiliation: | 1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, PR China; 2. Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, Nanjing, PR China; 3. School of Engineering and Applied Science, Columbia University, USA |
| |
Abstract: | Addressed here is the issue of ‘topic analysis’, which is used to determine a text’s topic structure, i.e., a representation indicating which topics a text contains and how those topics change within it. Topic analysis consists of two main tasks: topic identification and text segmentation. Although topic analysis would be extremely useful in a variety of text processing applications, no previous study has sufficiently addressed it. This paper proposes a statistical learning approach to the problem. Specifically, topics are represented by word clusters, and a finite mixture model, referred to as a stochastic topic model (STM), is employed to represent the word distribution within a text. In topic analysis, a given text is segmented by detecting significant differences between STMs, and topics are identified by estimating STMs. Experimental results indicate that the proposed method significantly outperforms methods that combine existing techniques. |
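The abstract describes an STM as a finite mixture over word clusters. A minimal Python sketch of that idea follows, assuming the usual finite-mixture form P(w) = sum_k P(k) P(w|k). Several assumptions here are not stated in the abstract: the word clusters and each cluster's word distribution P(w|k) are given in advance, only the mixing weights P(k) are estimated per window (by EM), and a "significant difference" between adjacent STMs is approximated by a symmetric KL divergence that exceeds a hypothetical threshold and is a local maximum. The names `estimate_stm`, `divergence`, `segment`, `main_topics`, `window` and `threshold` are illustrative only; this is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical word clusters: cluster id -> {word: P(word | cluster)}.
# Assumption: these distributions are fixed; only P(cluster) is estimated.
Clusters = dict[int, dict[str, float]]


def estimate_stm(words: list[str], clusters: Clusters, iters: int = 50) -> dict[int, float]:
    """Estimate mixing weights P(k) of P(w) = sum_k P(k) P(w|k) by EM."""
    if not clusters:
        return {}
    k_ids = list(clusters)
    theta = {k: 1.0 / len(k_ids) for k in k_ids}
    # Only words covered by at least one cluster contribute to the estimate.
    counts = Counter(w for w in words if any(w in clusters[k] for k in k_ids))
    total = sum(counts.values())
    if total == 0:
        return theta
    for _ in range(iters):
        expected = {k: 0.0 for k in k_ids}
        for w, c in counts.items():
            # E-step: posterior responsibility of each cluster for word w.
            joint = {k: theta[k] * clusters[k].get(w, 0.0) for k in k_ids}
            z = sum(joint.values())
            if z == 0.0:
                continue
            for k in k_ids:
                expected[k] += c * joint[k] / z
        # M-step: normalize expected counts into new mixing weights.
        theta = {k: expected[k] / total for k in k_ids}
    return theta


def divergence(p: dict[int, float], q: dict[int, float], eps: float = 1e-9) -> float:
    """Symmetric KL divergence KL(p||q) + KL(q||p), smoothed by eps."""
    d = 0.0
    for k in set(p) | set(q):
        pk = p.get(k, 0.0) + eps
        qk = q.get(k, 0.0) + eps
        d += (pk - qk) * math.log(pk / qk)
    return d


def segment(words: list[str], clusters: Clusters,
            window: int = 20, threshold: float = 1.0) -> list[int]:
    """Hypothetical criterion: a boundary is a position where the STMs of the
    left and right windows diverge above `threshold` at a local maximum."""
    scores = {
        i: divergence(estimate_stm(words[i - window:i], clusters),
                      estimate_stm(words[i:i + window], clusters))
        for i in range(window, len(words) - window + 1)
    }
    return [i for i, s in scores.items()
            if s > threshold
            and s >= scores.get(i - 1, 0.0)
            and s >= scores.get(i + 1, 0.0)]


def main_topics(theta: dict[int, float], top: int = 2) -> list[int]:
    """Identify topics as the clusters with the largest mixing weights."""
    return sorted(theta, key=theta.get, reverse=True)[:top]
```

On a toy text of six sports words followed by six politics words, with `window=3`, this sketch reports a single boundary at position 6. Keeping only local maxima avoids flagging every position near a topic shift; the paper's actual significance test and cluster construction may differ.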
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases.