Text summarization using a trainable summarizer and latent semantic analysis期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

按检索

Text summarization using a trainable summarizer and latent semantic analysis

Institution:	1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;2. CETC Big Data Research Institute Co., Ltd., Guiyang 550022, China;1. Department of Computer and Communications Technologies, University of Extremadura, Campus Universitario s/n, Caceres 10003, Spain;2. Department of Mathematics, University of Extremadura, Campus Universitario s/n, Caceres 10003, Spain

Abstract:	This paper proposes two approaches to address text summarization: modified corpus-based approach (MCBA) and LSA-based T.R.M. approach (LSA + T.R.M.). The first is a trainable summarizer, which takes into account several features, including position, positive keyword, negative keyword, centrality, and the resemblance to the title, to generate summaries. Two new ideas are exploited: (1) sentence positions are ranked to emphasize the significances of different sentence positions, and (2) the score function is trained by the genetic algorithm (GA) to obtain a suitable combination of feature weights. The second uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus and uses semantic sentence representation to construct a semantic text relationship map. We evaluate LSA + T.R.M. both with single documents and at the corpus level to investigate the competence of LSA in text summarization. The two novel approaches were measured at several compression rates on a data corpus composed of 100 political articles. When the compression rate was 30%, an average f-measure of 49% for MCBA, 52% for MCBA + GA, 44% and 40% for LSA + T.R.M. in single-document and corpus level were achieved respectively.

Keywords:
本文献已被 ScienceDirect 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏