
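Abstractive Chinese Text Summarization Based on the Sequence-to-Sequence Model (基于序列到序列模型的抽象式中文文本摘要研究)
Cite this article: Yu Chuanming, Zhu Xingyu, Gong Yutian, An Lu. Abstractive Chinese Text Summarization Based on the Sequence-to-Sequence Model [J]. Library and Information Service (图书情报工作), 2019, 63(11): 108-117.
Authors: Yu Chuanming (余传明)  Zhu Xingyu (朱星宇)  Gong Yutian (龚雨田)  An Lu (安璐)
Affiliations: 1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073; 2. School of Information Management, Wuhan University, Wuhan 430072
Funding: This work is one of the research outputs of the National Natural Science Foundation of China General Program project "Research on Opinion Retrieval Based on Domain Knowledge Acquisition and Alignment in the Big Data Environment" (grant no. 71373286) and of the Ministry of Education Major Project of Philosophy and Social Sciences Research "Research on Countermeasures for Improving Counter-Terrorism Intelligence and Information Work Capabilities" (grant no. 17JZD034).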
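Abstract: [Purpose/significance] To better handle out-of-vocabulary (OOV) words in text summarization while avoiding repetition in the summary and improving summary quality, this paper takes the OOV problem and the self-repetition problem as its research tasks and studies abstractive Chinese text summarization. [Method/process] A pointer-generator mechanism and a coverage mechanism are added to the sequence-to-sequence (seq2seq) model. The pointer-generator mechanism copies OOV words from the source text into the summary to address the OOV problem; the coverage mechanism keeps the attention mechanism from repeatedly attending to the same position, which addresses the repetition problem. The method is applied to the LCSTS Chinese summarization dataset to test the model. [Result/conclusion] The experimental results show that the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores of the generated summaries are higher than those of the traditional seq2seq model and of extractive summarization models, indicating that the pointer-generator and coverage mechanisms can effectively address the OOV and repetition problems and thereby significantly improve summary quality.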
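The copy step described in the abstract can be illustrated with a small sketch. The abstract does not give the model's equations, so the mixing rule below is an assumption taken from the commonly used pointer-generator formulation: the output probability of a word is p_gen times its vocabulary probability plus (1 - p_gen) times the attention mass on source positions holding that word. The function name, the toy tokens, and all numbers are hypothetical.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    p_gen: probability of generating from the fixed vocabulary (0..1).
    vocab_dist: dict mapping in-vocabulary tokens to probabilities.
    attention: attention weights, one per source token (assumed to sum to 1).
    source_tokens: source tokens, which may include OOV tokens.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")
    mixed = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        mixed[token] += p_gen * prob
    # Copy part: attention mass flows to the token at each source position,
    # so an OOV token in the source still receives nonzero probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy example (hypothetical values): "LCSTS" is not in the vocabulary.
vocab_dist = {"summary": 0.6, "model": 0.4}
source_tokens = ["LCSTS", "summary", "model"]
attention = [0.7, 0.2, 0.1]
dist = final_distribution(0.5, vocab_dist, attention, source_tokens)
```

In this toy case the OOV token receives probability only through the copy term, which is the behaviour the abstract attributes to the pointer-generator mechanism.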

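Keywords: abstractive text summarization  sequence-to-sequence model  attention mechanism  coverage mechanism  pointer-generator mechanism
Received: 2018-06-15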

Research of Abstractive Chinese Text Summarization Based on Seq2seq Model
Yu Chuanming, Zhu Xingyu, Gong Yutian, An Lu. Research of Abstractive Chinese Text Summarization Based on Seq2seq Model [J]. Library and Information Service, 2019, 63(11): 108-117.
Authors:Yu Chuanming  Zhu Xingyu  Gong Yutian  An Lu
Institution:1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073; 2. School of Information Management, Wuhan University, Wuhan 430072
Abstract: [Purpose/significance] To handle out-of-vocabulary (OOV) words in text summarization while avoiding repetition in the summary, this article focuses on the OOV problem and the self-repetition problem and carries out a study of abstractive Chinese text summarization. [Method/process] Based on the sequence-to-sequence (seq2seq) model, a pointer generator module and a coverage processing module are added. The pointer generator module copies OOV words into the abstractive summary to address the OOV problem. The coverage processing module keeps the attention mechanism from attending to the same position repeatedly, which addresses the repetition problem. The model is applied to the Chinese summarization dataset LCSTS to test its effectiveness. [Result/conclusion] The experimental results show that the ROUGE scores of the generated summaries are much higher than those of the seq2seq model and the extractive model, indicating that, in abstractive Chinese text summarization, the pointer generator module and the coverage module can effectively address the OOV problem and summary repetition, thereby significantly improving summary quality.
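The idea of penalising repeated attention can be sketched in a few lines. The abstract does not specify how coverage is computed, so this sketch assumes the commonly used formulation: a coverage vector accumulates the attention of earlier decoding steps, and each step is penalised by the sum over source positions of min(attention, coverage). The function name and the toy attention values are hypothetical.

```python
def coverage_losses(attention_steps):
    """Return one coverage penalty per decoding step.

    attention_steps: list of attention distributions, one per decoding
    step, each a list of weights over the same source positions.
    """
    if not attention_steps:
        return []
    # Coverage starts at zero: nothing has been attended to yet.
    coverage = [0.0] * len(attention_steps[0])
    losses = []
    for attention in attention_steps:
        # Penalise overlap between current attention and past coverage.
        losses.append(sum(min(a, c) for a, c in zip(attention, coverage)))
        # Accumulate the current attention into the coverage vector.
        coverage = [c + a for c, a in zip(coverage, attention)]
    return losses


# Toy example (hypothetical values): two decoding steps, two source positions.
repeated = coverage_losses([[0.9, 0.1], [0.9, 0.1]])  # attends to the same place twice
spread = coverage_losses([[0.9, 0.1], [0.1, 0.9]])    # moves on to a new position
```

Under this assumption the decoder that revisits the same source position pays a larger penalty than the one that moves on, which is how a coverage term would discourage the repeated output the abstract describes.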
Keywords:abstractive text summarization  sequence-to-sequence model  attention mechanism  coverage mechanism  pointer generator mechanism  
This article is indexed in VIP (维普) and other databases.