

Revisiting Cross-document Structure Theory for multi-document discourse parsing
Authors: Erick Galani Maziero, Maria Lucía del Rosário Castro Jorge, Thiago Alexandre Salgueiro Pardo
Institution: Interinstitutional Center for Computational Linguistics (NILC), Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo (USP), Avenida Trabalhador São-carlense, 400, 13566-590 São Carlos, SP, Brazil
Abstract: Multi-document discourse parsing aims to automatically identify the relations among textual spans from different texts on the same topic. Recently, with the growing amount of information and the emergence of new technologies that deal with many sources of information, more precise and efficient parsing techniques are required. The most relevant theory of multi-document relations, Cross-document Structure Theory (CST), has been used for parsing before, but the results were not satisfactory. CST has been widely criticized for its subjectivity, which may lead to low annotation agreement and, consequently, to poor parsing performance. In this work, we propose a refinement of the original CST that consists of (i) formalizing the relation definitions, (ii) pruning and combining some relations based on their meaning, and (iii) organizing the relations in a hierarchical structure. Our hypothesis is that this refinement leads to better annotation agreement and, consequently, to better parsing results. To this end, we built a corpus annotated according to the refined theory and observed an improvement in annotation agreement. Based on this corpus, we developed a parser using machine learning techniques and hand-crafted rules. Specifically, hierarchical techniques were used to capture the hierarchical organization of the relations in the proposed refinement of CST. These two approaches were used to identify the relations among text spans and to generate the multi-document annotation structure. The results outperformed other CST parsers, showing the adequacy of the proposed refinement of the theory.
Keywords: Discourse parsing; Multi-document processing; Cross-document Structure Theory; Machine learning
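The abstract states that the parser exploits the hierarchical organization of the refined CST relations. As a rough illustration of what top-down hierarchical classification means in this setting, the Python sketch below walks a label tree from the root to a leaf, asking one classifier per internal node to choose a child. Everything in it is an assumption for illustration: the label tree is a simplified, hypothetical subset (not the paper's actual taxonomy), and the features (`word_overlap`, `numeric_mismatch`, `has_quote`) and the threshold rules in `TOY_CLASSIFIERS` are hypothetical stand-ins for the trained classifiers and hand-crafted rules the authors describe.

```python
import re
from typing import Callable, Dict, List

# Hypothetical, simplified relation hierarchy for illustration only
# (NOT the exact refined-CST taxonomy from the paper). Labels that do
# not appear as keys are leaves.
HIERARCHY: Dict[str, List[str]] = {
    "ROOT": ["Content", "Form"],
    "Content": ["Redundancy", "Complement", "Contradiction"],
    "Redundancy": ["Identity", "Overlap"],
}

Features = Dict[str, float]
NodeClassifier = Callable[[Features], str]


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two sentences."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def extract_features(s1: str, s2: str) -> Features:
    """Hypothetical pairwise features for a sentence pair from two documents."""
    nums1 = set(re.findall(r"\d+", s1))
    nums2 = set(re.findall(r"\d+", s2))
    return {
        "word_overlap": jaccard(s1, s2),
        # 1.0 when both sentences mention numbers, but different ones.
        "numeric_mismatch": float(bool(nums1 and nums2 and nums1 != nums2)),
        "has_quote": float('"' in s1 or '"' in s2),
    }


# Toy hand-written rules, one per internal node (hypothetical thresholds);
# a model trained per node could be plugged in instead.
TOY_CLASSIFIERS: Dict[str, NodeClassifier] = {
    "ROOT": lambda f: "Form" if f["has_quote"] else "Content",
    "Content": lambda f: (
        "Contradiction" if f["numeric_mismatch"]
        else "Redundancy" if f["word_overlap"] >= 0.5
        else "Complement"
    ),
    "Redundancy": lambda f: "Identity" if f["word_overlap"] == 1.0 else "Overlap",
}


def classify_top_down(features: Features,
                      classifiers: Dict[str, NodeClassifier],
                      hierarchy: Dict[str, List[str]] = HIERARCHY,
                      root: str = "ROOT") -> List[str]:
    """Descend from the root, letting each internal node's classifier pick
    one child, and return the path of labels down to a leaf."""
    path: List[str] = []
    node = root
    while node in hierarchy:
        child = classifiers[node](features)
        if child not in hierarchy[node]:
            raise ValueError(f"{child!r} is not a child of {node!r}")
        path.append(child)
        node = child
    return path


if __name__ == "__main__":
    a = "The earthquake killed 10 people in the capital."
    b = "The earthquake killed 12 people in the capital."
    print(classify_top_down(extract_features(a, b), TOY_CLASSIFIERS))
```

Run as a script, this prints `['Content', 'Contradiction']`, since both sentences report a casualty count but with different numbers. A design consequence of the top-down scheme is that a mistake at an upper level (for example, routing a pair to `Form`) cannot be corrected further down, which is the usual trade-off against flat classification over all leaf labels.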
This article is indexed in ScienceDirect and other databases.
