The bootstrapping of the Yarowsky algorithm in real corpora
Authors: Ricardo Sánchez-de-Madariaga, José R. Fernández-del-Castillo
Institution: Departamento de Ciencias de la Computación, Escuela Técnica Superior de Ingeniería Informática, Politécnico, Campus Universitario, Universidad de Alcalá, 28871 Alcalá de Henares, Madrid, Spain
Abstract: The Yarowsky bootstrapping algorithm addresses word sense disambiguation (WSD) at the homograph level, the sense granularity required by real natural language processing (NLP) applications. It also resolves the knowledge acquisition bottleneck that affects most WSD algorithms, and it can be easily applied to foreign-language corpora. However, this paper shows that the Yarowsky algorithm is significantly less accurate when applied to real, domain-fluctuating corpora. The paper also introduces a new bootstrapping methodology that performs much better on these corpora. Even so, the new methodology does not reach the accuracy achieved on non-domain-fluctuating corpora, because of ambiguities inherent in domain fluctuation.
Keywords: Word sense disambiguation; Polysemy; Homograph; Knowledge acquisition bottleneck; Domain-fluctuating corpora; Bootstrapping; Semi-supervised learning
This article is indexed in ScienceDirect and other databases.