

Automatic acquisition of inflectional lexica for morphological normalisation
Authors: J. Šnajder, B. Dalbelo Bašić, M. Tadić
Institutions: 1. Department of Electronics, Microelectronics, Computer and Intelligent Systems, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia; 2. Department of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb, Ivana Lučića 3, Zagreb, Croatia
Abstract: Because of natural language morphology, a word can take on various morphological forms. Morphological normalisation, often used in information retrieval and text mining systems, conflates the morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. The approach lies between stemming and lemmatisation and is suitable for the morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of automatically acquiring an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance.
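To make the idea of lexicon-based normalisation and corpus-driven lexicon acquisition more concrete, here is a minimal sketch. It is not the authors' formalism or acquisition procedure: it assumes two hypothetical, heavily simplified Croatian noun paradigms expressed as plain suffix lists (real Croatian inflection also involves stem alternations, which this ignores). The names `PARADIGMS`, `candidate_stems`, `acquire_lexicon`, `normalise` and the `min_forms` threshold are all illustrative. For every corpus word, each (stem, paradigm) hypothesis is scored by how many of its generated forms are attested in the corpus, and the best-supported hypothesis maps the word to a single representative form.

```python
# Hypothetical, heavily simplified paradigms: each is a list of inflectional
# suffixes; the first suffix yields the representative (citation-like) form.
# Real Croatian morphology also needs stem alternations, ignored here.
PARADIGMS = {
    "N-fem-a": ["a", "e", "i", "u", "o", "om", "ama"],
    "N-masc-ov": ["", "a", "u", "e", "om", "ovi", "ova", "ovima", "ove"],
}


def candidate_stems(word):
    """Yield every (stem, paradigm) hypothesis that could have produced `word`."""
    for name, suffixes in PARADIGMS.items():
        for suffix in suffixes:
            if word.endswith(suffix):
                stem = word[: len(word) - len(suffix)]
                if len(stem) >= 2:  # arbitrary minimum stem length
                    yield stem, name


def acquire_lexicon(corpus_words, min_forms=2):
    """Map each corpus word to a representative form, using corpus evidence only."""
    vocab = set(corpus_words)
    # Score each hypothesis by how many of its generated forms occur in the corpus.
    scores = {}
    for word in vocab:
        for stem, name in candidate_stems(word):
            if (stem, name) not in scores:
                forms = {stem + s for s in PARADIGMS[name]}
                scores[(stem, name)] = len(forms & vocab)
    lexicon = {}
    for word in vocab:
        best = max(candidate_stems(word), key=lambda h: scores[h], default=None)
        if best is not None and scores[best] >= min_forms:
            stem, name = best
            lexicon[word] = stem + PARADIGMS[name][0]
    return lexicon


def normalise(tokens, lexicon):
    """Replace each token by its representative form; unknown tokens are kept."""
    return [lexicon.get(t, t) for t in tokens]


if __name__ == "__main__":
    corpus = "žena žene ženi ženu ženom grad grada gradu gradovi gradova".split()
    lexicon = acquire_lexicon(corpus)
    print(normalise(["ženom", "gradova", "kuća"], lexicon))
```

On this toy corpus, the feminine hypothesis for the stem `žen` is supported by five attested forms and the masculine hypothesis for `grad` by five, so all inflected forms are conflated to `žena` and `grad`, while the unseen word `kuća` passes through unchanged. Choosing the first suffix as the representative form is an arbitrary design choice for the illustration.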
Keywords: Morphological normalisation; Morphological lexicon; Lexicon acquisition; Inflection; Croatian language; Text mining; Information retrieval
This article is indexed in ScienceDirect and other databases.
