Stemming to improve translation lexicon creation from bitexts |
| |
Authors: | Mohamed Abdel Fattah, Fuji Ren, Shingo Kuroiwa |
| |
Institution: | Faculty of Engineering, University of Tokushima, 2-1 Minamijosanjima, Tokushima 770-8506, Japan |
| |
Abstract: | Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications because a word often conveys complex meanings that decompose into several morphemes (i.e., prefix, stem, suffix). Segmenting words into morphemes can improve the extraction of English/Arabic translation pairs from parallel texts. This paper describes two algorithms, and their combination, that automatically extract an English/Arabic bilingual dictionary from parallel texts available in the Internet archive, using an Arabic light stemmer as a preprocessing step. Without the light stemmer, overall system precision and recall were 88.6% and 81.5%, respectively; after applying the light stemmer to the Arabic documents, they rose to 91.6% and 82.6%. |
| |
Keywords: | Multilingual dictionaries; English/Arabic translation; Multilingual thesaurus; Stemming |
This article is indexed in ScienceDirect and other databases. |
|