A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean |
| |
Authors: | Jee-Hyub Kim, Byung-Kwan Kwak, Seungwoo Lee, Geunbae Lee, Jong-Hyeok Lee |
| |
Institution: | (1) Biological Research Information Center (BRIC), Pohang, South Korea; (2) Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea |
| |
Abstract: | In Korean information retrieval, compound nouns play an important role in improving retrieval precision. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each approach, however, has its own shortcomings, such as limited coverage of diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort than the manual linguistic approach, while achieving a similar level of performance. We also present a new filtering method to address the problems of compound noun over-generation and data sparseness. |
| |
Keywords: | corpus-based learning; compound noun indexing; filtering; information retrieval; search performance evaluation |
This article is indexed in SpringerLink and other databases. |
|