
A Corpus-Based Learning Method of Compound Noun Indexing Rules for Korean

Authors:	Jee-Hyub Kim, Byung-Kwan Kwak, Seungwoo Lee, Geunbae Lee, Jong-Hyeok Lee

Institution:	(1) Biological Research Information Center (BRIC), Pohang, South Korea; (2) Electrical and Computer Engineering Division, Pohang University of Science & Technology (POSTECH), Pohang, South Korea

Abstract:	In Korean information retrieval, compound nouns play an important role in improving precision in search experiments. There are two major approaches to compound noun indexing in Korean: statistical and linguistic. Each method, however, has its own shortcomings, such as limitations when indexing diverse types of compound nouns, over-generation of compound nouns, and data sparseness in training. In this paper, we propose a corpus-based learning method, which can index diverse types of compound nouns using rules automatically extracted from a large corpus. The automatic learning method is more portable and requires less human effort, while exhibiting a performance level similar to that of the manual linguistic approach. We also present a new filtering method to solve the problems of compound noun over-generation and data sparseness.

Keywords:	corpus-based learning; compound noun indexing; filtering; information retrieval; search performance evaluation
This article is indexed in SpringerLink and other databases.