n-Gram-based indexing for Korean text retrieval

Citation
Jh. Lee et al., n-Gram-based indexing for Korean text retrieval, INF PR MAN, 35(4), 1999, pp. 427-441
Number of citations
17
Subject Categories
Library & Information Science; Information Technology & Communication Systems
Journal title
INFORMATION PROCESSING & MANAGEMENT
ISSN journal
0306-4573
Volume
35
Issue
4
Year of publication
1999
Pages
427 - 441
Database
ISI
SICI code
0306-4573(199907)35:4<427:NIFKTR>2.0.ZU;2-M
Abstract
Two groups of indexing methods, word-based indexing and morpheme-based indexing, have been investigated in the literature of Korean text retrieval. Word-based indexing eliminates the suffix of a word and generates the remaining stem as an index term. The index term is often a compound noun, which seriously decreases retrieval effectiveness. Morpheme-based indexing overcomes the problem of compound nouns by decomposing a compound noun into simple nouns. It requires, however, a large dictionary and complex linguistic knowledge. In this paper we propose a new indexing method based on n-grams. The n-gram-based indexing is considerably faster than the morpheme-based indexing, and also provides better retrieval effectiveness. (C) 1999 Elsevier Science Ltd. All rights reserved.
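The abstract does not spell out how the n-grams are formed, so the following is only a minimal sketch of generic character n-gram indexing, not the authors' method. It assumes bigrams (n = 2), whitespace tokenization, and no suffix removal; the function names char_ngrams and index_terms are hypothetical. It illustrates why n-grams can split a compound noun without a dictionary: every overlapping window becomes an index term.

```python
import unicodedata


def char_ngrams(token: str, n: int = 2) -> list[str]:
    """Return the overlapping character n-grams of a token.

    Assumption: tokens of length n or less are kept whole as one term.
    """
    # Normalize to NFC so each Hangul syllable is a single code point.
    token = unicodedata.normalize("NFC", token)
    if len(token) <= n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def index_terms(text: str, n: int = 2) -> list[str]:
    """Split text on whitespace and collect the n-grams of every token."""
    terms: list[str] = []
    for token in text.split():
        terms.extend(char_ngrams(token, n))
    return terms


if __name__ == "__main__":
    # A compound noun ("information retrieval") written without spaces.
    print(index_terms("정보검색"))
```

With these assumptions the compound noun 정보검색 yields the simple nouns 정보 and 검색 as index terms, along with the boundary-spanning bigram 보검, which a real system would have to tolerate or filter.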