n-Gram-based indexing for Korean text retrieval

Authors

Lee, JH; Cho, HY; Park, HR

Citation

J.H. Lee et al., n-Gram-based indexing for Korean text retrieval, Information Processing & Management, 35(4), 1999, pp. 427-441

Subject categories

Library & Information Science; Information Technology & Communication Systems

Journal title

INFORMATION PROCESSING & MANAGEMENT

ISSN journal

0306-4573

Volume

35

Issue

4

Year of publication

1999

Pages

427 - 441

Database

ISI

SICI code

0306-4573(199907)35:4<427:NIFKTR>2.0.ZU;2-M

Abstract

Two groups of indexing methods, word-based and morpheme-based indexing, have been investigated in the literature of Korean text retrieval. Word-based indexing eliminates the suffix of a word and uses the remaining stem as an index term. The index term is often a compound noun, which seriously decreases retrieval effectiveness. Morpheme-based indexing overcomes the problem of compound nouns by decomposing a compound noun into simple nouns. It, however, requires a large dictionary and complex linguistic knowledge. In this paper we propose a new indexing method based on n-grams. The n-gram-based indexing is considerably faster than the morpheme-based indexing, and also provides better retrieval effectiveness. (C) 1999 Elsevier Science Ltd. All rights reserved.
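
The general idea of n-gram index terms can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes that index terms are overlapping character bigrams taken over Hangul syllables, that words are delimited by whitespace, and that no suffix stripping is applied; the function names char_ngrams and index_terms are hypothetical.

```python
import unicodedata


def char_ngrams(word: str, n: int = 2) -> list[str]:
    """Return the overlapping character n-grams of a word.

    Assumption: the word is normalized to NFC first, so that each
    Hangul syllable is a single code point.
    """
    word = unicodedata.normalize("NFC", word)
    if len(word) < n:
        # Words shorter than n are kept whole (empty input yields nothing).
        return [word] if word else []
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text: str, n: int = 2) -> list[str]:
    """Collect n-gram index terms for every whitespace-delimited word."""
    terms: list[str] = []
    for word in text.split():
        terms.extend(char_ngrams(word, n))
    return terms


# The compound noun "정보검색" (information retrieval) yields bigrams that
# include its simple nouns "정보" and "검색" without any dictionary lookup.
print(char_ngrams("정보검색"))  # ['정보', '보검', '검색']
```

Under these assumptions, a query for the simple noun "검색" shares a bigram with the compound "정보검색", which is the kind of match that whole-word indexing of the compound would miss. The sketch also produces the spurious bigram "보검", so it shows only how terms are generated and says nothing about the retrieval effectiveness reported in the paper.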