Effective foreign word extraction for Korean information retrieval

Authors

Kang, BJ; Choi, KS

Citation

B.J. Kang and K.S. Choi, Effective foreign word extraction for Korean information retrieval, Information Processing & Management, 38(1), 2002, pp. 91-109

Citations number

Subject Categories

Library & Information Science; Information Technology & Communication Systems

Journal title

INFORMATION PROCESSING & MANAGEMENT

ISSN journal

0306-4573

Volume

38

Issue

1

Year of publication

2002

Pages

91 - 109

Database

ISI

SICI code

0306-4573(200201)38:1<91:EFWEFK>2.0.ZU;2-3

Abstract

In Korean text, foreign words, which are mostly transliterations of English words, are frequently used. Foreign words are usually very important index terms in Korean information retrieval since most of them are technical terms or names. So accurate foreign word extraction is crucial for high performance of information retrieval. However, accurate foreign word extraction is not easy because it inevitably accompanies word segmentation and most of the foreign words are unknown. In this paper, we present an effective foreign word recognition and extraction method. In order to accurately extract foreign words, we developed an effective method of word segmentation that involves unknown foreign words. Our word segmentation method effectively utilizes both unknown word information acquired through the automatic dictionary compilation and foreign word recognition information. Our HMM-based foreign word recognition method does not require large labeled examples for the model training, unlike the previously proposed method. (C) 2001 Elsevier Science Ltd. All rights reserved.