Effective foreign word extraction for Korean information retrieval

Authors
Citation
B.J. Kang and K.S. Choi, Effective foreign word extraction for Korean information retrieval, INF PR MAN, 38(1), 2002, pp. 91-109
Number of citations
16
Subject categories
Library & Information Science; Information Technology & Communication Systems
Journal title
INFORMATION PROCESSING & MANAGEMENT
Journal ISSN
0306-4573
Volume
38
Issue
1
Year of publication
2002
Pages
91 - 109
Database
ISI
SICI code
0306-4573(200201)38:1<91:EFWEFK>2.0.ZU;2-3
Abstract
In Korean text, foreign words, which are mostly transliterations of English words, are frequently used. Foreign words are usually very important index terms in Korean information retrieval since most of them are technical terms or names. So accurate foreign word extraction is crucial for high performance of information retrieval. However, accurate foreign word extraction is not easy because it inevitably accompanies word segmentation and most of the foreign words are unknown. In this paper, we present an effective foreign word recognition and extraction method. In order to accurately extract foreign words, we developed an effective method of word segmentation that involves unknown foreign words. Our word segmentation method effectively utilizes both unknown word information acquired through the automatic dictionary compilation and foreign word recognition information. Our HMM-based foreign word recognition method does not require large labeled examples for the model training unlike the previously proposed method. (C) 2001 Elsevier Science Ltd. All rights reserved.