In Korean text, foreign words, which are mostly transliterations of English
words, are frequently used. Foreign words are usually very important index
terms in Korean information retrieval since most of them are technical
terms or names. Accurate foreign word extraction is therefore crucial for
high-performance information retrieval. However, it is not easy, because
extraction necessarily involves word segmentation and most of
the foreign words are unknown. In this paper, we present an effective
foreign word recognition and extraction method. To extract foreign words
accurately, we developed a word segmentation method that handles unknown
foreign words. Our word segmentation method utilizes both unknown word
information acquired through automatic dictionary compilation and foreign
word recognition information. Our HMM-based foreign word recognition
method does not require a large set of labeled examples for
model training, unlike the previously proposed method. (C) 2001 Elsevier
Science Ltd. All rights reserved.