Unsupervised learning of word segmentation rules with genetic algorithms and inductive logic programming

Citation
D. Kazakov and S. Manandhar, Unsupervised learning of word segmentation rules with genetic algorithms and inductive logic programming, MACH LEARN, 43(1-2), 2001, pp. 121-162
Number of citations
46
Subject Categories
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
Journal ISSN
0885-6125
Volume
43
Issue
1-2
Year of publication
2001
Pages
121 - 162
Database
ISI
SICI code
0885-6125(200104/05)43:1-2<121:ULOWSR>2.0.ZU;2-4
Abstract
This article presents a combination of unsupervised and supervised learning techniques for the generation of word segmentation rules from a raw list of words. First, a language bias for word segmentation is introduced and a simple genetic algorithm is used to search for the segmentation that corresponds to the best bias value. In the second phase, the words segmented by the genetic algorithm are used as input for the first-order decision list learner CLOG. The result is a set of first-order rules that can be used to segment unseen words. When applied to either the training data or unseen data, these rules produce segmentations that are linguistically meaningful and largely conform to the annotation provided.
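The first phase described in the abstract (a genetic algorithm searching for the segmentation with the best bias value) can be illustrated with a minimal sketch. The abstract does not define the bias, so the code below assumes a stand-in cost: the total number of characters in the distinct prefixes and suffixes produced by cutting each word once. The function names, the GA operators (truncation selection, one-point crossover, per-word mutation) and all parameter values are hypothetical choices for illustration, not the authors' implementation. The second phase (rule learning with CLOG) is not shown.

```python
import random


def lexicon_cost(words, cuts):
    """Assumed stand-in bias (not taken from the record): total characters
    in the set of distinct prefixes plus the set of distinct suffixes."""
    prefixes = {w[:c] for w, c in zip(words, cuts)}
    suffixes = {w[c:] for w, c in zip(words, cuts)}
    return sum(map(len, prefixes)) + sum(map(len, suffixes))


def segment_with_ga(words, pop_size=40, generations=200,
                    mutation_rate=0.1, seed=0):
    """Search for one cut position per word that lowers lexicon_cost."""
    rng = random.Random(seed)

    def random_cuts():
        # A chromosome holds one cut index per word, from 0 to len(word).
        return [rng.randint(0, len(w)) for w in words]

    population = [random_cuts() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the cheaper half survives unchanged.
        population.sort(key=lambda cuts: lexicon_cost(words, cuts))
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            point = rng.randint(0, len(words))  # one-point crossover
            child = a[:point] + b[point:]
            for i, w in enumerate(words):
                if rng.random() < mutation_rate:
                    child[i] = rng.randint(0, len(w))  # re-draw this cut
            children.append(child)
        population = parents + children

    best = min(population, key=lambda cuts: lexicon_cost(words, cuts))
    return [(w[:c], w[c:]) for w, c in zip(words, best)]


if __name__ == "__main__":
    for prefix, suffix in segment_with_ga(["walked", "walks", "talked", "talks"]):
        print(prefix, "+", suffix)
```

Under this assumed cost, segmentations that reuse the same stems and endings across many words score better, which is the kind of pressure a segmentation bias is meant to exert. The resulting (prefix, suffix) pairs would then serve as training examples for a supervised rule learner.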