A machine learning approach to POS tagging

Citation
L. Marquez et al., A machine learning approach to POS tagging, MACH LEARN, 39(1), 2000, pp. 59-91
Number of citations
71
Subject Categories
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
0885-6125
Volume
39
Issue
1
Year of publication
2000
Pages
59 - 91
Database
ISI
SICI code
0885-6125(200004)39:1<59:AMLATP>2.0.ZU;2-Z
Abstract
We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities, consisting of a set of statistical decision trees expressing distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation labeling based tagger. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5 million words Spanish corpus from scratch.
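Illustrative sketch
The abstract does not give the tagger's update rule, so the following is only a minimal sketch of a generic relaxation labeling iteration applied to POS tags, not the authors' implementation. It assumes that every constraint is a bigram compatibility value in [-1, 1] between adjacent tags, whereas the paper also feeds in rules derived from decision trees and hand-written constraints. The function name `relax_tags`, the toy lexicon, and the compatibility values are all hypothetical.

```python
def relax_tags(words, lexicon, compat, iterations=10):
    """Toy relaxation labeling over adjacent-tag (bigram) constraints.

    lexicon: word -> {tag: initial weight}, weights summing to 1 per word.
    compat:  (left_tag, right_tag) -> compatibility in [-1, 1] (assumed values).
    """
    weights = [dict(lexicon[w]) for w in words]
    for _ in range(iterations):
        updated = []
        for i, dist in enumerate(weights):
            support = {}
            for tag in dist:
                s = 0.0
                if i > 0:  # influence of the left neighbour's current weights
                    s += sum(p * compat.get((prev, tag), 0.0)
                             for prev, p in weights[i - 1].items())
                if i + 1 < len(weights):  # influence of the right neighbour
                    s += sum(p * compat.get((tag, nxt), 0.0)
                             for nxt, p in weights[i + 1].items())
                # Clamp so that (1 + support) can never become negative.
                support[tag] = max(-1.0, min(1.0, s))
            norm = sum(dist[t] * (1.0 + support[t]) for t in dist)
            if norm > 0:
                updated.append({t: dist[t] * (1.0 + support[t]) / norm
                                for t in dist})
            else:  # every tag fully rejected: keep the previous weights
                updated.append(dict(dist))
        weights = updated
    # Pick the highest-weighted tag for each word.
    return [max(dist, key=dist.get) for dist in weights]


# Hypothetical toy data using Penn Treebank style tag names.
lexicon = {
    "the": {"DT": 1.0},
    "can": {"NN": 0.5, "MD": 0.5},
    "rusts": {"VBZ": 1.0},
}
compat = {
    ("DT", "NN"): 0.8, ("DT", "MD"): -0.8,
    ("NN", "VBZ"): 0.6, ("MD", "VBZ"): -0.6,
}
print(relax_tags(["the", "can", "rusts"], lexicon, compat))
```

In this toy run the ambiguous word "can" starts with equal weight on noun and modal readings, and the neighbouring determiner and verb push its weight toward the noun tag. Other kinds of constraints would enter the same loop as extra terms in the support sum.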