A machine learning approach to POS tagging

Authors

Marquez, L; Padro, L; Rodriguez, H

Citation

L. Marquez et al., A machine learning approach to POS tagging, MACH LEARN, 39(1), 2000, pp. 59-91

Subject categories

AI Robotics and Automatic Control

Journal title

MACHINE LEARNING

ISSN journal

0885-6125

Volume

39

Issue

1

Year of publication

2000

Pages

59 - 91

Database

ISI

SICI code

0885-6125(200004)39:1<59:AMLATP>2.0.ZU;2-Z

Abstract

We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities, consisting of a set of statistical decision trees expressing distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation labeling based tagger. In this direction we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5 million words Spanish corpus from scratch.
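
Illustrative note

The relaxation labeling idea mentioned in the abstract can be sketched roughly as follows. This is a minimal toy illustration, not the authors' tagger: the update rule (each tag weight multiplied by one plus its support, then renormalized) is a common textbook form and is assumed here, and the function names `relax` and `toy_support`, the single hand-written constraint, and the example sentence are all hypothetical. The tag names DT, NN and MD follow the Penn Treebank tagset used for the WSJ corpus.

```python
from typing import Callable, Dict, List

Dist = Dict[str, float]  # tag -> weight for one word


def relax(sentence: List[Dist],
          support: Callable[[List[Dist], int, str], float],
          iterations: int = 10) -> List[Dist]:
    """Iteratively reweight each word's tag distribution.

    Assumed update rule: w(tag) <- w(tag) * (1 + support), renormalized
    per word. `support` must return values strictly greater than -1.
    """
    for _ in range(iterations):
        updated = []
        for i, dist in enumerate(sentence):
            scores = {tag: w * (1.0 + support(sentence, i, tag))
                      for tag, w in dist.items()}
            total = sum(scores.values())
            updated.append({tag: s / total for tag, s in scores.items()})
        sentence = updated  # all words are updated in parallel
    return sentence


def toy_support(sentence: List[Dist], i: int, tag: str) -> float:
    """Hypothetical constraint: after a determiner, favour a noun
    and penalize a modal, scaled by how likely the determiner is."""
    if i == 0:
        return 0.0
    p_det = sentence[i - 1].get("DT", 0.0)
    if tag == "NN":
        return 0.5 * p_det
    if tag == "MD":
        return -0.5 * p_det
    return 0.0


# "the can": the second word is ambiguous between noun and modal.
sentence = [{"DT": 1.0}, {"NN": 0.5, "MD": 0.5}]
tagged = relax(sentence, toy_support)
best = [max(d, key=d.get) for d in tagged]
print(best)  # ['DT', 'NN']
```

In a fuller system the support function would combine many weighted constraints of different kinds, such as n-gram statistics, rules derived from decision trees, and manually written rules, which is the flexibility the abstract attributes to the relaxation labeling tagger.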