Machine learning for information extraction in informal domains

Authors

Freitag, D

Citation

D. Freitag, Machine learning for information extraction in informal domains, MACH LEARN, 39(2-3), 2000, pp. 169-202

Subject categories

AI Robotics and Automatic Control

Journal title

MACHINE LEARNING

ISSN journal

0885-6125

Volume

39

Issue

2-3

Year of publication

2000

Pages

169 - 202

Database

ISI

SICI code

0885-6125(200005)39:2-3<169:MLFIEI>2.0.ZU;2-Y

Abstract

We consider the problem of learning to perform information extraction in domains where linguistic processing is problematic, such as Usenet posts, email, and finger plan files. In place of syntactic and semantic information, other sources of information can be used, such as term frequency, typography, formatting, and mark-up. We describe four learning approaches to this problem, each drawn from a different paradigm: a rote learner, a term-space learner based on Naive Bayes, an approach using grammatical induction, and a relational rule learner. Experiments on 14 information extraction problems defined over four diverse document collections demonstrate the effectiveness of these approaches. Finally, we describe a multistrategy approach which combines these learners and yields performance competitive with or better than the best of them. This technique is modular and flexible, and could find application in other machine learning problems.