
Applying machine learning for high-performance named-entity extraction

Authors

Baluja, S; Mittal, VO; Sukthankar, R

Citation

S. Baluja et al., Applying machine learning for high-performance named-entity extraction, Computational Intelligence, 16(4), 2000, pp. 586-595

Subject categories

AI Robotics and Automatic Control

Journal title

COMPUTATIONAL INTELLIGENCE

ISSN journal

0824-7935

Volume

16

Issue

4

Year of publication

2000

Pages

586 - 595

Database

ISI

SICI code

0824-7935(200011)16:4<586:AMLFHN>2.0.ZU;2-Y

Abstract

This paper describes a machine learning approach to building an efficient and accurate name spotting system. Finding names in free text is an important task in many text-based applications. Most previous approaches were based on hand-crafted modules encoding language- and genre-specific knowledge. These approaches had at least two shortcomings: they required large amounts of time and expertise to develop and were not easily portable to new languages and genres. This paper describes an extensible system that automatically combines weak evidence from different, easily available sources: part-of-speech tags, dictionaries, and surface-level syntactic information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, the combination of evidence, through standard machine learning techniques, yields a system that achieves performance equivalent to the best existing hand-crafted approaches.
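
To make the idea of combining weak evidence concrete, the following is a minimal sketch and not the authors' system. The abstract does not name the learner, so a simple perceptron stands in for "standard machine learning techniques". The dictionary contents, the Penn Treebank-style tag `NNP`, the toy sentences, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical dictionary of known first names. "mark" is included on purpose:
# it is also a common verb, so a dictionary hit alone is weak evidence.
NAME_DICTIONARY = {"john", "mary", "mark"}

def features(tokens: List[Token], i: int) -> List[float]:
    """Turn one token into a vector of weak, individually unreliable cues."""
    word, pos = tokens[i]
    prev_word = tokens[i - 1][0] if i > 0 else "."
    return [
        1.0,                                               # bias term
        1.0 if word[:1].isupper() else 0.0,                # capitalization
        1.0 if word.lower() in NAME_DICTIONARY else 0.0,   # dictionary lookup
        1.0 if pos == "NNP" else 0.0,                      # tagged proper noun
        1.0 if prev_word in {".", "!", "?"} else 0.0,      # sentence-initial
    ]

# Hypothetical labeled sentences: 1 marks a token that is part of a name.
TOY_SENTENCES = [
    ([("The", "DT"), ("meeting", "NN"), ("with", "IN"), ("John", "NNP"),
      ("Smith", "NNP"), ("ended", "VBD"), (".", ".")],
     [0, 0, 0, 1, 1, 0, 0]),
    ([("Yesterday", "NN"), ("Mary", "NNP"), ("visited", "VBD"),
      ("the", "DT"), ("museum", "NN"), (".", ".")],
     [0, 1, 0, 0, 0, 0]),
    ([("Please", "VB"), ("mark", "VB"), ("the", "DT"), ("page", "NN"),
      (".", ".")],
     [0, 0, 0, 0, 0]),
]

def build_examples(sentences):
    """Flatten labeled sentences into (feature_vector, label) pairs."""
    return [(features(tokens, i), label)
            for tokens, labels in sentences
            for i, label in enumerate(labels)]

def train_perceptron(examples, epochs: int = 20) -> List[float]:
    """Learn one weight per cue with the classic perceptron update rule."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            predicted = 1 if sum(w * v for w, v in zip(weights, x)) > 0 else 0
            if predicted != y:
                for j, v in enumerate(x):
                    weights[j] += (y - predicted) * v
    return weights

def spot_names(weights: List[float], tokens: List[Token]) -> List[str]:
    """Return the words whose combined evidence score is positive."""
    return [tokens[i][0] for i in range(len(tokens))
            if sum(w * v for w, v in zip(weights, features(tokens, i))) > 0]

if __name__ == "__main__":
    learned = train_perceptron(build_examples(TOY_SENTENCES))
    sentence = [("Then", "RB"), ("Alice", "NNP"), ("called", "VBD"), (".", ".")]
    print(spot_names(learned, sentence))
```

In this toy setting no single cue is decisive: "The" is capitalized but is not a name, and "mark" is in the dictionary but is used as a verb. The learned weights balance the cues against each other, which is the kind of evidence combination the abstract describes.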