Applying machine learning for high-performance named-entity extraction

Citation
S. Baluja et al., Applying machine learning for high-performance named-entity extraction, COMPUT INTE, 16(4), 2000, pp. 586-595
Number of citations
20
Subject categories
AI Robotics and Automatic Control
Journal title
COMPUTATIONAL INTELLIGENCE
Journal ISSN
0824-7935
Volume
16
Issue
4
Year of publication
2000
Pages
586 - 595
Database
ISI
SICI code
0824-7935(200011)16:4<586:AMLFHN>2.0.ZU;2-Y
Abstract
This paper describes a machine learning approach to building an efficient and accurate name spotting system. Finding names in free text is an important task in many text-based applications. Most previous approaches were based on hand-crafted modules encoding language and genre-specific knowledge. These approaches had at least two shortcomings: they required large amounts of time and expertise to develop and were not easily portable to new languages and genres. This paper describes an extensible system that automatically combines weak evidence from different, easily available sources: parts-of-speech tags, dictionaries, and surface-level syntactic information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, the combination of evidence, through standard machine learning techniques, yields a system that achieves performance equivalent to the best existing hand-crafted approaches.
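
The idea of combining weak evidence can be illustrated with a minimal sketch. It is not the authors' system: the abstract only says "standard machine learning techniques", so a simple perceptron is used here as a stand-in learner. The dictionary contents, the feature names, the Penn Treebank-style tags, and the functions token_features, train_perceptron, and is_name are all hypothetical and chosen only for illustration.

```python
import string

# Hypothetical toy dictionary; the resources used in the paper are not given here.
NAME_DICTIONARY = {"john", "smith", "london"}


def token_features(tokens, pos_tags, i):
    """Return binary weak-evidence features for the non-empty token tokens[i]."""
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else ""
    return {
        # Surface-level evidence: capitalization and punctuation.
        "capitalized": word[:1].isupper(),
        "all_caps": word.isupper() and len(word) > 1,
        "sentence_start": i == 0 or prev in {".", "!", "?"},
        "is_punctuation": all(ch in string.punctuation for ch in word),
        # Dictionary evidence.
        "in_dictionary": word.lower() in NAME_DICTIONARY,
        # Part-of-speech evidence (assumes Penn Treebank-style tags).
        "proper_noun_tag": pos_tags[i] == "NNP",
    }


def score(feats, weights, bias):
    """Sum the weights of all features that fire for this token."""
    return bias + sum(weights.get(name, 0.0) for name, on in feats.items() if on)


def is_name(feats, weights, bias):
    """Classify a token as part of a name when the combined score is positive."""
    return score(feats, weights, bias) > 0


def train_perceptron(examples, epochs=10):
    """Learn one weight per feature with the classic perceptron update rule."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            if is_name(feats, weights, bias) != label:
                step = 1.0 if label else -1.0
                bias += step
                for name, on in feats.items():
                    if on:
                        weights[name] = weights.get(name, 0.0) + step
    return weights, bias


# Toy training sentence with hand-assigned tags and labels (illustrative only).
tokens = ["The", "agent", "met", "John", "Smith", "."]
pos_tags = ["DT", "NN", "VBD", "NNP", "NNP", "."]
labels = [False, False, False, True, True, False]

examples = [(token_features(tokens, pos_tags, i), labels[i]) for i in range(len(tokens))]
weights, bias = train_perceptron(examples)
predictions = [is_name(feats, weights, bias) for feats, _ in examples]
print(list(zip(tokens, predictions)))
```

In this toy example no single feature is decisive: "The" is capitalized but is not a name, so the learner has to weigh capitalization against the dictionary and tag evidence. That is the point the abstract makes about individually insufficient evidence.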