Learning to construct knowledge bases from the World Wide Web

Citation
M. Craven et al., Learning to construct knowledge bases from the World Wide Web, ARTIF INTEL, 118(1-2), 2000, pp. 69-113
Number of citations
67
Subject Categories
AI Robotics and Automatic Control
Journal title
ARTIFICIAL INTELLIGENCE
Journal ISSN
0004-3702
Volume
118
Issue
1-2
Year of publication
2000
Pages
69 - 113
Database
ISI
SICI code
0004-3702(200004)118:1-2<69:LTCKBF>2.0.ZU;2-E
Abstract
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed by, produced by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system that has created a knowledge base describing university people, courses, and research projects. (C) 2000 Elsevier Science B.V. All rights reserved.
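Illustrative sketch
The two inputs named in the abstract (an ontology and a set of labeled hypertext regions) can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the names Ontology, LabeledRegion, train and classify, the relation instructor_of, and the toy training strings are hypothetical, and the bag-of-words naive Bayes learner is a stand-in chosen for brevity, not a statement of the algorithms the article uses. Relations are declared in the ontology but not learned in this sketch.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Ontology:
    """First input: the classes and relations of interest (hypothetical layout)."""
    classes: tuple    # e.g. ("person", "course")
    relations: dict   # relation name -> (domain class, range class)


@dataclass(frozen=True)
class LabeledRegion:
    """Second input: one training example, reduced here to plain text."""
    text: str
    label: str        # must be one of Ontology.classes


def train(ontology, regions):
    """Count words per class: a toy multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for region in regions:
        if region.label not in ontology.classes:
            raise ValueError(f"unknown class: {region.label}")
        class_counts[region.label] += 1
        word_counts[region.label].update(region.text.lower().split())
    return class_counts, word_counts


def classify(model, text):
    """Return the most probable class for unseen text (Laplace smoothing)."""
    class_counts, word_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Assumed toy data; the class names echo the university domain in the abstract.
ontology = Ontology(
    classes=("person", "course"),
    relations={"instructor_of": ("person", "course")},  # hypothetical relation
)
training = [
    LabeledRegion("professor research interests publications", "person"),
    LabeledRegion("syllabus lectures homework exam schedule", "course"),
]
model = train(ontology, training)
predicted = classify(model, "homework and exam dates")
```

The ontology is kept separate from the training data so that the same learner can, in principle, be retargeted to a different domain by swapping both inputs, which is the design the abstract describes at a high level.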