Document categorization and query generation on the World Wide Web using WebACE

Citation
D. Boley et al., Document categorization and query generation on the World Wide Web using WebACE, ARTIF INT R, 13(5-6), 1999, pp. 365-391
Number of citations
36
Subject categories
AI Robotics and Automatic Control
Journal title
ARTIFICIAL INTELLIGENCE REVIEW
ISSN journal
02692821
Volume
13
Issue
5-6
Year of publication
1999
Pages
365 - 391
Database
ISI
SICI code
0269-2821(199912)13:5-6<365:DCAQGO>2.0.ZU;2-2
Abstract
We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries. These queries are used to search for new related documents and to filter the results, extracting the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms that provide significant improvement over the Hierarchical Agglomeration Clustering and AutoClass algorithms and form the basis for the query generation and search component of the agent. We report on the results of our experiments comparing these new algorithms with more traditional clustering algorithms, and we show that our algorithms are fast and scalable.