Topic distillation and spectral filtering

Citation
S. Chakrabarti et al., Topic distillation and spectral filtering, ARTIF INT R, 13(5-6), 1999, pp. 409-435
Number of citations
64
Subject categories
AI Robotics and Automatic Control
Journal title
ARTIFICIAL INTELLIGENCE REVIEW
ISSN journal
0269-2821
Volume
13
Issue
5-6
Year of publication
1999
Pages
409 - 435
Database
ISI
SICI code
0269-2821(199912)13:5-6<409:TDASF>2.0.ZU;2-Y
Abstract
This paper discusses topic distillation, an information retrieval problem that is emerging as a critical task for the WWW. Algorithms for this problem must distill a small number of high-quality documents addressing a broad topic from a large set of candidates. We give a review of the literature, and compare the problem with related tasks such as classification, clustering, and indexing. We then describe a general approach to topic distillation with applications to searching and partitioning, based on the algebraic properties of matrices derived from particular documents within the corpus. Our method, which we call spectral filtering, combines the use of terms, hyperlinks and anchor-text to improve retrieval performance. We give results for broad-topic queries on the WWW, and also give some anecdotal results applying the same techniques to US Supreme Court law cases, US patents, and a set of Wall Street Journal newspaper articles.
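Illustrative sketch
The abstract does not spell out the algorithm, so the following is only a generic illustration of the kind of matrix-based ranking it alludes to: a hub/authority power iteration over a link graph, which converges toward the principal eigenvectors of the link matrix products. It is not the paper's method: it uses hyperlinks only and ignores terms and anchor-text. The function name `hub_authority_scores`, the `(source, target)` edge-list input, and the iteration count are all assumptions made for this example.

```python
import math


def _normalise(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def hub_authority_scores(links, n_docs, iterations=50):
    """Hypothetical helper: power iteration for hub and authority scores.

    links   -- iterable of (source, target) document indices (assumed format)
    n_docs  -- number of documents in the candidate set
    Returns (hubs, authorities), each normalised to unit length.
    """
    links = list(links)
    hubs = [1.0] * n_docs
    auths = [1.0] * n_docs
    for _ in range(iterations):
        # Authority of a document: sum of hub scores of documents linking to it.
        new_auths = [0.0] * n_docs
        for src, dst in links:
            new_auths[dst] += hubs[src]
        # Hub score of a document: sum of authority scores of documents it cites.
        new_hubs = [0.0] * n_docs
        for src, dst in links:
            new_hubs[src] += new_auths[dst]
        auths = _normalise(new_auths)
        hubs = _normalise(new_hubs)
    return hubs, auths


# Toy graph (made-up data): documents 0 and 1 both link to 2; document 0 also links to 3.
example_links = [(0, 2), (1, 2), (0, 3)]
example_hubs, example_auths = hub_authority_scores(example_links, n_docs=4)
```

In this toy graph, document 2 ends up with the largest authority score and document 0 with the largest hub score. A distillation step in this style would keep only the few top-scoring documents from the candidate set.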