This paper discusses topic distillation, an information retrieval problem
that is emerging as a critical task for the WWW. Algorithms for this
problem must distill a small number of high-quality documents addressing a
broad topic from a large set of candidates. We review the literature, and
compare the problem with related tasks such as classification, clustering,
and indexing. We then describe a general approach to topic distillation,
with applications to searching and partitioning, based on the algebraic
properties of matrices derived from particular documents within the corpus.
Our method, which we call spectral filtering, combines the use of terms,
hyperlinks, and anchor-text to improve retrieval performance. We give results for
broad-topic queries on the WWW, and also give some anecdotal results from
applying the same techniques to US Supreme Court law cases, US patents, and a set
of Wall Street Journal newspaper articles.