We present WebACE, an agent for exploring and categorizing documents on the
World Wide Web based on a user profile. The heart of the agent is an
unsupervised categorization of a set of documents, combined with a process for
generating new queries that is used to search for new related documents and
for filtering the resulting documents to extract the ones most closely related
to the starting set. The document categories are not given a priori. We
present the overall architecture and describe two novel algorithms which
provide significant improvement over Hierarchical Agglomeration Clustering
and AutoClass algorithms and form the basis for the query generation and
search component of the agent. We report on the results of our experiments
comparing these new algorithms with more traditional clustering algorithms and
we show that our algorithms are fast and scalable.