The Web harbors a large number of communities - groups of content-creators
sharing a common interest - each of which manifests itself as a set of
interlinked Web pages. Newsgroups and commercial Web directories together
contain on the order of 20,000 such communities; our particular interest here is
in emerging communities - those that have little or no representation in
such fora. The subject of this paper is the systematic enumeration of over
100,000 such emerging communities from a Web crawl; we call this process
trawling. We motivate a graph-theoretic approach to locating such
communities, and describe the algorithms and algorithmic engineering
necessary to find structures that conform to this notion, the challenges in
handling such a huge data set, and the results of our experiment. (C) 1999
Published by Elsevier Science B.V. All rights reserved.