Trawling the Web for emerging cyber-communities

Citation
R. Kumar et al., Trawling the Web for emerging cyber-communities, COMPUT NET, 31(11-16), 1999, pp. 1481-1493
Number of citations
23
Subject categories
Information Technology & Communication Systems
Journal title
COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING
Journal ISSN
1389-1286
Volume
31
Issue
11-16
Year of publication
1999
Pages
1481 - 1493
Database
ISI
SICI code
1389-1286(19990517)31:11-16<1481:TTWFEC>2.0.ZU;2-Y
Abstract
The Web harbors a large number of communities - groups of content-creators sharing a common interest - each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities - those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment. (C) 1999 Published by Elsevier Science B.V. All rights reserved.