Trawling the Web for emerging cyber-communities

Authors

Kumar, R; Raghavan, P; Rajagopalan, S; Tomkins, A

Citation

R. Kumar et al., Trawling the Web for emerging cyber-communities, COMPUT NET, 31(11-16), 1999, pp. 1481-1493

Subject categories

Information Technology & Communication Systems

Journal title

COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING

ISSN journal

1389-1286

Volume

31

Issue

11-16

Year of publication

1999

Pages

1481 - 1493

Database

ISI

SICI code

1389-1286(19990517)31:11-16<1481:TTWFEC>2.0.ZU;2-Y

Abstract

The Web harbors a large number of communities - groups of content-creators sharing a common interest - each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities - those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment. (C) 1999 Published by Elsevier Science B.V. All rights reserved.