It is increasingly difficult to make effective use of Internet
information, given the rapid growth in data volume, user base, and data
diversity. In this paper we introduce Harvest, a system that provides a
scalable, customizable architecture for gathering, indexing, caching,
replicating, and accessing Internet information.