GlOSS: Text-source discovery over the Internet

Citation
L. Gravano et al., GlOSS: Text-source discovery over the Internet, ACM T DATAB, 24(2), 1999, pp. 229-264
Number of citations
38
Subject categories
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON DATABASE SYSTEMS
Journal ISSN
0362-5915
Volume
24
Issue
2
Year of publication
1999
Pages
229 - 264
Database
ISI
SICI code
0362-5915(199906)24:2<229:GTDOTI>2.0.ZU;2-0
Abstract
The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS, Glossary of Servers Server, with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.
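
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, for illustration only, that each source exports its document count and per-term document frequencies, and that the service ranks sources for a conjunctive Boolean query by an estimate that treats query terms as independent. The names `SourceSummary`, `estimate_matches`, and `rank_sources` are hypothetical and do not come from the record above.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSummary:
    """Hypothetical summary exported by a text source (phase one)."""
    name: str
    num_docs: int
    # term -> number of documents in the source that contain the term
    doc_freq: dict = field(default_factory=dict)


def estimate_matches(summary, query_terms):
    """Estimate how many documents contain ALL query terms.

    Assumption: terms occur independently, so the estimate is
    num_docs * product(doc_freq[t] / num_docs) over the query terms.
    """
    if summary.num_docs == 0:
        return 0.0
    estimate = float(summary.num_docs)
    for term in query_terms:
        estimate *= summary.doc_freq.get(term, 0) / summary.num_docs
    return estimate


def rank_sources(summaries, query_terms):
    """Phase two: return (name, estimate) pairs, most promising first.

    Sources whose estimate is zero are omitted from the result.
    """
    scored = [(s.name, estimate_matches(s, query_terms)) for s in summaries]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up summaries, for illustration only.
    summaries = [
        SourceSummary("A", 100, {"database": 50, "retrieval": 20}),
        SourceSummary("B", 200, {"database": 100, "retrieval": 80}),
        SourceSummary("C", 50, {"database": 10}),
    ]
    for name, estimate in rank_sources(summaries, ["database", "retrieval"]):
        print(name, round(estimate, 2))
```

In this sketch the service stores only summaries, never the documents themselves, so the ranking is an estimate: a source can score well and still hold no document that matches the query.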