Query-based sampling of text databases

Citation
J. Callan and M. Connell, Query-based sampling of text databases, ACM T INF S, 19(2), 2001, pp. 97-130
Number of citations
43
Subject Categories
Information Technology & Communication Systems
Journal title
ACM TRANSACTIONS ON INFORMATION SYSTEMS
Journal ISSN
1046-8188
Volume
19
Issue
2
Year of publication
2001
Pages
97 - 130
Database
ISI
SICI code
1046-8188(200104)19:2<97:QSOTD>2.0.ZU;2-9
Abstract
The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGlOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are created, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.