Query-based sampling of text databases

Authors

Callan, J Connell, M

Citation

J. Callan and M. Connell, Query-based sampling of text databases, ACM T INF S, 19(2), 2001, pp. 97-130

Citations number

Subject categories

Information Technology & Communication Systems

Journal title

ACM TRANSACTIONS ON INFORMATION SYSTEMS

ISSN journal

1046-8188

Volume

19

Issue

2

Year of publication

2001

Pages

97 - 130

Database

ISI

SICI code

1046-8188(200104)19:2<97:QSOTD>2.0.ZU;2-9

Abstract

The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGlOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are created, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.