The proliferation of searchable text databases on corporate networks and
the Internet causes a database selection problem for many people.
Algorithms such as gGlOSS and CORI can automatically select which text
databases to search for a given information need, but only if given a set
of resource descriptions that accurately represent the contents of each
database. The existing techniques for acquiring resource descriptions have
significant limitations when used in wide-area networks controlled by many
parties. This paper presents query-based sampling, a new technique for
acquiring accurate resource descriptions. Query-based sampling does not
require the cooperation of resource providers, nor does it require that
resource providers use a particular search engine or representation
technique. An extensive set of experimental results demonstrates that
accurate resource descriptions are created, that computation and
communication costs are reasonable, and that the resource descriptions do
in fact enable accurate automatic database selection.
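
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one plausible reading: a resource description is built by issuing single-term queries through a database's ordinary search interface and recording term statistics from the returned documents. All names here (`query_based_sample`, `make_toy_search`, `docs_per_query`, `max_docs`) are hypothetical, the parameter defaults are arbitrary, and choosing the next query term at random from the vocabulary learned so far is an assumption rather than the method the paper necessarily evaluates.

```python
import random
from collections import Counter


def query_based_sample(search, seed_term, docs_per_query=4, max_docs=300, rng=None):
    """Build a term -> document-frequency description by querying a database.

    `search` is any callable mapping a query term to a ranked list of
    (doc_id, text) pairs. It stands in for the database's normal search
    interface, so no special cooperation from the provider is assumed.
    """
    rng = rng or random.Random(0)
    description = Counter()  # term -> number of sampled documents containing it
    seen = set()             # ids of documents already sampled
    tried = set()            # query terms already issued
    term = seed_term
    while term is not None and len(seen) < max_docs:
        tried.add(term)
        for doc_id, text in search(term)[:docs_per_query]:
            if doc_id in seen or len(seen) >= max_docs:
                continue
            seen.add(doc_id)
            # Count each term once per document (document frequency).
            description.update(set(text.lower().split()))
        # Assumption: pick the next query term from the learned vocabulary.
        candidates = sorted(t for t in description if t not in tried)
        term = rng.choice(candidates) if candidates else None
    return description, len(seen)


def make_toy_search(docs):
    """Hypothetical stand-in for a remote search engine over `docs`."""
    def search(term):
        return [(i, d) for i, d in enumerate(docs) if term in d.lower().split()]
    return search
```

On a toy collection such as `["apple banana", "banana cherry", "cherry date", "zebra"]` with seed term `"apple"`, the sketch reaches only the documents connected to the seed through shared vocabulary. The isolated document containing `zebra` is never sampled, which illustrates why the choice of query terms and the stopping criterion matter for the accuracy of the resulting description.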