The contents of many valuable web-accessible databases are only accessible
through search interfaces and are hence invisible to traditional web
"crawlers." Recent studies have estimated the size of this "hidden web" to be 500
billion pages, while the size of the "crawlable" web is only an estimated
two billion pages. Recently, commercial web sites have started to manually
organize web-accessible databases into Yahoo!-like hierarchical
classification schemes. In this paper, we introduce a method for automating
this classification process by using a small number of query probes. To
classify a database, our algorithm does not retrieve or inspect any
documents or pages from the database, but rather just exploits the number
of matches that each query probe generates at the database in question. We
have conducted an extensive experimental evaluation of our technique over
collections of real documents, including over one hundred web-accessible
databases. Our experiments show that our system has low overhead and
achieves high classification accuracy across a variety of databases.
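
The idea of classifying a database from match counts alone can be illustrated
with a minimal sketch. It is not the algorithm evaluated in this paper: the
function name, the probe-to-category mapping, the `min_share` threshold, and
the toy hit counts are all assumptions made for illustration. The only thing
it takes from the text above is that each probe contributes nothing but the
number of matches it generates.

```python
from typing import Callable, Dict, List


def classify_by_probe_counts(
    count_matches: Callable[[str], int],
    probes: Dict[str, List[str]],
    min_share: float = 0.5,
) -> List[str]:
    """Return categories whose probes draw at least min_share of all matches.

    count_matches is a hypothetical hook that sends one query to the
    database's search interface and returns only the reported hit count.
    """
    # Sum the match counts reported for each category's probes.
    totals = {
        category: sum(count_matches(query) for query in queries)
        for category, queries in probes.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no probe matched anything; leave the database unclassified
    return [c for c, n in totals.items() if n / grand_total >= min_share]


# Toy stand-in for a search interface that reports only hit counts.
fake_counts = {"cancer": 900, "diabetes": 600, "baseball": 10, "soccer": 5}
probes = {"Health": ["cancer", "diabetes"], "Sports": ["baseball", "soccer"]}
result = classify_by_probe_counts(lambda q: fake_counts.get(q, 0), probes)
print(result)  # ['Health']
```

No document is fetched at any point; the sketch sees only one integer per
probe, which is why the overhead of such an approach can stay small.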