SELECTIVITY AND COST ESTIMATION FOR JOINS BASED ON RANDOM SAMPLING

Citation
Pj. Haas et al., SELECTIVITY AND COST ESTIMATION FOR JOINS BASED ON RANDOM SAMPLING, Journal of computer and system sciences, 52(3), 1996, pp. 550-569
Citations number
34
Categorie Soggetti
System Science","Computer Science Hardware & Architecture","Computer Science Theory & Methods
ISSN journal
00220000
Volume
52
Issue
3
Year of publication
1996
Pages
550 - 569
Database
ISI
SICI code
0022-0000(1996)52:3<550:SACEFJ>2.0.ZU;2-0
Abstract
We compare the performance of sampling-based procedures for estimating the selectivity of a join. While some of the procedures have been pro posed in the database literature, their relative performance has never been analyzed. A main result of this paper is a partial ordering that compares the variability of the estimators for the different procedur es after an arbitrary fixed number of sampling steps. Prior to the cur rent work, it was also unknown whether these fixed-step procedures cou ld be extended to fixed-precision procedures that are both asymptotica lly consistent and asymptotically efficient.: Our second main result i s a general method for such an extension and a proof that the method i s valid for all the procedures under consideration. We show that, unde r plausible assumptions on sampling costs, the partial ordering of the fixed-step procedures with respect to variability of the selectivity estimator implies a partial ordering of the corresponding fixed-precis ion procedures with respect to sampling cost. Our final result is a co llection of fixed-step and fixed-precision procedures for estimating t he cost of processing a join query according to a fixed join plan. (C) 1996 Academic Press, Inc.