ITA
ENG

EFFICIENT SAMPLING STRATEGIES FOR RELATIONAL DATABASE OPERATIONS

Authors

LIPTON RJ NAUGHTON JF SCHNEIDER DA SESHADRI S

Citation

Rj. Lipton et al., EFFICIENT SAMPLING STRATEGIES FOR RELATIONAL DATABASE OPERATIONS, Theoretical computer science, 116(1), 1993, pp. 195-226

Citations number

Categorie Soggetti

Computer Sciences","Computer Applications & Cybernetics",Mathematics

Journal title

Theoretical computer science → ACNP

ISSN journal

03043975

Volume

116

Issue

Year of publication

1993

Pages

195 - 226

Database

ISI

SICI code

0304-3975(1993)116:1<195:ESSFRD>2.0.ZU;2-I

Abstract

Recently, we have proposed an adaptive, random-sampling algorithm for general query size estimation in databases. In an earlier work we anal yzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to the relational dat abase operations select, project, and join. We extend our previous ana lysis to provide significantly improved bounds on the amount of sampli ng necessary for a given level of accuracy. Also, we provide 'sanity b ounds'' to deal with queries for which the underlying data are extreme ly skewed or the query result is very small. We investigate how the ex istence of indices can be used to generate more efficient sampling alg orithms for the operations of project and join. Finally, we report on the performance of the estimation algorithm, both as implemented in '' stand alone'' C programs and as implemented in a host language on a co mmericial relational system.