Query optimization in compressed database systems

Citation
Zy. Chen et al., Query optimization in compressed database systems, SIG RECORD, 30(2), 2001, pp. 271-282
Citations number
31
Categorie Soggetti
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
01635808 → ACNP
Volume
30
Issue
2
Year of publication
2001
Pages
271 - 282
Database
ISI
SICI code
0163-5808(200106)30:2<271:QOICDS>2.0.ZU;2-5
Abstract
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database s ystems. Previous work describes the benefits of compression for numerical a ttributes, where data is stored in compressed format on disk. Despite the a bundance of string-valued attributes in relational schemas there is little work on compression for string attributes in a database context. Moreover, none of the,previous work suitably addresses the role of the query optimize r: During query execution,data is either eagerly decompressed when it is re ad into main memory, or data lazily stays compressed in main memory and is decompressed on demand only. In this paper, we present in effective approach for database compression ba sed on lightweight, attribute-level compression techniques. We propose a Hi erarchical Dictionary Encoding strategy that intelligently selects the most effective compression method for string-valued attributes. We show that ea ger and lazy decompression strategies produce suboptimal plans for queries involving compressed string attributes. We then formalize the problem of co mpression-aware query optimization and propose one provably optimal and two fast heuristic algorithms for selecting a query plan for relational schema s with compressed attributes; our algorithms can easily be integrated into existing cost-based query optimizers. Experiments using TPC-H data demonstr ate the impact of our string compression methods and show the importance of compression-aware query optimization. Our approach results in up to an ord er speed up over existing approaches.