Turbo-charging vertical mining of large databases

Citation
P. Shenoy et al., Turbo-charging vertical mining of large databases, SIG RECORD, 29(2), 2000, pp. 22-33
Citations number
11
Categorie Soggetti
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
01635808 → ACNP
Volume
29
Issue
2
Year of publication
2000
Pages
22 - 33
Database
ISI
SICI code
0163-5808(200006)29:2<22:TVMOLD>2.0.ZU;2-R
Abstract
In a vertical representation of a market-basket database, each item is asso ciated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently pr oposed for this representation show performance improvements over their cla ssical horizontal counterparts, but are either efficient only for certain d atabase sizes, or assume particular characteristics of the database content s, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpos e, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called " snakes" and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic d atabase workloads. Our experimental results indicate significant performanc e gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.