Loglinear-based quasi cubes

Authors
Citation
D. Barbara and Xt. Wu, Loglinear-based quasi cubes, J INTELL IN, 16(3), 2001, pp. 255-276
Number of citations
25
Subject categories
Information Technology & Communication Systems
Journal title
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
ISSN journal
0925-9902
Volume
16
Issue
3
Year of publication
2001
Pages
255 - 276
Database
ISI
SICI code
0925-9902(200108)16:3<255:LQC>2.0.ZU;2-R
Abstract
A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that contains in each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practical situations, cubes can require a large amount of storage, so compressing them is of practical importance. In this paper, we propose an approximation technique that reduces the storage cost of the cube at the price of getting approximate answers for the queries posed against the cube. The idea is to characterize regions of the cube by using statistical models whose description takes less space than the data itself. Then, the model parameters can be used to estimate the cube cells with a certain level of accuracy. To increase the accuracy, and to guarantee the level of error in the query answers, some of the "outliers" (i.e., cells that incur the largest errors when estimated) are retained. The storage taken by the model parameters and the retained cells, of course, should take a fraction of the space of the full cube, and the estimation procedure should be faster than computing the data from the underlying relations. We use loglinear models to model the cube regions. Experiments show that the errors introduced in typical queries are small even when the description is substantially smaller than the full cube. Since cubes are used to support data analysis and analysts are rarely interested in the precise values of the aggregates (but rather in trends), providing approximate answers is, in most cases, a satisfactory compromise. Although other techniques have been used for the purpose of compressing data cubes, ours has the advantage of using parametric (loglinear) models and the retaining of outliers, which enables the system to give error guarantees that are data independent, for every query posed on the data cube. The models also offer information about the underlying structure of the data modeled by them. Moreover, these models are relatively easy to update dynamically as data is added to the warehouse.
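Illustration
The idea described in the abstract (model a cube region with a few parameters, then retain the cells the model estimates worst) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a two-dimensional region, uses the simplest loglinear model (independence, log m_ij = u + u_i + u_j, whose fitted values have the closed form row_total * col_total / grand_total), and uses a per-cell relative-error threshold as the outlier criterion. The abstract specifies none of these choices, and all function names and the sample data are hypothetical.

```python
def fit_independence_model(region):
    """Fit the independence loglinear model to a 2-D region.

    Only the marginal totals are kept: r + c + 1 numbers
    instead of the r * c cells of the region.
    """
    row_totals = [sum(row) for row in region]
    col_totals = [sum(col) for col in zip(*region)]
    grand_total = sum(row_totals)
    return row_totals, col_totals, grand_total


def estimate(model, i, j):
    """Estimated cell value under independence: row_i * col_j / total."""
    row_totals, col_totals, grand_total = model
    return row_totals[i] * col_totals[j] / grand_total


def cell_error(actual, approx):
    """Relative error; falls back to absolute error when the cell is 0."""
    denom = abs(actual) if actual != 0 else 1.0
    return abs(actual - approx) / denom


def find_outliers(region, model, max_error):
    """Retain exactly the cells whose estimate exceeds the error bound."""
    outliers = {}
    for i, row in enumerate(region):
        for j, actual in enumerate(row):
            if cell_error(actual, estimate(model, i, j)) > max_error:
                outliers[(i, j)] = actual
    return outliers


def lookup(model, outliers, i, j):
    """Answer a cell query: exact if retained, otherwise estimated."""
    return outliers.get((i, j), estimate(model, i, j))


# Made-up region: roughly multiplicative except for one unusual cell.
region = [
    [10, 20, 30],
    [20, 40, 60],
    [30, 60, 500],
]
max_error = 0.05  # assumed tolerance: 5% per cell

model = fit_independence_model(region)
outliers = find_outliers(region, model, max_error)

for i, row in enumerate(region):
    for j, actual in enumerate(row):
        approx = lookup(model, outliers, i, j)
        print((i, j), actual, round(approx, 2), (i, j) in outliers)
```

Because every cell whose estimate violates the bound is stored exactly, each single-cell answer from `lookup` stays within `max_error` regardless of the data. The sketch also shows the trade-off: one extreme cell distorts the marginal totals, so more cells must be retained and less space is saved.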