Multidimensional OLAP products provide an excellent opportunity for
integrating mining functionality because of their widespread acceptance as
decision support tools and their existing heavy reliance on manual,
user-driven analysis. Most OLAP products are rather simplistic and rely
heavily on the user's intuition to manually drive the discovery process.
Such ad hoc user-driven exploration gets tedious and error-prone as data
dimensionality and size increase. Our goal is to automate these manual
discovery processes. In this paper we present an example of such automation
through an iDiff operator that in a single step returns summarized reasons
for drops or increases observed at an aggregated level.
We formulate this as a problem of summarizing the difference between two
multidimensional arrays of real numbers. We develop a general framework for
such summarization and propose a specific formulation for the case of OLAP
aggregates. We develop an information-theoretic formulation that expresses
the reasons in a form that is compact and easy to interpret. We design an
efficient dynamic programming algorithm that requires only one pass over
the data and uses a small amount of memory independent of the data size.
This allows easy integration with existing OLAP products. Our prototype has
been tested on the Microsoft OLAP server, DB2/UDB and Oracle 8i.
Experiments using the OLAP benchmark demonstrate (1) scalability of our
algorithm as the size and dimensionality of the cube increase and (2)
feasibility of getting interactive answers with modest hardware resources.
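To make the problem statement concrete, the following is a minimal sketch of what "summarizing the difference between two multidimensional arrays" can look like on toy data. It is not the iDiff operator: it omits the information-theoretic formulation and the dynamic programming algorithm, and simply ranks single-dimension rollups by absolute change. The function name `top_changes`, the dictionary-based cube layout, and the revenue figures are all assumptions made for illustration.

```python
from collections import defaultdict


def top_changes(before, after, dims, k=3):
    """Rank single-dimension rollups by absolute change between two cubes.

    before/after: dict mapping a tuple of dimension members to a number.
    dims: dimension names, in the same order as the key tuples.
    This is a naive ranking, not the paper's summarization algorithm.
    """
    delta = defaultdict(float)
    # Subtract every "before" cell and add every "after" cell to the
    # rollup of each (dimension, member) pair the cell belongs to.
    for cells, sign in ((before, -1.0), (after, 1.0)):
        for key, value in cells.items():
            for name, member in zip(dims, key):
                delta[(name, member)] += sign * value
    ranked = sorted(delta.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: revenue by (region, product) in two periods.
dims = ("region", "product")
before = {("East", "A"): 100.0, ("East", "B"): 50.0,
          ("West", "A"): 80.0, ("West", "B"): 40.0}
after = {("East", "A"): 60.0, ("East", "B"): 55.0,
         ("West", "A"): 82.0, ("West", "B"): 41.0}

total_change = sum(after.values()) - sum(before.values())
summary = top_changes(before, after, dims, k=2)
print(total_change)  # -32.0
print(summary)       # [(('product', 'A'), -38.0), (('region', 'East'), -35.0)]
```

In this toy case the overall drop of 32 is traced to product A and region East, which is the kind of answer a user would otherwise reach by drilling down manually. A ranking like this reports overlapping rollups separately and does not say which combination of rows explains the change most compactly.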