Similarity search in large time series databases has attracted much researc
h interest recently. ft is a difficult problem because of the typically hig
h dimensionality of the data.. The most promising solutions involve perform
ing dimensionality reduction on the data, then indexing the reduced data wi
th a multidimensional index structure. Many dimensionality reduction techni
ques have been proposed, including Singular Value Decomposition (SVD), the
Discrete Fourier transform (DFT), and the Discrete Wavelet Transform (DWT).
In this work we introduce a new dimensionality reduction technique which w
e call Adaptive Piecewise Constant Approximation (APCA). While previous tec
hniques (e.g., SVD, DFT and DWT) choose a common representation for all the
items in the database that minimizes the global reconstruction error, APCA
approximates each time series by a set of constant value segments of varyi
ng lengths such that their individual reconstruction errors are minimal. We
show how APCA can be indexed using a multidimensional index structure. We
propose two distance measures in the indexed space that exploit the high fi
delity of APCA for fast searching: a lower bounding Euclidean distance appr
oximation, and a non-lower bounding, but very tight Euclidean distance appr
oximation and show how they can support fast exact searching, and even fast
er approximate searching on the same index structure. We theoretically and
empirically compare APCA to all the other techniques and demonstrate its su
periority.