A cost model for query processing in high dimensional data spaces

Authors
Citation
C. Bohm, A cost model for query processing in high dimensional data spaces, ACM T DATAB, 25(2), 2000, pp. 129-178
Citations number
69
Categorie Soggetti
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON DATABASE SYSTEMS
ISSN journal
03625915 → ACNP
Volume
25
Issue
2
Year of publication
2000
Pages
129 - 178
Database
ISI
SICI code
0362-5915(200006)25:2<129:ACMFQP>2.0.ZU;2-8
Abstract
During the last decade, multimedia databases have become increasingly impor tant in many application areas such as medicine, CAD, geography, and molecu lar biology. An important research topic in multimedia databases is similar ity search in large data sets. Most current approaches that address similar ity search use the feature approach, which transforms important properties of the stored objects into points of a high-dimensional space (feature vect ors). Thus, similarity search is transformed into a neighborhood search in feature space. Multidimensional index structures are usually applied when m anaging feature vectors. Query processing can be improved substantially wit h optimization techniques such as blocksize optimization, data space quanti zation, and dimension reduction. To determine optimal parameters, an accura te estimate of index-based query processing performance is crucial. In this paper we develop a cost model for index structures for point databases suc h as the R *-tree and the X-tree. It provides accurate estimates of the num ber of data page accesses for range queries and nearest-neighbor queries un der a Euclidean metric and a maximum metric. The problems specific to high- dimensional data spaces, called boundary effects, are considered. The conce pt of the fractal dimension is used to take the effects of correlated data into account.