Overcoming the curse of dimensionality in clustering by means of the wavelet transform

Citation
F. Murtagh et al., Overcoming the curse of dimensionality in clustering by means of the wavelet transform, COMPUTER J, 43(2), 2000, pp. 107-120
Citations number
27
Categorie Soggetti
Computer Science & Engineering
Journal title
COMPUTER JOURNAL
ISSN journal
00104620 → ACNP
Volume
43
Issue
2
Year of publication
2000
Pages
107 - 120
Database
ISI
SICI code
0010-4620(2000)43:2<107:OTCODI>2.0.ZU;2-A
Abstract
We use a redundant wavelet transform analysis to detect clusters in high-di mensional data spaces. We overcome Bellman's 'curse of dimensionality' in s uch problems by (i) using some canonical ordering of observation and variab le (document and term) dimensions in our data, (ii) applying a wavelet tran sform to such canonically ordered data, (iii) modelling the noise in wavele t space, (iv) defining significant component parts of the data as opposed t o insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, i ncluding the clustering of high-dimensional hypertext data.