Outlier detection for high dimensional data

Citation
C.C. Aggarwal and P.S. Yu, Outlier detection for high dimensional data, SIGMOD RECORD, 30(2), 2001, pp. 37-46
Number of citations
27
Subject categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
Journal ISSN
0163-5808
Volume
30
Issue
2
Year of publication
2001
Pages
37 - 46
Database
ISI
SICI code
0163-5808(200106)30:2<37:ODFHDD>2.0.ZU;2-S
Abstract
The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.