Fuzzy c-means clustering of incomplete data

Citation
R.J. Hathaway and J.C. Bezdek, Fuzzy c-means clustering of incomplete data, IEEE SYST B, 31(5), 2001, pp. 735-744
Number of citations
24
Subject categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS
Journal ISSN
1083-4419
Volume
31
Issue
5
Year of publication
2001
Pages
735 - 744
Database
ISI
SICI code
1083-4419(200110)31:5<735:FCCOID>2.0.ZU;2-7
Abstract
The problem of clustering a real s-dimensional data set X = {x_1, ..., x_n} ⊂ R^s is considered. Usually, each observation (or datum) consists of numerical values for all s features (such as height, length, etc.), but sometimes data sets can contain vectors that are missing one or more of the feature values. For example, a particular datum x_k might be incomplete, having the form x_k = (254.3, ?, 333.2, 47.44, ?)^T, where the second and fifth feature values are missing. The fuzzy c-means (FCM) algorithm is a useful tool for clustering real s-dimensional data, but it is not directly applicable to the case of incomplete data. Four strategies for doing FCM clustering of incomplete data sets are given, three of which involve modified versions of the FCM algorithm. Numerical convergence properties of the new algorithms are discussed, and all approaches are tested using real and artificially generated incomplete data sets.
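The abstract does not name the four strategies, so the following is only an illustrative sketch of one common way to compare an incomplete datum with a cluster center: a partial distance computed over the observed features and rescaled to the full dimension s. The function name `partial_distance_sq`, the use of NaN to encode "?", and the cluster center `v` are assumptions made for this example and are not taken from the record.

```python
import math


def partial_distance_sq(x, v):
    """Squared Euclidean distance over the observed features of x only,
    rescaled by s / (number of observed features).

    Missing feature values in x are encoded as NaN (an assumption of this sketch).
    """
    s = len(x)
    observed = [(xj, vj) for xj, vj in zip(x, v) if not math.isnan(xj)]
    if not observed:
        raise ValueError("datum has no observed features")
    total = sum((xj - vj) ** 2 for xj, vj in observed)
    return total * s / len(observed)


# The incomplete datum from the abstract, with "?" written as NaN.
x_k = [254.3, math.nan, 333.2, 47.44, math.nan]
# Hypothetical cluster center, chosen only for illustration.
v = [250.0, 10.0, 330.0, 50.0, 5.0]

d = partial_distance_sq(x_k, v)
print(d)
```

For a complete datum the scale factor is 1, so the function reduces to the ordinary squared Euclidean distance used by standard FCM.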