k-POD: A Method for k-Means Clustering of Missing Data

Authors

Jocelyn T. Chi; Richard G. Baraniuk; Eric C. Chi

Citation

Jocelyn T. Chi et al., k-POD: A Method for k-Means Clustering of Missing Data, American statistician, 70(1), 2016, pp. 91-99

Journal title

American statistician

ISSN journal

0003-1305

Volume

70

Issue

1

Year of publication

2016

Pages

91-99

Database

ACNP

Abstract

The k-means algorithm is often used in clustering applications, but it requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation, but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.
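
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one way to extend k-means to partially observed data, not the authors' implementation. It assumes a fill-in-and-recluster iteration: missing entries start at the column mean of the observed values, k-means runs on the filled matrix, and each missing entry is then overwritten with the matching coordinate of its assigned cluster center. The names `init_centers`, `kmeans`, and `kpod_sketch`, the farthest-point initialization, and the label-stability stopping rule are all hypothetical choices made for this example. Missing values are encoded as `None`, and every column is assumed to have at least one observed entry.

```python
import math


def init_centers(rows, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Pick the row whose nearest existing center is farthest away.
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(rows, k, iters=100):
    """Plain Lloyd's k-means on a complete data matrix (list of lists)."""
    centers = init_centers(rows, k)
    labels = None
    for _ in range(iters):
        # Assign each row to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(r, centers[j])) for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def kpod_sketch(rows, k, outer_iters=20):
    """Cluster rows containing None by alternating fill-in and k-means."""
    n_cols = len(rows[0])
    # Initial fill: column means over the observed entries only.
    col_means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [
        [col_means[c] if v is None else v for c, v in enumerate(r)] for r in rows
    ]
    labels = None
    for _ in range(outer_iters):
        new_labels, centers = kmeans(filled, k)
        # Overwrite only the missing entries; observed values are never changed.
        filled = [
            [centers[lab][c] if v is None else v for c, v in enumerate(r)]
            for r, lab in zip(rows, new_labels)
        ]
        if new_labels == labels:  # simplified stopping rule: labels are stable
            break
        labels = new_labels
    return labels, filled


# Two well-separated groups with some coordinates missing.
data = [
    [0.0, 0.1], [0.2, None], [None, 0.0],
    [10.0, 10.1], [10.2, None], [None, 9.9],
]
labels, filled = kpod_sketch(data, k=2)
```

Unlike deletion, this sketch keeps every row, and unlike a one-off imputation, the filled-in values are revised as the cluster assignments change.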