Missing value estimation methods for DNA microarrays

Authors

Troyanskaya, O Cantor, M Sherlock, G Brown, P Hastie, T Tibshirani, R Botstein, D Altman, RB

Citation

O. Troyanskaya et al., Missing value estimation methods for DNA microarrays, BIOINFORMAT, 17(6), 2001, pp. 520-525

Citations number

Subject categories

Multidisciplinary

Journal title

BIOINFORMATICS

ISSN journal

1367-4803

Volume

17

Issue

6

Year of publication

2001

Pages

520 - 525

Database

ISI

SICI code

1367-4803(200106)17:6<520:MVEMFD>2.0.ZU;2-2

Abstract

Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.

Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
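To make two of the methods named in the abstract concrete, the following is a minimal Python sketch of row-average imputation and a weighted K-nearest-neighbors imputation on a genes-by-arrays matrix with NaN marking missing values. It is an illustration only, not the authors' implementation: the function names, the default k, the root-mean-square distance over shared observed columns, and the inverse-distance weighting are all assumptions that this record does not specify.

```python
import numpy as np


def row_average_impute(X):
    """Fill each NaN with the mean of the observed values in its row (gene)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled


def knn_impute(X, k=10):
    """Weighted KNN sketch (assumed details): for each missing entry, average
    the values of the k most similar rows in that column, weighting each
    neighbor by the inverse of its distance to the target row."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[i])[0]:
            dists, values = [], []
            for r in range(X.shape[0]):
                # A neighbor must be another row with column j observed.
                if r == i or missing[r, j]:
                    continue
                shared = ~missing[i] & ~missing[r]
                if not shared.any():
                    continue
                # Assumed metric: RMS difference over shared observed columns.
                diff = X[i, shared] - X[r, shared]
                dists.append(np.sqrt(np.mean(diff ** 2)))
                values.append(X[r, j])
            if not dists:
                # No usable neighbor: fall back to the row average.
                filled[i, j] = np.nanmean(X[i])
                continue
            dists = np.array(dists)
            values = np.array(values)
            nearest = np.argsort(dists)[:k]
            # Small constant avoids division by zero for identical rows.
            weights = 1.0 / (dists[nearest] + 1e-12)
            filled[i, j] = np.sum(weights * values[nearest]) / np.sum(weights)
    return filled


if __name__ == "__main__":
    data = np.array([[1.0, 2.0, 3.0],
                     [1.0, 2.0, np.nan],
                     [10.0, 20.0, 30.0]])
    print(row_average_impute(data))
    print(knn_impute(data, k=1))
```

In the small example, row averaging fills the gap from the gene's own observed values, while the KNN sketch borrows the value from the most similar gene. This is one way to see why a neighbor-based estimate can track expression patterns that a per-row mean ignores.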