Complementary hierarchical clustering

Citation
Nowak, Gen and Tibshirani, Robert, Complementary hierarchical clustering, Biostatistics (Oxford. Print), 9(3), 2008, pp. 467-483
ISSN journal
14654644
Volume
9
Issue
3
Year of publication
2008
Pages
467 - 483
Database
ACNP
Abstract
When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly differentially expressed genes that have closely related expression patterns. Sometimes, these genes may not be relevant to the biological process under study or their functions may already be known. The problem is that these genes can potentially drown out the effects of other genes that are relevant or have novel functions. We propose a procedure called complementary hierarchical clustering that is designed to uncover the structures arising from these novel genes that are not as highly expressed. Simulation studies show that the procedure is effective when applied to a variety of examples. We also define a concept called relative gene importance that can be used to identify the influential genes in a given clustering. Finally, we analyze a microarray data set from 295 breast cancer patients, using clustering with the correlation-based distance measure. The complementary clustering reveals a grouping of the patients which is uncorrelated with a number of known prognostic signatures and significantly differing distant metastasis-free probabilities.
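
Illustrative note: the abstract mentions clustering with a correlation-based distance measure. The sketch below shows one common form of such a distance, 1 minus the Pearson correlation between two samples' expression profiles, and builds the pairwise distance matrix that a hierarchical clustering routine would take as input. It is not taken from the paper: the function names and the toy expression values are hypothetical, and the complementary clustering procedure itself is not reproduced here.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


def distance_matrix(samples):
    """Pairwise correlation distances between samples (one profile each)."""
    return [[correlation_distance(a, b) for b in samples] for a in samples]


# Hypothetical toy data: three patient samples measured on four genes.
samples = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first sample
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the first sample
]
dist = distance_matrix(samples)
```

Under this distance, samples whose profiles rise and fall together are close regardless of overall expression level, which is why correlation-based distances are often used for expression data.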