A discrete-valued clustering algorithm with applications to biomolecular data

Citation
A.K.C. Wong et al., A discrete-valued clustering algorithm with applications to biomolecular data, INF SCI, 139(1-2), 2001, pp. 97-112
Number of citations
28
Subject Categories
Information Technology & Communication Systems
Journal title
INFORMATION SCIENCES
ISSN journal
0020-0255
Volume
139
Issue
1-2
Year of publication
2001
Pages
97 - 112
Database
ISI
SICI code
0020-0255(200111)139:1-2<97:ADCAWA>2.0.ZU;2-M
Abstract
This paper presents an algorithm for clustering large n-tuple discrete-valued data and describes how it is used for analyzing biomolecular data. The algorithm consists of a cluster initiation phase and a cluster regrouping phase. The former involves the analysis of the nearest-neighbour distance configuration using the probability estimate of each sample in the data set. It considers only a subset of variables with "consigned" or "transferred" interdependency. That is, these variables reflect many of the data interdependencies among the ensemble. The latter involves: (1) the selection of relevant attribute values based on their statistical dependence on the initial clusters formed, and (2) the inference of the cluster label based on the weight of evidence of the selected attribute values of the samples pertaining to a certain cluster over the others. Because only a subset of selected attribute values is considered, the final clusters can be of any "shape" and not necessarily "globular". Hence, it is not affected by the presence of irrelevant attribute values. Experimental results on several control data sets as well as a biomolecular data set demonstrate its efficacy for molecular sequence analysis and taxonomy analysis. (C) 2001 Elsevier Science Inc. All rights reserved.
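
Illustrative sketch (not from the paper)
The regrouping step described in the abstract, inferring a cluster label from the weight of evidence of selected attribute values, can be illustrated roughly as follows. The abstract does not give the exact formula, so this sketch assumes a common formulation: the sum of log-likelihood ratios of each selected attribute value for one cluster against all the others, with Laplace smoothing. The function names, the `selected` set of (attribute index, value) pairs, and the `alpha` parameter are all hypothetical.

```python
import math


def weight_of_evidence(sample, data, labels, cluster, selected, alpha=1.0):
    """Sum of log-likelihood ratios for `cluster` versus all other clusters.

    selected: set of (attribute_index, value) pairs treated as relevant
              (hypothetical input; the paper selects these statistically).
    alpha:    Laplace smoothing constant (an assumption, avoids log(0)).
    """
    in_rows = [row for row, lab in zip(data, labels) if lab == cluster]
    out_rows = [row for row, lab in zip(data, labels) if lab != cluster]
    total = 0.0
    for j, value in enumerate(sample):
        if (j, value) not in selected:
            continue  # unselected attribute values contribute nothing
        n_values = len({row[j] for row in data})  # distinct values of attribute j
        p_in = (sum(r[j] == value for r in in_rows) + alpha) / (
            len(in_rows) + alpha * n_values)
        p_out = (sum(r[j] == value for r in out_rows) + alpha) / (
            len(out_rows) + alpha * n_values)
        total += math.log(p_in / p_out)
    return total


def assign_cluster(sample, data, labels, selected):
    """Pick the cluster with the largest weight of evidence for `sample`."""
    clusters = sorted(set(labels))
    return max(
        clusters,
        key=lambda c: weight_of_evidence(sample, data, labels, c, selected),
    )


# Toy data: three discrete attributes; only the first two are "selected".
data = [("A", "C", "G"), ("A", "C", "T"), ("G", "T", "G"), ("G", "T", "T")]
labels = [0, 0, 1, 1]
selected = {(0, "A"), (0, "G"), (1, "C"), (1, "T")}
print(assign_cluster(("A", "C", "T"), data, labels, selected))
```

Because the third attribute is never in `selected`, its value does not influence the assignment, which mirrors the abstract's point that irrelevant attribute values do not affect the result.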