DOT-PLOT COMPARISONS BY MULTIVARIATE-ANALYSIS (DOCMA) - A TOOL FOR CLASSIFYING PROTEIN SEQUENCES

Citation
C. Landes et al., DOT-PLOT COMPARISONS BY MULTIVARIATE-ANALYSIS (DOCMA) - A TOOL FOR CLASSIFYING PROTEIN SEQUENCES, Computer applications in the biosciences, 9(2), 1993, pp. 191-196
Citations number
25
ISSN journal
02667061
Volume
9
Issue
2
Year of publication
1993
Pages
191 - 196
Database
ISI
SICI code
0266-7061(1993)9:2<191:DCBM(->2.0.ZU;2-6
Abstract
A method aimed at classifying protein sequences without resorting to pairwise alignment is presented. Called DOCMA (DOt-plot Comparisons by Multivariate Analysis), it is based on a multivariate analysis of the pairwise dot-plots between all the sequences in the set. The dot-plots are first simplified by considering only the projections of the 'diagonal' segments of similarity onto the axes. From these projections a data matrix is built, in which each column is representative of the comparisons of one given sequence with all the other ones. This data matrix is then transformed into a distance matrix by a chi-squared analysis, from which the coordinates of the sequences in an orthonormal Euclidean space are obtained. The sequences are finally classified by a dynamic clustering procedure followed by a search for strong clusters. Application of this method to protein families such as the globins, the cytochromes c and the aminoacyl-tRNA synthetases shows that it is quite effective in delineating subgroups that contain even distantly related sequences.
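
To illustrate the step in which the data matrix is turned into a distance matrix, the following is a minimal sketch of a chi-squared distance between column profiles, as commonly used in correspondence analysis. It is not taken from the paper: the abstract does not give the exact formula, so the weighting by row mass, the function name `chi2_distance_matrix`, and the toy counts are all assumptions made for illustration.

```python
from math import sqrt


def chi2_distance_matrix(data):
    """Chi-squared distances between the columns of a non-negative matrix.

    `data` is a list of rows; each column stands for one sequence
    (an assumption mirroring the abstract's description of the data matrix).
    Uses the usual correspondence-analysis form:
        d(j, k)^2 = sum_i (p_ij - p_ik)^2 / r_i
    where p_ij is the column profile and r_i the row mass.
    """
    n_rows = len(data)
    n_cols = len(data[0])
    total = sum(sum(row) for row in data)
    col_sum = [sum(data[i][j] for i in range(n_rows)) for j in range(n_cols)]
    if total <= 0 or any(s <= 0 for s in col_sum):
        raise ValueError("every column must have a positive sum")

    # Row masses: share of the grand total carried by each row.
    row_mass = [sum(row) / total for row in data]
    # Column profiles: each column rescaled to sum to 1.
    profiles = [
        [data[i][j] / col_sum[j] for i in range(n_rows)]
        for j in range(n_cols)
    ]

    dist = [[0.0] * n_cols for _ in range(n_cols)]
    for j in range(n_cols):
        for k in range(j + 1, n_cols):
            d2 = sum(
                (profiles[j][i] - profiles[k][i]) ** 2 / row_mass[i]
                for i in range(n_rows)
                if row_mass[i] > 0  # rows that are all zero carry no information
            )
            dist[j][k] = dist[k][j] = sqrt(d2)
    return dist


# Hypothetical toy data: 3 projection bins (rows) x 3 sequences (columns).
toy = [
    [4, 4, 0],
    [2, 2, 0],
    [0, 0, 6],
]
toy_dist = chi2_distance_matrix(toy)
```

In this toy matrix the first two columns have identical profiles, so their distance is zero, while the third column lies far from both. Coordinates in a Euclidean space could then be derived from such a matrix, for example by an eigen-decomposition, but that step and the clustering are left out of this sketch.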