PHYLOGENETIC CONTINUUM INDICATES GALAXIES IN THE PROTEIN UNIVERSE - PRELIMINARY-RESULTS ON THE NATURAL GROUP STRUCTURES OF PROTEINS

Authors
I. Ladunga
Citation
I. Ladunga, PHYLOGENETIC CONTINUUM INDICATES GALAXIES IN THE PROTEIN UNIVERSE - PRELIMINARY-RESULTS ON THE NATURAL GROUP STRUCTURES OF PROTEINS, Journal of molecular evolution, 34(4), 1992, pp. 358-375
Number of citations
68
ISSN journal
0022-2844
Volume
34
Issue
4
Year of publication
1992
Pages
358 - 375
Database
ISI
SICI code
0022-2844(1992)34:4<358:PCIGIT>2.0.ZU;2-5
Abstract
The markedly nonuniform, even systematic distribution of sequences in the protein "universe" has been analyzed by methods of protein taxonomy. Mapping of the natural hierarchical system of proteins has revealed some dense cores, i.e., well-defined clusterings of proteins that seem to be natural structural groupings, possibly seeds for a future protein taxonomy. The aim was not to force proteins into more or less man-made categories by discriminant analysis, but to find structurally similar groups, possibly of common evolutionary origin. Single-valued distance measures between pairs of superfamilies from the Protein Identification Resource were defined by two chi-square-like methods on tripeptide frequencies and the variable-length subsequence identity method derived from dot-matrix comparisons. Distance matrices were processed by several methods of cluster analysis to detect a phylogenetic continuum between highly divergent proteins. Only well-defined clusters characterized by relatively unique structural, intracellular environmental, organismal, and functional attribute states were selected as major protein groups, including subsets of viral and Escherichia coli proteins, hormones, inhibitors, plant, ribosomal, serum and structural proteins, amino acid synthases, and clusters dominated by certain oxidoreductases and apolar and DNA-associated enzymes. The limited repertoire of functional patterns due to small genome size, the high rate of recombination, specific features of the bacterial membranes, or of the virus cycle canalize certain proteins of viruses and Gram-negative bacteria, respectively, to organismal groups.
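The abstract does not give the exact form of the chi-square-like measures or name the clustering algorithms, so the following is only an illustrative sketch of the general pipeline it describes: tripeptide frequency profiles, a pairwise distance, and a cluster analysis of the resulting distance matrix. The symmetric chi-square-style formula, the single-linkage rule, the threshold, the function names, and the toy sequences are all assumptions introduced here, not details taken from the paper.

```python
from collections import Counter
from itertools import combinations


def tripeptide_freqs(seq):
    """Relative frequencies of overlapping tripeptides in one sequence."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


def chi2_like_distance(p, q):
    """Assumed symmetric chi-square-style distance: sum (p - q)^2 / (p + q).

    Ranges from 0 (identical profiles) to 2 (no shared tripeptides).
    """
    keys = set(p) | set(q)  # every key has p + q > 0, so no division by zero
    return sum(
        (p.get(k, 0.0) - q.get(k, 0.0)) ** 2 / (p.get(k, 0.0) + q.get(k, 0.0))
        for k in keys
    )


def distance_matrix(seqs):
    """Pairwise distances keyed by the unordered pair of sequence names."""
    profiles = {name: tripeptide_freqs(s) for name, s in seqs.items()}
    return {
        frozenset((a, b)): chi2_like_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


def single_linkage(names, dist, threshold):
    """Merge clusters while any cross-cluster pair lies within the threshold."""
    clusters = [{n} for n in names]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(dist[frozenset((a, b))] <= threshold for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # cluster list changed; restart the pair scan
    return clusters


# Toy, made-up sequences (hypothetical; not data from the paper).
toy = {"a": "ACDACDACD", "b": "ACDACDACE", "c": "WWYWWYWWY"}
groups = single_linkage(list(toy), distance_matrix(toy), threshold=0.5)
print(sorted(sorted(g) for g in groups))
```

With these toy inputs, the two near-identical sequences fall into one cluster and the unrelated one stays apart. Applying such a sketch to real superfamilies would need a principled choice of distance and linkage, which the abstract indicates the author explored with several methods.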