I. Ladunga, PHYLOGENETIC CONTINUUM INDICATES GALAXIES IN THE PROTEIN UNIVERSE - PRELIMINARY RESULTS ON THE NATURAL GROUP STRUCTURES OF PROTEINS, Journal of molecular evolution, 34(4), 1992, pp. 358-375
The markedly nonuniform, even systematic, distribution of sequences in
the protein "universe" has been analyzed by methods of protein taxonomy.
Mapping of the natural hierarchical system of proteins has revealed
some dense cores, i.e., well-defined clusters of proteins that seem
to be natural structural groupings, possibly seeds for a future protein
taxonomy. The aim was not to force proteins into more or less man-made
categories by discriminant analysis, but to find structurally similar
groups, possibly of common evolutionary origin. Single-valued distance
measures between pairs of superfamilies from the Protein Identification
Resource were defined by two chi-square-like methods on tripeptide
frequencies and the variable-length subsequence identity method derived
from dot-matrix comparisons. Distance matrices were processed by several
methods of cluster analysis to detect phylogenetic continuum between
highly divergent proteins. Only well-defined clusters characterized by
relatively unique structural, intracellular environmental, organismal,
and functional attribute states were selected as major protein groups,
including subsets of viral and Escherichia coli proteins, hormones,
inhibitors, plant, ribosomal, serum and structural proteins, amino acid
synthases, and clusters dominated by certain oxidoreductases and apolar
and DNA-associated enzymes. The limited repertoire of functional
patterns due to small genome size, the high rate of recombination,
specific features of the bacterial membranes, or of the virus cycle
canalize certain proteins of viruses and Gram-negative bacteria,
respectively, to organismal groups.
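
As an illustration of the kind of tripeptide-based distance the abstract describes, the following is a minimal Python sketch. The abstract does not give the exact chi-square-like formulas, so the symmetric form used here, sum of (p - q)^2 / (p + q) over tripeptides, is an assumption, as are the function names and the toy sequences. The resulting pairwise distances could then be passed to any standard cluster analysis routine.

```python
from collections import Counter
from itertools import combinations


def tripeptide_freqs(seq):
    """Relative frequencies of overlapping tripeptides in one sequence."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than three residues
        return {}
    return {tri: n / total for tri, n in counts.items()}


def chi2_like_distance(p, q):
    """Symmetric chi-square-like distance between two frequency profiles.

    Assumed form (not taken from the paper): sum of (p - q)^2 / (p + q).
    It is 0 for identical profiles and 2 when no tripeptide is shared.
    """
    dist = 0.0
    for tri in set(p) | set(q):
        a, b = p.get(tri, 0.0), q.get(tri, 0.0)
        dist += (a - b) ** 2 / (a + b)  # a + b > 0 for every key in the union
    return dist


def distance_matrix(seqs):
    """Pairwise distances for a dict {name: sequence}, keyed by name pairs."""
    profiles = {name: tripeptide_freqs(s) for name, s in seqs.items()}
    return {
        (a, b): chi2_like_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = {"fam1": "MKVLAAGIVG", "fam2": "MKVLSAGIVG", "fam3": "WWHHWWHHWW"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A symmetric denominator is used so that the measure does not depend on which sequence is treated as the reference; other chi-square-like variants would fit the description equally well.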