Here we study the pattern of amino acid pair interchanges at spatially,
locally conserved regions in globally dissimilar and unrelated proteins.
By using a method which completely separates the amino acid sequence
from its respective structure, this work addresses the question of
which properties of the amino acids are the most crucial for the
stability of conserved structural motifs. The proteins are taken from a
structurally non-redundant dataset. The spatially conserved
substructural motifs are defined as consisting of a "large enough"
number of C-alpha atoms found to provide a geometric match between two
proteins, regardless of the order of the C-alpha atoms in the sequence,
or of the
sequence composition of the substructures. This approach can apply to
proteins with little or no sequence similarity but with sufficient
structural similarity, and is unique in its ability to handle local,
non-topological matches between pairs of dissimilar proteins. The method
uses a computer-vision based algorithm, Geometric Hashing. Since
Geometric Hashing ignores sequence information, it lends itself to
answering the question posed above. The interchanges at geometrically
similar positions that have been obtained with our method demonstrate
the expected behaviour. Yet, a closer inspection reveals some distinct
characteristics, as compared with interchanges derived from
sequence-order-based techniques, or from energy-contact-based
considerations. First, a
pronounced division of the amino acids into two classes is displayed:
Lys, Glu, Arg, Gln, Asp, Asn, Pro, Gly, Thr, Ser and His on the one
hand, and Ile, Val, Leu, Phe, Met, Tyr, Trp, Cys and Ala on the other.
These groups further cluster into subgroups: Lys, Glu, Arg, Gln; Asp,
Asn; Pro, Gly; Ile, Val, Leu, Phe. The other amino acids stand alone.
Analysis of the conservation among amino acids indicates proline to be
consistently, and by far, the most conserved. Next are Asp, Glu, Lys and
Gly. Cys is also highly conserved. Interestingly, oppositely charged
amino acids are interchanged roughly as frequently as those of the same
charge. These observations can be explained in terms of the
three-dimensional structures of the proteins. Most of all, there is a
clear distinction between residues which prefer to be on the protein
surfaces and those frequently buried in the interiors. Analysis of the
interchanges indicates their low information content. This, together
with the separation into two groups, suggests that the predictive value
of the spatial positions of the C-alpha atoms is not much greater than
that of the sequence alone, aside from their
hydrophobicity/hydrophilicity classification.
(C) 1996 Academic Press Limited
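
The abstract does not say how the information content of the interchanges was measured. As an illustration only, the sketch below assumes one common choice: the mutual information (in bits) between the two residue types found at geometrically matched positions. The function names and the input format (a list of residue-pair tuples) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def interchange_counts(pairs):
    """Build a symmetric table of residue interchanges.

    `pairs` is an iterable of (residue_a, residue_b) tuples, one per
    geometrically matched position (hypothetical input format).
    Each pair is counted in both orders so the table is symmetric.
    """
    counts = Counter()
    for a, b in pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    return counts


def mutual_information(counts):
    """Mutual information (bits) between the two residues of a matched pair.

    This is an assumed measure of "information content"; values near zero
    mean that knowing one residue says little about its matched partner.
    """
    total = sum(counts.values())
    # Marginal counts; rows and columns agree because the table is symmetric.
    marginal = Counter()
    for (a, _b), n in counts.items():
        marginal[a] += n
    mi = 0.0
    for (a, b), n in counts.items():
        p_ab = n / total
        p_a = marginal[a] / total
        p_b = marginal[b] / total
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```

Under this assumed measure, two equally frequent residue types that are always matched with themselves give exactly 1 bit, while matches that are independent of residue type give 0 bits. A low value on real data would be consistent with the low information content reported above.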