A. Babajide et al., NEUTRAL NETWORKS IN PROTEIN SPACE - A COMPUTATIONAL STUDY BASED ON KNOWLEDGE-BASED POTENTIALS OF MEAN FORCE, Folding & Design, 2(5), 1997, pp. 261-269
Background: Many protein sequences, often unrelated, adopt similar folds. Sequences folding into the same shape thus form subsets of sequence space. The shape and the connectivity of these sets have implications for protein evolution and de novo design.

Results: We investigate the topology of these sets for some proteins with known three-dimensional structure using inverse folding techniques. First, we find that sequences adopting a given fold do not cluster in sequence space and that there is no detectable sequence homology among them. Nevertheless, these sequences are connected in the sense that there exists a path such that every sequence can be reached from every other sequence while the fold remains unchanged. We find similar results for restricted amino acid alphabets in some cases (e.g. ADLG). In other cases, it seems impossible to find sequences with native-like behavior (e.g. QLR). These findings seem to be independent of the particular structure considered.

Conclusions: Amino acid sequences folding into a common shape are distributed homogeneously in sequence space. Hence, the connectivity of the set of these sequences implies the existence of very long neutral paths on all examined protein structures. Regarding protein design, these results imply that sequences with more or less arbitrary chemical properties can be attached to a given structural framework. But we also observe that designability varies significantly among native structures. These features of protein sequence space are similar to what has been found for nucleic acids.
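The notion of connectivity used in the Results (a path between any two sequences along which the fold never changes) can be illustrated on a toy sequence space. The sketch below is not the authors' method. In the study, whether a sequence adopts a fold is judged with knowledge-based potentials of mean force, whereas here a hypothetical predicate `folds_into_target` stands in for that test. The following are illustrative assumptions: the two-letter "HP" alphabet, the sequence length of 6, the hypothetical core positions `CORE`, and the choice of single point mutations as the steps of a neutral path. Breadth-first search restricted to the neutral set then checks connectivity and finds a shortest neutral path.

```python
from collections import deque
from itertools import product

ALPHABET = "HP"   # hypothetical toy alphabet: hydrophobic / polar
LENGTH = 6        # hypothetical sequence length
CORE = (1, 2, 4)  # hypothetical buried positions of the target fold


def folds_into_target(seq: str) -> bool:
    """Hypothetical stand-in for an inverse-folding test: the sequence 'adopts'
    the target fold if at least two of the three core positions are hydrophobic."""
    return sum(seq[i] == "H" for i in CORE) >= 2


def neighbors(seq: str):
    """Yield all sequences at Hamming distance 1 (single point mutations)."""
    for i, current in enumerate(seq):
        for letter in ALPHABET:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]


def neutral_set() -> set:
    """Enumerate every sequence that adopts the target fold (toy sizes only)."""
    candidates = ("".join(s) for s in product(ALPHABET, repeat=LENGTH))
    return {s for s in candidates if folds_into_target(s)}


def neutral_path(start: str, goal: str):
    """Shortest path of point mutations from start to goal that never leaves
    the neutral set, or None if no such path exists."""
    allowed = neutral_set()
    if start not in allowed or goal not in allowed:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        seq = queue.popleft()
        if seq == goal:
            path = []
            while seq is not None:  # walk back through the BFS tree
                path.append(seq)
                seq = parent[seq]
            return path[::-1]
        for nxt in neighbors(seq):
            if nxt in allowed and nxt not in parent:
                parent[nxt] = seq
                queue.append(nxt)
    return None


def is_connected() -> bool:
    """True if every neutral sequence is reachable from every other one."""
    allowed = neutral_set()
    start = next(iter(allowed))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbors(queue.popleft()):
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == allowed
```

For example, `neutral_path("PHPPHP", "HPHHHH")` has to mutate position 2 to H before mutating position 1 to P. Doing it the other way round would leave only one hydrophobic core residue and leave the fold. Exhaustive enumeration is feasible only for toy spaces like this one. Real protein sequence space grows as 20^n, so it has to be sampled, for instance with the inverse folding techniques the abstract mentions.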