It is generally accepted that many different protein sequences have similar
folded structures, and that there is a relatively high probability that a
new sequence possesses a previously observed fold. An indirect consequence
of this is that protein design should define the sequence space accessible
to a given structure, rather than providing a single optimized sequence. We
have recently developed a new approach for protein sequence design, which
optimizes the complete sequence of a protein based on the knowledge of its
backbone structure, its amino acid composition and a physical energy
function including van der Waals interactions, electrostatics, and
environment free energy. The specificity of the designed sequence for its
template backbone is imposed by keeping the amino acid composition fixed.
Here, we show that our procedure converges in sequence space, albeit not to
the native sequence of the protein. We observe that while polar residues are
well conserved in our designed sequences, non-polar amino acids at the
surface of a protein are often replaced by polar residues. The designed
sequences provide a
multiple alignment of sequences that all adopt the same three-dimensional
fold. This alignment is used to derive a profile matrix for chicken triose
phosphate isomerase, TIM. The matrix is found to significantly recognize the
native sequence for TIM, as well as closely related sequences. Possible
application of this approach to protein fold recognition is discussed.
(C) 1999 Academic Press.
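
As a rough illustration of how a profile matrix can be derived from a multiple alignment of designed sequences, the following minimal sketch builds a position-specific log-odds matrix and uses it to score a query sequence. The abstract does not state how the profile is constructed, so everything here is an assumption made for illustration: the uniform background frequencies, the pseudocount value, the ungapped equal-length alignment, and the hypothetical names `build_profile` and `score_sequence`. The toy sequences are invented and are not data from the study.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build a log-odds profile (one dict per alignment column).

    Assumptions (not taken from the abstract): the alignment is ungapped,
    all sequences have equal length, and the background is uniform.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)

    profile = []
    for col in range(length):
        # Count how often each residue occurs in this column.
        counts = Counter(seq[col] for seq in alignment)
        scores = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from scoring minus infinity.
            freq = (counts.get(aa, 0) + pseudocount) / total
            scores[aa] = math.log(freq / background)
        profile.append(scores)
    return profile


def score_sequence(profile, sequence):
    """Sum the per-position log-odds scores of a sequence against the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the profile length")
    return sum(column[aa] for column, aa in zip(profile, sequence))


# Toy "designed" sequences for a hypothetical four-residue template.
designed = ["ACDK", "ACEK", "ACDR"]
profile = build_profile(designed)
native_like = score_sequence(profile, "ACDK")
unrelated = score_sequence(profile, "WWWW")
```

In this toy case a sequence resembling the designed set scores higher than an unrelated one. Assessing whether a score is significant would additionally require a reference distribution, for example scores of shuffled or unrelated sequences, which this sketch does not attempt.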