Protein design experiments have shown that the use of specific subsets of
amino acids can produce foldable proteins. This prompts the question of
whether there is a minimal amino acid alphabet that could be used to fold all
proteins. In this work we make an analogy between sequence patterns that
produce foldable sequences and those that make it possible to detect
structural homologs by aligning sequences, and use it to suggest the possible
size of such a reduced alphabet. We estimate that reduced alphabets containing
10-12 letters can be used to design foldable sequences for a large number of
protein families. This estimate is based on the observation that there is
little loss of the information necessary to pick out structural homologs in a
clustered protein sequence database when a suitable reduction of the amino
acid alphabet from 20 to 10 letters is made, but that this information is
rapidly degraded when further reductions in the alphabet are made.
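To make the idea of alphabet reduction concrete, the sketch below rewrites protein sequences in a 10-letter alphabet and compares percent identity before and after the reduction. It is an illustration under stated assumptions, not the procedure used in this work: the 10-group partition (REDUCED_GROUPS) is a hypothetical grouping by broad physicochemical similarity, the two sequences are made up, and identity is computed over pre-aligned, ungapped sequences instead of with a real alignment algorithm.

```python
# Hypothetical 10-group partition of the 20 standard amino acids, grouping
# residues by broad physicochemical similarity. Illustrative assumption only,
# not the partition derived in this work.
REDUCED_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every residue to one representative letter (the first letter of its group).
REDUCE = {aa: group[0] for group in REDUCED_GROUPS for aa in group}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two made-up, already aligned sequences that differ by conservative substitutions.
s1 = "MKVLAEDRST"
s2 = "MRILAQNKTT"

full_id = identity(s1, s2)                                       # 20-letter alphabet
reduced_id = identity(reduce_sequence(s1), reduce_sequence(s2))  # 10-letter alphabet

if __name__ == "__main__":
    print(reduce_sequence(s1), reduce_sequence(s2))
    print(f"20 letters: {full_id:.2f}  10 letters: {reduced_id:.2f}")
```

Here the conservative substitutions (K/R, V/I, E/Q, D/N, S/T) all collapse to identities, so identity rises from 0.40 to 1.00. The same mechanism can also merge residues that are not structurally interchangeable, which is one way to picture why homology information degrades once the alphabet is reduced too far.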