Genetic selection was used to explore the probability of finding enzymes in
protein sequence space. Large degenerate libraries were prepared by
replacing all secondary structure units in a dimeric, helical bundle
chorismate mutase with simple binary-patterned modules based on a limited
set of four polar and four nonpolar residues. Two-stage in vivo selection
yielded catalytically active variants possessing biophysical and kinetic
properties typical of the natural enzyme, even though approximately 80% of
the protein originates from the simplified modules and more than 90% of the
protein consists of only eight different amino acids. This study provides a
quantitative assessment of the number of sequences compatible with a given
fold and implicates previously unidentified residues needed to form a
functional active site. Given the extremely low incidence of enzymes in
completely unbiased libraries, strategies that combine chemical information
with genetic selection, like the one used here, may be generally useful in
designing novel protein scaffolds with tailored activities.
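
As a rough illustration of why binary patterning shrinks the search space, the sketch below counts the sequences compatible with a polar/nonpolar pattern when each position is limited to four residues, and compares that count with the unrestricted 20-letter alphabet. The function name `library_size`, the 14-residue pattern, and the `P`/`N` notation are assumptions made for this example. They are not taken from the study, and the actual module lengths and library sizes are not given in the text above.

```python
from math import log10


def library_size(pattern: str, n_polar: int = 4, n_nonpolar: int = 4) -> int:
    """Count sequences matching a binary pattern ('P' = polar, 'N' = nonpolar).

    Each position independently takes one of n_polar or n_nonpolar residues,
    so the total is the product of the per-position choices.
    """
    size = 1
    for symbol in pattern:
        if symbol == "P":
            size *= n_polar
        elif symbol == "N":
            size *= n_nonpolar
        else:
            raise ValueError(f"unknown pattern symbol: {symbol!r}")
    return size


# Hypothetical 14-residue helical module pattern (illustrative only,
# not a pattern reported in the study).
pattern = "PNPPNNPPNPPNNP"

restricted = library_size(pattern)   # four choices per position
unrestricted = 20 ** len(pattern)    # any of the 20 amino acids per position

print(f"binary-patterned: about 10^{log10(restricted):.1f} sequences")
print(f"unrestricted:     about 10^{log10(unrestricted):.1f} sequences")
```

Under these assumptions the restricted alphabet reduces the number of candidate sequences by a factor of 5 per position, which is the kind of reduction that makes exhaustive genetic selection more tractable.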