Experimental(1-6) and simulation(7) studies show that small monomeric
proteins fold in one kinetic step, which entails overcoming the
free-energy barrier between the unfolded and the native protein through
a transition state(8,9). Two models of transition state formation have
been proposed: a 'nonspecific' one, in which it depends on the formation
of a sufficient number of native-like contacts regardless of what amino
acids are involved(10-12), and a 'specific' one, in which it depends on
formation of a specific subset of the native structure (a folding
nucleus)(8,13,14). The latter requires that some amino acids form most
of their contacts in the transition state, whereas others do so only on
reaching the native conformation. If so, mutations affecting the
stability of the transition state nucleus should have a greater effect on
the folding kinetics than mutations elsewhere, and the residues involved
should be evolutionarily conserved. Lattice-model simulations and
experiments(8,13-16) suggest that such mutations exist. Here we present
a method for determining the folding nucleus of a protein of known
structure that folds with two-state kinetics. This method is based on the
alignment of many sequences designed to fold into the native
conformation of a protein to identify the positions where amino acids are most
conserved in designed sequences. The method is applied to chymotrypsin
inhibitor 2 (CI2), a protein whose transition state has been previously
studied by protein engineering(14-16). The involvement of residues
in the folding nucleus of CI2 is clearly correlated with their
conservation in design, and the residues forming the nucleus are highly
conserved in 23 natural sequences homologous to CI2.
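
The alignment step described above can be illustrated with a minimal sketch. The abstract does not say how conservation is quantified, so the use of per-position Shannon entropy here is an assumption, as are the function names position_entropy and most_conserved. The four toy sequences are invented for illustration and are not designed CI2 sequences.

```python
import math
from collections import Counter


def position_entropy(sequences):
    """Return the Shannon entropy (in bits) of each alignment column.

    Assumption: conservation is measured by entropy; a lower value
    means the position is more conserved across the sequences.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    n = len(sequences)
    entropies = []
    # zip(*sequences) walks the alignment column by column.
    for column in zip(*sequences):
        counts = Counter(column)
        h = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies


def most_conserved(sequences, top=3):
    """Return the indices of the `top` lowest-entropy positions."""
    ent = position_entropy(sequences)
    # Ties are broken by position index so the result is deterministic.
    return sorted(range(len(ent)), key=lambda i: (ent[i], i))[:top]


# Hypothetical toy alignment (not real designed sequences).
designed = ["ACDE", "ACDF", "ACGH", "AKLM"]
entropies = position_entropy(designed)
candidates = most_conserved(designed, top=2)
```

In this toy alignment the first column is invariant, so it has zero entropy and ranks as the most conserved position. With real data, positions ranked this way would be candidate nucleus residues to compare against the protein-engineering results.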