We have developed a method for identifying fold families in the protein
structure data bank. Pairwise sequence alignments are first performed to
extract families of homologous proteins having 35% or more sequence
identity. Representatives are selected with the best resolution and
R-factor to give a nonhomologous data set. Subsequent structure
comparisons between all members of this set detect homologous folds with
low sequence identity but highly conserved structures. By softening the
requirement on structural similarity, families of analogous proteins are
obtained that have related folds but more diverse structures.
Representatives are selected to give a non-analogous data set. Starting
with 1410 chains from the Brookhaven Data Bank, we generate a set of 150
nonhomologous folds and a set of 112 non-analogous folds. Analysis of
sequence and structure conservation within the larger families shows the
globins to be the most highly conserved family and the TIM barrels the
most weakly conserved.
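
The first stage described above (grouping chains at 35% or more sequence identity, then keeping one representative per family) can be illustrated with a minimal sketch. Several points are assumptions and not taken from the method itself: families are formed by single-linkage grouping, resolution is compared before R-factor when choosing a representative, and the chain names and values are invented for illustration. The names `Chain`, `cluster_chains`, and `pick_representatives` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Chain:
    name: str          # hypothetical chain identifier
    resolution: float  # in angstroms; lower is better
    r_factor: float    # lower is better


def cluster_chains(
    chains: List[Chain],
    identity: Dict[Tuple[str, str], float],
    threshold: float = 35.0,
) -> List[List[Chain]]:
    """Group chains into families by single linkage (an assumption):
    any pair with percent identity >= threshold ends up in one family."""
    parent = {c.name: c.name for c in chains}

    def find(x: str) -> str:
        # Follow parent links to the family root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    families: Dict[str, List[Chain]] = {}
    for c in chains:
        families.setdefault(find(c.name), []).append(c)
    return list(families.values())


def pick_representatives(families: List[List[Chain]]) -> List[Chain]:
    """Keep one chain per family: best resolution, then best R-factor
    (this ordering of the two criteria is an assumption)."""
    return [min(f, key=lambda c: (c.resolution, c.r_factor)) for f in families]


# Invented example data, not taken from the Brookhaven Data Bank.
chains = [
    Chain("A", 1.8, 0.18),
    Chain("B", 2.5, 0.20),
    Chain("C", 2.0, 0.19),
    Chain("D", 1.5, 0.17),
]
identity = {
    ("A", "B"): 60.0,
    ("B", "C"): 40.0,
    ("A", "C"): 20.0,
    ("C", "D"): 10.0,
}

families = cluster_chains(chains, identity)
representatives = pick_representatives(families)
print([c.name for c in representatives])  # prints ['A', 'D']
```

In this toy input, A and C share only 20% identity but fall into the same family because both are linked to B. That is a consequence of the single-linkage assumption; a stricter grouping rule would split them. The later structure-comparison stages could reuse the same grouping step with a structural similarity score in place of percent identity.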