Many seemingly unrelated protein families share common folds. Theoretical models based on structure designability have suggested that a few folds should be very common while many others have low probability. In agreement with the predictions of these models, we show that the distribution of observed protein families over different folds can be modeled with a highly stretched exponential. Our results suggest that there are approximately 4,000 possible folds, some so unlikely that only approximately 2,000 folds exist among naturally occurring proteins. Due to the large number of extremely rare folds, constructing a comprehensive database of all existent folds would be difficult. Constructing a database of the most likely folds, representing the vast majority of protein families, would be considerably easier. Proteins 1999;35:408-414. © 1999 Wiley-Liss, Inc.
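
As a rough illustration of why a stretched-exponential distribution makes a database of the most likely folds much easier to build than a complete one, the sketch below assigns each fold rank r a probability proportional to exp(-(r/tau)^beta) and reports how much of the total probability the top-ranked folds cover. The functional form is the generic stretched exponential; the values of tau and beta are hypothetical placeholders, not the parameters fitted in this work, and the function names are illustrative only. The fold count of 4,000 is taken from the estimate above.

```python
import math


def stretched_exponential_probs(n_folds, tau, beta):
    """Normalized probabilities p_r proportional to exp(-(r / tau) ** beta), r = 1..n_folds.

    tau and beta are assumed, illustrative parameters (0 < beta < 1 gives
    the "stretched" heavy tail); they are not fitted values.
    """
    weights = [math.exp(-((rank / tau) ** beta)) for rank in range(1, n_folds + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def coverage(probs, top_k):
    """Fraction of total probability held by the top_k most likely folds."""
    return sum(probs[:top_k])


# Hypothetical parameters chosen only to show the qualitative behavior.
probs = stretched_exponential_probs(n_folds=4000, tau=1.0, beta=0.3)

for k in (10, 100, 1000):
    print(f"top {k:>4} folds cover {coverage(probs, k):.3f} of the probability")
```

Under any such distribution the coverage rises quickly for the first few ranks and then flattens, which is the qualitative pattern described above: a small set of common folds accounts for most families, while the long tail of rare folds contributes little.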