A number of fundamental questions in structural biology concern the
diversity of protein architectures (or folds). Here, we address two of them,
namely the size of the universe of folds and the distribution of sequence
families among them, using an analysis based on a new and rigorous statistical sampling
method. In particular, we show that the number of known non-transmembrane
protein folds is approximately one half of the total that exist, and that
certain superfolds should exist that accommodate dozens of non-homologous
sequence families. (C) 1998 Academic Press.