The 3Dee database of domain definitions was developed as a comprehensive co
llection of domain definitions for all three-dimensional structures in the
Protein Data Bank (PDB). The database includes definitions for complex, mul
tiple-segment and multiple-chain domains as well as simple sequential domai
ns, organized in a structural hierarchy. Two snapshots of the 3Dee database, from September 1996 and November 1999, were analyzed. For the Novem
ber 1999 release, 7,995 PDB entries contained 13,767 protein chains and gav
e rise to 18,896 domains. The domain sequences clustered into 1,715 domain
sequence families, which were further clustered into a conservative 1,199 d
omain structure families (families with similar folds). The proportion of d
ifferent domain structure families per domain sequence family increases fro
m 84% for domains 1-100 residues long to 100% for domains greater than 600
residues. This is in keeping with the idea that longer chains will have mor
e alternative folds available to them. Of the representative domains from t
he domain sequence families, 49% are in the range of 51-150 residues, where
as 64% of the representative chains over 200 residues have more than 1 doma
in. Of the representative chains, 8.5% are part of multichain domains. The
largest multichain domain in the database has 14 chains and 1,400 residues,
whereas the largest single-chain domain has 907 residues. The largest numb
er of domains found in a protein is 13. The analysis shows that over the hi
story of the PDB, new domain folds have been discovered at a slower rate th
an by random selection of all known folds. Between 1992 and 1997, a constan
t 1 in 11 new domains deposited in the PDB has shown no sequence similarity
to a previously known domain sequence family, and only 1 in 15 new domain
structures has had a fold that has not been seen previously. A comparison o
f the September 1996 release of 3Dee to the Structural Classification of Pr
oteins (SCOP) showed that the domain definitions agreed for 80% of the repr
esentative protein chains. However, 3Dee provided explicit domain boundarie
s for more proteins. 3Dee is accessible on the World Wide Web at http://bar
ton.ebi.ac.uk/servers/3Dee.html. Proteins 2001; 42:332-344. © 2000 Wiley-Liss, Inc.