There is a limited repertoire of domain families that are duplicated and
combined in different ways to form the set of proteins in a genome. Proteins
are gene products, and at the level of genes, duplication, recombination,
fusion and fission are the processes that produce new genes. We attempt to
gain an overview of these processes by studying the evolutionary units of
proteins, namely domains, in the protein sequences of 40 genomes. The domain and
superfamily definitions in the Structural Classification of Proteins (SCOP) database
are used, so that we can view all pairs of adjacent domains in genome
sequences in terms of their superfamily combinations. We find 783 out of the 859
superfamilies in SCOP in these genomes, and the 783 families occur in 1307
pairwise combinations. Most families are observed in combination with one
or two other families, while a few families are very versatile in their
combinatorial behaviour; 209 families do not make combinations with other
families. This type of pattern can be described as a scale-free network. We
also study the N to C-terminal orientation of domain pairs and domain repeats.
The phylogenetic distribution of domain combinations is surveyed, to
establish the extent of common and kingdom-specific combinations. Of the
kingdom-specific combinations, significantly more combinations consist of families
present in all three kingdoms than of families present in one or two
kingdoms. Hence, we are led to conclude that recombination between common
families, as compared to the invention of new families and recombination among
these, has also made a major contribution to the evolution of kingdom-specific
and species-specific functions in organisms in all three kingdoms.
Finally, we compare the set of domain combinations in the genomes to those in
the RCSB Protein Data Bank, and discuss the implications for structural
genomics. © 2001 Academic Press.
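
The pairwise analysis summarised above can be illustrated with a minimal sketch. It assumes each protein is already available as a list of superfamily identifiers ordered from N to C terminus; the identifiers, the toy data and the function names are hypothetical and are not taken from the study. The sketch also assumes that adjacent repeats of the same superfamily are not counted as combinations, which may differ from the treatment in the paper.

```python
from collections import Counter, defaultdict


def adjacent_pairs(domains):
    """Return (N-terminal, C-terminal) pairs of sequence-adjacent domains."""
    return list(zip(domains, domains[1:]))


def combination_network(proteins):
    """Map each observed superfamily to the set of superfamilies it neighbours.

    `proteins` maps a protein name to its superfamily identifiers in
    N-to-C order. Adjacent repeats of one superfamily are skipped
    (an assumption made for this sketch).
    """
    partners = defaultdict(set)
    seen = set()
    for domains in proteins.values():
        seen.update(domains)
        for n_term, c_term in adjacent_pairs(domains):
            if n_term != c_term:
                partners[n_term].add(c_term)
                partners[c_term].add(n_term)
    # Superfamilies with no partner still appear, with an empty set.
    return {sf: set(partners.get(sf, set())) for sf in seen}


def count_combinations(network):
    """Number of distinct unordered superfamily pairs in the network."""
    return sum(len(p) for p in network.values()) // 2


def degree_distribution(network):
    """Count how many superfamilies have 0, 1, 2, ... distinct partners."""
    return Counter(len(p) for p in network.values())


# Hypothetical toy input, for illustration only.
toy = {
    "protA": ["sf1", "sf2"],
    "protB": ["sf2", "sf1", "sf3"],
    "protC": ["sf4"],
    "protD": ["sf1", "sf5"],
    "protE": ["sf3", "sf3"],
}
network = combination_network(toy)
n_combinations = count_combinations(network)
distribution = degree_distribution(network)
```

In this toy input one superfamily has three partners, three have a single partner and one has none, which mirrors in miniature the skewed partner counts described above. Keeping the ordered pairs from `adjacent_pairs` rather than collapsing them to unordered partners would be the starting point for examining N to C-terminal orientation.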