Global surveys of genomes measure the usage of essential molecular parts,
defined here as protein families, superfamilies or folds, in different
organisms. Based on surveys of the first 20 completely sequenced genomes, we
observe that the occurrence of these parts follows a power-law distribution.
That is, the number of distinct parts (F) with a given genomic occurrence
(V) decays as $F = aV^{-b}$, with a few parts occurring many times and most
occurring infrequently. For a given organism, the distributions of families,
superfamilies and folds are nearly identical, and this is reflected in the
size of the decay exponent b. Moreover, the exponent varies between different
organisms, with those of smaller genomes displaying a steeper decay (i.e.
larger b). This power law indicates a preference to duplicate genes that
encode molecular parts which are already common. Here, we present a minimal
but biologically meaningful model that accurately describes the observed
power law. Although the model performs equally well for all three protein
classes, we focus on the occurrence of folds rather than families and
superfamilies, because folds are comparatively insensitive to the effects of
point mutations that can cause a family member to diverge beyond detectable
similarity. In the model, genomes evolve through two basic operations: (i)
duplication of existing genes; (ii) a net flow of new genes. The flow term is
closely related to the exponent b and can accommodate
considerable gene loss; however, we demonstrate that the observed data are
best reproduced with a net inflow, i.e. with more gene gain than loss.
Moreover, we show that prokaryotes have much higher rates of gene acquisition
than eukaryotes, probably reflecting lateral transfer. A further natural
outcome of our model is an estimate of the fold composition of the initial
genome, which may relate to the common ancestor of modern organisms.
Supplementary material pertaining to this work is available from
www.partslist.org/powerlaw. © 2001 Academic Press.
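The relation F = aV^(-b) can be checked against survey data by tabulating, for each occurrence V, how many distinct parts F occur exactly V times, and then fitting a straight line on log-log axes, where the slope is -b and the intercept is log a. The sketch below is a minimal illustration under that assumption. The function names `occurrence_histogram` and `fit_power_law` are hypothetical, and ordinary least squares on log-transformed counts is only one simple fitting choice; the abstract does not state which fitting procedure the authors used. The demonstration uses noise-free synthetic values (a = 500, b = 2), not genome data.

```python
import math
from collections import Counter


def occurrence_histogram(part_occurrences):
    """Tabulate F against V.

    part_occurrences maps each part (e.g. a fold label) to its genomic
    occurrence V. Returns the sorted V values and, for each one, the number F
    of distinct parts that occur exactly V times.
    """
    hist = Counter(part_occurrences.values())
    vs = sorted(hist)
    return vs, [hist[v] for v in vs]


def fit_power_law(vs, fs):
    """Fit F = a * V**(-b) by ordinary least squares on log F versus log V.

    Returns the estimated (a, b). Requires at least two distinct V values
    and strictly positive V and F.
    """
    xs = [math.log(v) for v in vs]
    ys = [math.log(f) for f in fs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of the log-log line is -b
    intercept = mean_y - slope * mean_x   # intercept is log a
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data generated from a = 500, b = 2.
    vs = list(range(1, 11))
    fs = [500 * v ** -2.0 for v in vs]
    a, b = fit_power_law(vs, fs)
    print(f"a = {a:.3f}, b = {b:.3f}")
```

On exact power-law input the fit recovers a = 500 and b = 2. With real, sparse counts, the high-V tail (where many bins hold F = 1) can bias a least-squares fit, so a maximum-likelihood estimator or binned data is often preferred in practice.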
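The two basic operations named above, duplication of existing genes and a net flow of new genes, can be illustrated with a toy growth process. This is a hypothetical sketch, not the authors' model: the abstract gives no update rules, so the sketch assumes that each step either adds a gene carrying a previously unseen fold (probability `p_new`, a stand-in for net inflow) or copies a uniformly chosen existing gene. Copying a uniformly chosen gene selects each fold with probability proportional to its current occurrence, which gives the "rich get richer" preference for already-common parts. Gene loss is not modelled. The function `simulate_genome` and its parameters are illustrative names only, and `initial_folds` loosely plays the role of the initial genome.

```python
import random
from collections import Counter


def simulate_genome(initial_folds, n_steps, p_new, seed=0):
    """Grow a hypothetical genome by preferential duplication plus gene inflow.

    initial_folds: list of integer fold labels, one entry per gene.
    n_steps: number of genes added; each step adds exactly one gene.
    p_new: probability that a step adds a gene with a previously unseen fold.
    Returns a Counter mapping each fold label to its occurrence V.
    Gene loss is deliberately not modelled in this sketch.
    """
    rng = random.Random(seed)
    genes = list(initial_folds)
    next_label = max(genes) + 1 if genes else 0
    for _ in range(n_steps):
        if not genes or rng.random() < p_new:
            # Inflow: a gene carrying a brand-new fold enters the genome.
            genes.append(next_label)
            next_label += 1
        else:
            # Duplication: copying a uniformly chosen gene picks each fold
            # with probability proportional to its current occurrence V.
            genes.append(rng.choice(genes))
    return Counter(genes)


if __name__ == "__main__":
    occ = simulate_genome(initial_folds=[0, 1, 2], n_steps=5000, p_new=0.1, seed=1)
    print("genes:", sum(occ.values()), "distinct folds:", len(occ))
    print("most common folds:", occ.most_common(5))
```

This is essentially a Yule-Simon-type preferential process. Its output (fold label to occurrence V) can be tabulated as F against V and inspected on log-log axes. Varying `p_new` gives a rough feel for how an inflow term shapes the decay, but the sketch makes no claim to reproduce the fitted rates reported for real genomes.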