M. A. Huynen and E. van Nimwegen, The frequency distribution of gene family sizes in complete genomes, Molecular Biology and Evolution, 15(5), 1998, pp. 583-589
We compare the frequency distribution of gene family sizes in the
complete genomes of six bacteria (Escherichia coli, Haemophilus
influenzae, Helicobacter pylori, Mycoplasma genitalium, Mycoplasma
pneumoniae, and Synechocystis sp. PCC6803), two Archaea
(Methanococcus jannaschii and Methanobacterium thermoautotrophicum),
one eukaryote (Saccharomyces cerevisiae), the vaccinia virus, and the
bacteriophage T4. The sizes
of the gene families versus their frequencies show power-law
distributions that tend to become flatter (have a larger, i.e., less
negative, exponent) as the number of genes in the genome increases.
Power-law distributions generally
occur as the limit distribution of a multiplicative stochastic process
with a boundary constraint. We discuss various models that can account
for a multiplicative process determining the sizes of gene families in
the genome. In particular, we argue that, in order to explain the
observed distributions, gene families have to behave in a coherent
fashion within the genome; i.e., the probabilities of duplications of
genes within a gene family are not independent of each other. Likewise,
the probabilities of deletions of genes within a gene family are not
independent of each other.
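
The claim that a multiplicative stochastic process with a boundary
constraint tends toward a power law can be illustrated with a small
simulation. The sketch below is not the authors' model: it assumes each
family's size is multiplied at every step by a random lognormal factor
(so all members of a family expand or shrink together) and is held at
or above a minimum size of 1. The function names, the parameters
(drift, sigma, n_steps) and their values are hypothetical choices made
only for illustration.

```python
import math
import random
from collections import Counter


def simulate_family_sizes(n_families=2000, n_steps=500, drift=-0.05,
                          sigma=0.3, min_size=1.0, seed=0):
    """Multiplicative random walk per family with a lower boundary.

    All parameter values are illustrative assumptions, not estimates
    taken from any genome.
    """
    rng = random.Random(seed)
    sizes = [min_size] * n_families
    for _ in range(n_steps):
        for i in range(n_families):
            # The whole family is scaled by one random factor, i.e.
            # its members do not duplicate or delete independently.
            factor = math.exp(rng.gauss(drift, sigma))
            # Boundary constraint: a family never drops below min_size.
            sizes[i] = max(min_size, sizes[i] * factor)
    return sizes


def loglog_slope(sizes):
    """Least-squares slope of log(frequency) against log(family size).

    Sizes are rounded to integers first. This is a crude estimate of
    the power-law exponent, adequate only as a qualitative check.
    """
    counts = Counter(int(round(s)) for s in sizes)
    points = [(math.log(k), math.log(c)) for k, c in counts.items()]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


sizes = simulate_family_sizes()
slope = loglog_slope(sizes)
print(f"estimated log-log slope: {slope:.2f}")
```

With a negative drift the boundary keeps most families small while a
few grow large, so the fitted slope is negative. Under these
assumptions, making the drift less negative relative to sigma should
give a flatter distribution, which is the qualitative trend the
abstract reports for larger genomes.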