The Frequency Distribution of Gene Family Sizes in Complete Genomes

Citation
M. A. Huynen and E. van Nimwegen, The frequency distribution of gene family sizes in complete genomes, Molecular Biology and Evolution, 15(5), 1998, pp. 583-589
Number of citations
24
Subject categories
Biology Miscellaneous; Biology; Genetics & Heredity
Journal ISSN
0737-4038
Volume
15
Issue
5
Year of publication
1998
Pages
583 - 589
Database
ISI
SICI code
0737-4038(1998)15:5<583:TFOGFS>2.0.ZU;2-A
Abstract
We compare the frequency distribution of gene family sizes in the complete genomes of six bacteria (Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Mycoplasma genitalium, Mycoplasma pneumoniae, and Synechocystis sp. PCC6803), two Archaea (Methanococcus jannaschii and Methanobacterium thermoautotrophicum), one eukaryote (Saccharomyces cerevisiae), the vaccinia virus, and the bacteriophage T4. The frequencies of gene family sizes follow power-law distributions that tend to become flatter (i.e., have a larger, less negative exponent) as the number of genes in the genome increases. Power-law distributions generally arise as the limit distribution of a multiplicative stochastic process with a boundary constraint. We discuss various models that can account for a multiplicative process determining the sizes of gene families in the genome. In particular, we argue that, to explain the observed distributions, gene families have to behave coherently within the genome; i.e., the probabilities of duplication of genes within a gene family are not independent of each other. Likewise, the probabilities of deletion of genes within a gene family are not independent of each other.
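To make "power-law distribution" and "flatter (larger exponent)" concrete, here is a minimal sketch that tallies how many families have each size and fits log(frequency) against log(size) by least squares. It assumes every gene has already been assigned to a family; the abstract does not say how the authors delineated families or fitted exponents, so the function names and the fitting method are illustrative assumptions, and the example data are hypothetical, not from the paper.

```python
import math
from collections import Counter


def size_frequencies(family_sizes):
    """Tally how many families have each size: {size: number_of_families}."""
    return dict(sorted(Counter(family_sizes).items()))


def fit_power_law_exponent(freqs):
    """Fit log(frequency) = log(C) + slope * log(size) by least squares.

    For f(s) = C * s**(-alpha) the fitted slope is -alpha, so a flatter
    distribution has a slope closer to zero (a larger, less negative exponent).
    Returns (slope, C).
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct family sizes")
    if min(freqs) < 1:
        raise ValueError("family sizes must be >= 1")
    xs = [math.log(s) for s in freqs]
    ys = [math.log(n) for n in freqs.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, math.exp(my - slope * mx)


if __name__ == "__main__":
    # Hypothetical family sizes (not data from the paper):
    # 3600 / s**2 families of size s, for s = 1..5.
    sizes = [s for s in range(1, 6) for _ in range(3600 // s**2)]
    slope, c = fit_power_law_exponent(size_frequencies(sizes))
    print(f"slope = {slope:.3f}, C = {c:.1f}")  # prints: slope = -2.000, C = 3600.0
```

A straight-line fit on the log-log histogram is the simplest way to read off an exponent, but on real, noisy tails it is known to be biased; a maximum-likelihood estimator is usually the safer choice.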
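The claim that power laws arise from a multiplicative stochastic process with a boundary constraint can be illustrated with a toy simulation. In this sketch, which is an assumption and not the authors' specific model, each family's size is multiplied every step by one random factor λ and is reflected at a lower boundary s_min. A standard result for such processes is that, when E[ln λ] < 0, the stationary distribution has a tail P(s) ∝ s^-(1+μ), where μ > 0 solves E[λ^μ] = 1; for a lognormal λ with ln λ ~ N(m, σ²) this gives μ = -2m/σ². The lognormal multiplier, the continuous (non-integer) sizes, the parameter values, and the Hill-estimator check are all hypothetical choices made for illustration.

```python
import math
import random


def predicted_tail_exponent(mean_log, sd_log):
    """Tail index mu for lognormal multipliers, ln(lam) ~ N(mean_log, sd_log**2).

    Solves E[lam**mu] = exp(mu*mean_log + mu**2 * sd_log**2 / 2) = 1 for mu > 0;
    the stationary density then decays as P(s) ~ s**-(1 + mu).
    """
    if mean_log >= 0:
        raise ValueError("a stationary law needs a downward drift (mean_log < 0)")
    return -2.0 * mean_log / sd_log**2


def simulate_family_sizes(n_families, n_steps, mean_log, sd_log, s_min=1.0, seed=0):
    """Multiply every family size by its own random factor each step,
    reflecting at the lower boundary s_min (the boundary constraint)."""
    rng = random.Random(seed)
    sizes = [s_min] * n_families
    for _ in range(n_steps):
        sizes = [max(s_min, s * math.exp(rng.gauss(mean_log, sd_log))) for s in sizes]
    return sizes


def hill_estimate(samples, k):
    """Hill estimator of mu from the k largest samples (P(S > s) ~ s**-mu)."""
    top = sorted(samples, reverse=True)[: k + 1]
    return k / sum(math.log(x / top[k]) for x in top[:k])


if __name__ == "__main__":
    mean_log, sd_log = -0.1, 0.5  # hypothetical drift and spread of ln(lambda)
    sizes = simulate_family_sizes(4000, 300, mean_log, sd_log, seed=1)
    print("predicted mu:", predicted_tail_exponent(mean_log, sd_log))
    print("Hill estimate of mu (k = 200):", round(hill_estimate(sizes, 200), 2))
```

Applying a single factor to a whole family is what makes the family behave coherently in this toy. If each gene instead duplicated or was lost independently, the relative fluctuation of a family of size s would shrink roughly as 1/√s, so large families would feel an almost deterministic drift and the tail would fall off faster than a power law. That is, in rough outline, the intuition behind the abstract's argument for coherent families.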