Gene duplication is an important mechanistic antecedent to the evolution of
new genes and novel biochemical functions. In an attempt to assess the
contribution of gene duplication to genome evolution in archaea and bacteria,
clusters of related genes that appear to have expanded subsequent to the
diversification of the major prokaryotic lineages (lineage-specific
expansions) were analyzed. Analysis of 21 completely sequenced prokaryotic
genomes shows that lineage-specific expansions comprise a substantial
fraction (approximately 5%-33%) of their coding capacities. A positive
correlation exists between the fraction of the genes taken up by
lineage-specific expansions and
the total number of genes in a genome. Consistent with the notion that
lineage-specific expansions are made up of relatively recently duplicated
genes, >90% of the detected clusters consist of only two to four genes. The
more common smaller clusters tend to include genes with higher pairwise
similarity (as reflected by average score density) than larger clusters.
Regardless of size, cluster members tend to be located closer together on
bacterial chromosomes than expected by chance, which could reflect a
history of tandem
gene duplication. In addition to the small clusters, almost all genomes
also contain rare large clusters of 20 or more genes. Several
examples of the potential adaptive significance of these large clusters are
explored. The presence or absence of clusters and their related genes was
used as the basis for the construction of a similarity graph for completely
sequenced prokaryotic genomes. The topology of the resulting graph seems
to reflect a combined effect of common ancestry, horizontal transfer, and
lineage-specific gene loss.
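
As an illustration only, the idea of building a genome similarity graph from
the presence or absence of clusters can be sketched as follows. The abstract
does not state which similarity measure was used, so the Jaccard index here
is an assumption, and the genome names, cluster identifiers, and the
similarity_graph helper are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(profiles, threshold=0.0):
    """Build a weighted graph from presence/absence profiles.

    profiles maps a genome name to the set of cluster identifiers present
    in that genome. An edge is kept only if its weight exceeds threshold.
    NOTE: Jaccard similarity is an assumed measure, not the paper's method.
    """
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        weight = jaccard(profiles[g1], profiles[g2])
        if weight > threshold:
            edges[(g1, g2)] = weight
    return edges


# Hypothetical toy profiles (not data from the study).
profiles = {
    "genome_A": {"c1", "c2", "c3"},
    "genome_B": {"c2", "c3", "c4"},
    "genome_C": {"c5"},
}
edges = similarity_graph(profiles)
print(edges)
```

In this toy input, genome_A and genome_B share two of four distinct clusters,
so they are joined by an edge, while genome_C shares nothing and stays
isolated. A graph built this way reflects shared gene content only; it cannot
by itself separate common ancestry from horizontal transfer or gene loss.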