C.M. Matthews et al., Variable substitution rates of the 18 domain sequences in Artemia hemoglobin, Journal of Molecular Evolution, 46(6), 1998, pp. 729-733
The Artemia hemoglobin is a dimer comprising two nine-domain covalent
polymers in quaternary association. Each polymer is encoded by a gene
representing nine successive globin domains which have different
sequences and are presumed to have been copied originally from a
single-domain gene. Two different polymers exist as the result of a
complete duplication of the nine-domain gene, allowing the formation
of either homodimers or the heterodimer. The total population size of
18 domains comprising nine corresponding pairs, coupled with the
probability that they reflect several hundred million years of
evolution in the same lineage, provides a unique model in which the
process of gene multiplication can be analyzed. The outcome has
important implications for the reliability of local molecular clocks.
The two polymers differ from each other at 11.7% of amino acid sites;
however, when corresponding individual domains are compared between
polymers, amino acid substitution fluctuates 2.7-fold from lowest to
highest. This variation is not obvious at the DNA level: domain pair
identity values fluctuate by 1.3-fold. Identity values are, however,
uncorrected for multiple substitutions, and both silent and nonsilent
changes are pooled. Therefore, to determine the variability in
relative substitution rates at the DNA level, we have used the method
of Li (1993, J Mol Evol 36:96-99) to determine estimates of
nonsynonymous (K_A) and synonymous (K_S) substitutions per site for
the nine pairs of domains. As expected, the overall level of silent
substitutions (K_S of 56.9%) far exceeded nonsilent substitutions
(K_A of 6.7%); however, for corresponding domain pairs, K_A fluctuates
by 2.3-fold and K_S by 1.7-fold. The large discrepancies reflected in
the expressed protein have accrued within a single lineage, and the
implication is that divergence dates of different genera based on
amino acid sequences, even with well-studied proteins of reasonable
size, can be wrong by a factor well in excess of 2.
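
The arithmetic behind these statements can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the Jukes-Cantor formula stands in for the method of Li (1993) as a much simpler example of correcting for multiple substitutions, the toy sequences and the function names are hypothetical, and the only values taken from the abstract are the overall nonsynonymous and synonymous estimates (6.7% and 56.9%) and the 2.3-fold spread.

```python
import math

# Overall estimates quoted in the abstract, as substitutions per site.
REPORTED_K_A = 0.067   # nonsynonymous
REPORTED_K_S = 0.569   # synonymous
REPORTED_K_A_FOLD = 2.3  # spread of nonsynonymous estimates across domain pairs


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites (1 - identity).

    Sequences must already be aligned; columns with a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3).

    A stand-in for a multiple-substitution correction; NOT the Li (1993)
    method, which treats synonymous and nonsynonymous sites separately.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def fold_variation(values) -> float:
    """Ratio of the highest to the lowest value in a collection."""
    values = list(values)
    return max(values) / min(values)


def clock_date(k: float, rate: float) -> float:
    """Strict-clock divergence time t = K / (2 * rate) for two lineages."""
    return k / (2.0 * rate)


# Hypothetical toy alignment, only to show that correction raises the estimate.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print("uncorrected p:", p, "Jukes-Cantor d:", jukes_cantor(p))

# Ratio of the overall estimates reported in the abstract.
print("K_A / K_S:", REPORTED_K_A / REPORTED_K_S)

# Two domain pairs of the same true age whose distances differ 2.3-fold give
# dates that differ 2.3-fold under one assumed rate (units are arbitrary).
assumed_rate = 1.0
low, high = 1.0, REPORTED_K_A_FOLD
print("date ratio:", clock_date(high, assumed_rate) / clock_date(low, assumed_rate))
```

Because the inferred date is proportional to the distance under a fixed rate, any fold-variation in substitution among domains of equal age carries over unchanged to the date estimate, which is the point the abstract makes about local molecular clocks.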