W. Makalowski et al., Comparative analysis of 1196 orthologous mouse and human full-length messenger RNA and protein sequences, PCR Methods and Applications, 6(9), 1996, pp. 846-857
A large set of mRNA and encoded protein sequences from orthologous murine and human genes was compiled to analyze statistical, biological, and evolutionary properties of coding and noncoding transcribed sequences. Protein sequence conservation varied between 36% and 100% identity, with an average value of 85%. The average nucleotide sequence identity for the corresponding coding sequences was also approximately 85%, whereas the 5' and 3' untranslated regions (UTRs) were less conserved, with aligned identities of 67% and 69%, respectively. For some mouse and human genes, the nucleotide sequences are more highly conserved than the encoded protein sequences. A subset of 32 sequences, consisting only of mouse/human protein pairs for which the human sequence represents a positionally cloned disease gene, had properties very similar to those of the larger data set, suggesting that our data are representative of the genome as a whole. With respect to sequence conservation, two interesting outliers are the breast cancer (BRCA1) gene product and the testis-determining factor (SRY), both of which display among the lowest degrees of sequence identity. The occurrence of introns and repetitive elements (e.g., Alu, B1) in 5' and 3' UTRs was also studied. These results provide one benchmark for the "comparative genomics" of mice and humans, with practical implications for the cross-referencing of transcript maps. They should also prove useful in estimating the additional sampling diversity provided by mouse EST sequencing projects designed to complement the existing human cDNA collection.