COMPARATIVE-ANALYSIS OF 1196 ORTHOLOGOUS MOUSE AND HUMAN FULL-LENGTH MESSENGER-RNA AND PROTEIN SEQUENCES

Citation
W. Makalowski et al., COMPARATIVE-ANALYSIS OF 1196 ORTHOLOGOUS MOUSE AND HUMAN FULL-LENGTH MESSENGER-RNA AND PROTEIN SEQUENCES, PCR methods and applications, 6(9), 1996, pp. 846-857
Citations number
45
Subject Categories
Biotechnology & Applied Microbiology; Biology
ISSN journal
1054-9803
Volume
6
Issue
9
Year of publication
1996
Pages
846 - 857
Database
ISI
SICI code
1054-9803(1996)6:9<846:CO1OMA>2.0.ZU;2-4
Abstract
A large set of mRNA and encoded protein sequences from orthologous murine and human genes was compiled to analyze statistical, biological, and evolutionary properties of coding and noncoding transcribed sequences. Protein sequence conservation varied between 36% and 100% identity, with an average value of 85%. The average degree of nucleotide sequence identity for the corresponding coding sequences was also approximately 85%, whereas 5' and 3' untranslated regions (UTRs) were less conserved, with aligned identities of 67% and 69%, respectively. For some mouse and human genes, nucleotide sequences are more highly conserved than the encoded protein sequences. A subset of 32 sequences, consisting of only mouse/human protein pairs for which the human sequence represents a positionally cloned disease gene, had properties very similar to the larger data set, suggesting that our data are representative of the genome as a whole. With respect to sequence conservation, two interesting outliers are the breast cancer (BRCA1) gene product and the testis-determining factor (SRY), both of which display among the lowest degrees of sequence identity. The occurrence of both introns and repetitive elements (e.g., Alu, B1) in 5' and 3' UTRs was also studied. These results provide one benchmark for the "comparative genomics" of mice and humans, with practical implications for the cross-referencing of transcript maps. Also, they should prove useful in estimating the additional sampling diversity provided by mouse EST sequencing projects designed to complement the existing human cDNA collection.