Computational analysis of full-length mouse cDNAs compared with human genome sequences

Citation
S. Kondo et al., Computational analysis of full-length mouse cDNAs compared with human genome sequences, MAMM GENOME, 12(9), 2001, pp. 673-677
Number of citations
10
Subject categories
Molecular Biology & Genetics
Journal title
MAMMALIAN GENOME
Journal ISSN
0938-8990
Volume
12
Issue
9
Year of publication
2001
Pages
673 - 677
Database
ISI
SICI code
0938-8990(200109)12:9<673:CAOFMC>2.0.ZU;2-2
Abstract
Although the sequencing of the human genome is complete, identification of encoded genes and determination of their structures remain a major challenge. In this report, we introduce a method that effectively uses full-length mouse cDNAs to complement efforts in carrying out these difficult tasks. A total of 61,227 RIKEN mouse cDNAs (21,076 full-length and 40,151 EST sequences containing certain redundancies) were aligned with the draft human sequences. We found 35,141 non-redundant genomic regions that showed a significant alignment with the mouse cDNAs. We analyzed the structures and compositional properties of the regions detected by the full-length cDNAs, including cross-species comparisons, and noted a systematic bias of GENSCAN against exons of small size and/or low GC-content. Of the cDNAs locating the 35,141 genomic regions, 3,217 did not match any sequences of the known human genes or ESTs. Among those 3,217 cDNAs, 1,141 did not show any significant similarity to any protein sequence in the GenBank non-redundant protein database and thus are candidates for novel genes.
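
The counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, the numbers are copied from the abstract, and the derived fraction is not a figure reported by the authors.

```python
# Counts copied from the abstract; variable names are illustrative only.
full_length_cdnas = 21_076
est_sequences = 40_151
total_cdnas = 61_227
unmatched_to_human = 3_217      # no match to known human genes or ESTs
no_protein_similarity = 1_141   # of those, no hit in the GenBank nr protein database

# The two cDNA classes should add up to the stated total.
assert full_length_cdnas + est_sequences == total_cdnas

# Share of the unmatched cDNAs that are candidates for novel genes
# (a derived value, not stated in the abstract).
novel_fraction = no_protein_similarity / unmatched_to_human
print(f"Novel-gene candidates among unmatched cDNAs: {novel_fraction:.1%}")
```

The abstract gives no total for the cDNAs that located the 35,141 regions, so the share of all aligned cDNAs that these candidates represent cannot be computed from it.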