ANALYSIS OF EST-DRIVEN GENE ANNOTATION IN HUMAN GENOMIC SEQUENCE

Citation
Lc. Bailey et al., ANALYSIS OF EST-DRIVEN GENE ANNOTATION IN HUMAN GENOMIC SEQUENCE, PCR methods and applications, 8(4), 1998, pp. 362-376
Number of citations
40
Subject Categories
Biotechnology & Applied Microbiology, Biology, Genetics & Heredity
ISSN journal
10549803
Volume
8
Issue
4
Year of publication
1998
Pages
362 - 376
Database
ISI
SICI code
1054-9803(1998)8:4<362:AOEGAI>2.0.ZU;2-9
Abstract
We have performed a systematic analysis of gene identification in genomic sequence by similarity search against expressed sequence tags (ESTs) to assess the suitability of this method for automated annotation of the human genome. A BLAST-based strategy was constructed to examine the potential of this approach, and was applied to test sets containing all human genomic sequences longer than 5 kb in public databases, plus 300 kb of exhaustively characterized benchmark sequence. At high stringency, 70%-90% of all annotated genes are detected by near-identity to EST sequence; >95% of ESTs aligning with well-annotated sequences overlap a gene. These ESTs provide immediate access to the corresponding cDNA clones for follow-up laboratory verification and subsequent biologic analysis. At lower stringency, up to 97% of annotated genes were identified by similarity to ESTs. The apparent false-positive rate rose to 55% of ESTs among all sequences and 20% among benchmark sequences at the lowest stringency, indicating that many genes in public database entries are unannotated. Approximately half of the alignments span multiple exons, and thus aid in the construction of gene predictions and elucidation of alternative splicing. In addition, ESTs from multiple cDNA libraries frequently cluster over genes, providing a starting point for crude expression profiles. Clone IDs may be used to form EST pairs, and particularly to extend models by associating alignments of lower stringency with high-quality alignments. These results demonstrate that EST similarity search is a practical general-purpose annotation technique that complements pattern recognition methods as a tool for gene characterization.
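Illustrative sketch (not from the paper): the two quantities reported in the abstract, the fraction of annotated genes detected and the apparent false-positive rate among aligning ESTs, can be computed from a list of EST alignments and a set of gene annotations roughly as follows. The `Hit` record, the `evaluate` function, the coordinates, and the 95% and 80% identity cutoffs are all hypothetical choices made for this example; the authors' actual BLAST parameters and stringency definitions are not given in the abstract.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One EST alignment on the genomic sequence (hypothetical record layout)."""
    est_id: str
    identity: float  # percent identity of the alignment
    start: int       # genomic start coordinate, inclusive
    end: int         # genomic end coordinate, inclusive


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def evaluate(hits, genes, min_identity):
    """Return (fraction of genes detected, apparent false-positive rate).

    hits: list of Hit
    genes: dict mapping gene name -> (start, end) of the annotated gene
    min_identity: stringency cutoff in percent identity (assumed metric)
    """
    # Keep only alignments that pass the stringency cutoff.
    kept = [h for h in hits if h.identity >= min_identity]

    # A gene counts as detected if any kept alignment overlaps it.
    detected = {
        name
        for name, (g_start, g_end) in genes.items()
        if any(overlaps(h.start, h.end, g_start, g_end) for h in kept)
    }

    # Kept alignments overlapping no annotated gene are "apparent" false
    # positives: they may instead mark genes missing from the annotation.
    off_gene = [
        h for h in kept
        if not any(overlaps(h.start, h.end, g_start, g_end)
                   for g_start, g_end in genes.values())
    ]

    sensitivity = len(detected) / len(genes) if genes else 0.0
    apparent_fp = len(off_gene) / len(kept) if kept else 0.0
    return sensitivity, apparent_fp


# Toy data, invented for illustration only.
genes = {"geneA": (100, 500), "geneB": (1000, 1500)}
hits = [
    Hit("est1", 99.0, 120, 300),
    Hit("est2", 98.5, 450, 520),
    Hit("est3", 85.0, 1100, 1300),
    Hit("est4", 82.0, 2000, 2200),
]

high = evaluate(hits, genes, 95.0)  # high stringency
low = evaluate(hits, genes, 80.0)   # low stringency
print(high, low)
```

On this toy input, lowering the cutoff detects more genes but also admits an alignment outside any annotated gene, which mirrors the trade-off the abstract describes.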