ANALYSIS OF EST-DRIVEN GENE ANNOTATION IN HUMAN GENOMIC SEQUENCE

Authors

BAILEY LC SEARLS DB OVERTON GC

Citation

L.C. Bailey et al., ANALYSIS OF EST-DRIVEN GENE ANNOTATION IN HUMAN GENOMIC SEQUENCE, PCR methods and applications, 8(4), 1998, pp. 362-376

Citations number

Subject categories

Biotechnology & Applied Microbiology, Biology, Genetics & Heredity

Journal title

PCR methods and applications

ISSN journal

10549803

Volume

8

Issue

4

Year of publication

1998

Pages

362 - 376

Database

ISI

SICI code

1054-9803(1998)8:4<362:AOEGAI>2.0.ZU;2-9

Abstract

We have performed a systematic analysis of gene identification in genomic sequence by similarity search against expressed sequence tags (ESTs) to assess the suitability of this method for automated annotation of the human genome. A BLAST-based strategy was constructed to examine the potential of this approach, and was applied to test sets containing all human genomic sequences longer than 5 kb in public databases, plus 300 kb of exhaustively characterized benchmark sequence. At high stringency, 70%-90% of all annotated genes are detected by near-identity to EST sequence; >95% of ESTs aligning with well-annotated sequences overlap a gene. These ESTs provide immediate access to the corresponding cDNA clones for follow-up laboratory verification and subsequent biologic analysis. At lower stringency, up to 97% of annotated genes were identified by similarity to ESTs. The apparent false-positive rate rose to 55% of ESTs among all sequences and 20% among benchmark sequences at the lowest stringency, indicating that many genes in public database entries are unannotated. Approximately half of the alignments span multiple exons, and thus aid in the construction of gene predictions and elucidation of alternative splicing. In addition, ESTs from multiple cDNA libraries frequently cluster over genes, providing a starting point for crude expression profiles. Clone IDs may be used to form EST pairs, and particularly to extend models by associating alignments of lower stringency with high-quality alignments. These results demonstrate that EST similarity search is a practical general-purpose annotation technique that complements pattern recognition methods as a tool for gene characterization.