J. Jiang and H.J. Jacob, EbEST: An automated tool using expressed sequence tags to delineate gene structure, PCR Methods and Applications, 8(3), 1998, pp. 268-275
Large numbers of expressed sequence tags (ESTs) continue to fill public and private databases with partial cDNA sequences. However, using this huge collection of ESTs to facilitate gene finding in genomic sequence poses a challenge, especially for wet-lab scientists, who often have limited computing resources. In an effort to consolidate the information hidden in the vast number of ESTs into a readable and manageable format, we have developed EbEST, a program that automates the process of using ESTs to help delineate gene structure in long stretches of genomic sequence. The EbEST program consists of three functional modules: the first module separates homologous ESTs into clusters and identifies the most informative ESTs within each cluster; the second module uses the informative ESTs to perform gapped alignment and to predict the exon-intron boundaries; and the third module generates text-file and graphic outputs that illustrate the orientation, exonic structure, and untranslated regions (UTRs) of putative genes in the genomic sequence being analyzed. Evaluation of EbEST with 176 human genes from the ALLSEQ set indicated that it performed in line with several existing gene-finding programs but was more tolerant of sequencing errors. Furthermore, when EbEST was challenged with query sequences that harbor more than one gene, its performance dropped only slightly, whereas the performance of the other programs evaluated decreased more. EbEST may be used as a stand-alone tool to annotate human genomic sequences with EST-derived gene elements, or in conjunction with computational gene-recognition programs to increase the accuracy of gene prediction.
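The abstract describes EbEST's three modules only at a high level. The Python sketch below shows one deliberately simplified way such a pipeline could be put together. It is not EbEST's actual method. Every function name is hypothetical, and each step rests on an assumption:

- Clustering merges any ESTs that share a 12-mer (single linkage, forward strand only).
- The "most informative" EST is taken to be the longest one.
- Exons are placed by greedy exact matching, not by gapped alignment, so unlike EbEST the sketch tolerates no sequencing errors.
- Intron boundaries are slid across ambiguous bases toward the canonical GT...AG consensus.

Orientation and UTR calling are not modeled.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping k-mers of a sequence (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=12):
    """Module 1 (sketch): single-linkage clustering; ESTs sharing a k-mer are merged.
    Assumption: forward strand only, no reverse-complement matching."""
    parent = list(range(len(ests)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner = {}  # k-mer -> index of the first EST seen containing it
    for i, seq in enumerate(ests):
        for km in kmers(seq, k):
            if km in owner:
                parent[find(i)] = find(owner[km])  # union the two clusters
            else:
                owner[km] = i
    clusters = defaultdict(list)
    for i in range(len(ests)):
        clusters[find(i)].append(i)
    return list(clusters.values())


def most_informative(cluster, ests):
    """Hypothetical criterion: treat the longest EST in a cluster as most informative."""
    return max(cluster, key=lambda i: len(ests[i]))


def chain_exons(est, genome, min_exon=8):
    """Module 2a (sketch): greedily place consecutive EST segments on the genome
    as exact-match blocks, returned as half-open [start, end) intervals."""
    exons, i, g = [], 0, 0
    while i < len(est):
        for length in range(len(est) - i, min_exon - 1, -1):
            pos = genome.find(est[i:i + length], g)
            if pos != -1:
                exons.append([pos, pos + length])
                i, g = i + length, pos + length
                break
        else:
            return None  # the remaining EST segment could not be placed
    return exons


def slide_to_gt_ag(exons, genome, max_shift=5):
    """Module 2b (sketch): shift each intron over ambiguous bases so that it is
    flanked by the canonical GT...AG dinucleotides, trying small shifts first."""
    for left, right in zip(exons, exons[1:]):
        e, b = left[1], right[0]  # the current intron is genome[e:b]
        for s in sorted(range(-max_shift, max_shift + 1), key=abs):
            # A shift leaves the spliced sequence unchanged only if the bases
            # moved across the two boundaries are identical.
            if s >= 0:
                same = genome[e:e + s] == genome[b:b + s]
            else:
                same = genome[e + s:e] == genome[b + s:b]
            ne, nb = e + s, b + s
            if (same and left[0] < ne and nb < right[1]
                    and genome[ne:ne + 2] == "GT" and genome[nb - 2:nb] == "AG"):
                left[1], right[0] = ne, nb
                break
    return exons


def format_exons(exons):
    """Module 3 (sketch): plain-text report using 1-based inclusive coordinates."""
    return "\n".join(f"exon {n}: {a + 1}-{b}" for n, (a, b) in enumerate(exons, 1))


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGGCTAGCCTAGGACCA", "GTAAGTCCCCCCCCTTTTTCAG", "GTCGATCGGAATTCTGA"
    genome = "T" * 10 + exon1 + intron + exon2 + "A" * 10
    ests = [exon1 + exon2, exon1, "GGGGCCCCAAAATTTTGGGG"]
    for cluster in cluster_ests(ests):
        exons = chain_exons(ests[most_informative(cluster, ests)], genome)
        if exons:
            print(format_exons(slide_to_gt_ag(exons, genome)))
```

In the toy example, exon 2 begins with "GT", so greedy matching first pushes the exon 1 boundary two bases into the intron. The sliding step then moves it back to the GT...AG position, and the program prints `exon 1: 11-28` and `exon 2: 51-67`. The unrelated EST cannot be placed and is skipped.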