T.G. Wolfsberg and D. Landsman, A comparison of expressed sequence tags (ESTs) to human genomic sequences, Nucleic Acids Research, 25(8), 1997, pp. 1626-1632
The Expressed Sequence Tag (EST) division of GenBank, dbEST, is a large repository of the data being generated by human genome sequencing centers. ESTs are short, single-pass cDNA sequences generated from randomly selected library clones. The approximately 415 000 human ESTs represent a valuable, low-priced, and easily accessible biological reagent. Because many ESTs are derived from as yet uncharacterized genes, dbEST is a prime starting point for the identification of novel mRNAs. Conversely, other genes are represented by hundreds of ESTs, a redundancy that may provide data about rare mRNA isoforms. Here we present an analysis of more than 1000 ESTs generated by the WashU-Merck EST project. These ESTs were collected by querying dbEST with the genomic sequences of 15 human genes. When we aligned the matching ESTs to the genomic sequences, we found that in one gene, 73% of the ESTs that derive from spliced or partially spliced transcripts either contain intron sequences or are spliced at previously unreported sites. Other genes have lower percentages of such ESTs, and some have none. This finding suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes. In a related analysis of pairs of ESTs that are reported to derive from a single gene, we found that as many as 26% of the pairs do not both align with the sequence of the same gene. We suspect that some of these unusual ESTs result from artifacts in EST generation, and we caution researchers that they may encounter such clones while analyzing sequences in dbEST.
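The two measurements described above can be illustrated with a small sketch. This is not the authors' pipeline. It assumes that the dbEST query and the EST-to-genome alignment have already been done, and that each EST's alignment is given as a list of gapped blocks in half-open coordinates on the gene's genomic sequence, alongside an annotated exon list. The function names, category labels, and toy coordinates are all hypothetical. An EST counts as unusual if an aligned block overlaps an annotated intron, or if a gap between blocks does not match a known exon-exon junction. The pair check counts a pair as discordant when the two ESTs do not both align to the gene.

```python
from typing import List, Set, Tuple

# Hypothetical representation: half-open interval [start, end) on the gene's genomic sequence.
Interval = Tuple[int, int]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Intron intervals lying between consecutive exons (exons sorted by start)."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def classify_est(blocks: List[Interval], exons: List[Interval]) -> str:
    """Classify one EST from its gapped alignment blocks (sorted by start).

    Hypothetical categories:
      "intron_sequence" - some aligned block overlaps an annotated intron
      "novel_splice"    - a gap between blocks is not a known exon-exon junction
      "known_splice"    - every gap matches a known junction
      "uninformative"   - a single block with no intron overlap (splicing unknown)
    """
    introns = introns_from_exons(exons)
    known_junctions = {(left[1], right[0]) for left, right in zip(exons, exons[1:])}
    if any(overlaps(block, intron) for block in blocks for intron in introns):
        return "intron_sequence"
    est_junctions = [(left[1], right[0]) for left, right in zip(blocks, blocks[1:])]
    if any(junction not in known_junctions for junction in est_junctions):
        return "novel_splice"
    return "known_splice" if est_junctions else "uninformative"


def percent_unusual(classes: List[str]) -> float:
    """Percentage of informative ESTs that contain intron sequence or a novel splice."""
    informative = [c for c in classes if c != "uninformative"]
    if not informative:
        return 0.0
    unusual = sum(c in ("intron_sequence", "novel_splice") for c in informative)
    return 100.0 * unusual / len(informative)


def percent_discordant_pairs(pairs: List[Tuple[str, str]], aligned: Set[str]) -> float:
    """Percentage of EST pairs (reported to share a gene) that do not both align to it."""
    if not pairs:
        return 0.0
    discordant = sum(not (a in aligned and b in aligned) for a, b in pairs)
    return 100.0 * discordant / len(pairs)


if __name__ == "__main__":
    # Toy gene with three exons; introns are therefore (100, 300) and (400, 600).
    exons = [(0, 100), (300, 400), (600, 700)]
    ests = {
        "est_a": [(50, 100), (300, 350)],  # gap matches known junction (100, 300)
        "est_b": [(80, 150)],              # runs into the first intron
        "est_c": [(50, 90), (300, 380)],   # gap (90, 300) is not a known junction
        "est_d": [(610, 690)],             # single block inside the last exon
    }
    classes = {name: classify_est(blocks, exons) for name, blocks in ests.items()}
    print(classes)
    print(round(percent_unusual(list(classes.values())), 1))
    print(percent_discordant_pairs([("est_a", "est_b"), ("est_c", "est_x")], set(ests)))
```

The sketch treats ESTs that fall entirely within one exon as uninformative, because they reveal nothing about splicing. This mirrors the abstract's restriction to ESTs derived from spliced or partially spliced transcripts. A real analysis would also need to tolerate small alignment offsets at exon boundaries, which this exact-match comparison ignores.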