Theoretical considerations predict that amplification of expressed gene
transcripts by reverse transcription-PCR using arbitrarily chosen primers
will result in the preferential amplification of the central portion of the
transcript. Systematic, high-throughput sequencing of such products would
result in an expressed sequence tag (EST) database consisting of central,
generally coding regions of expressed genes. Such a database would add
significant value to existing public EST databases, which consist mostly of
sequences derived from the extremities of cDNAs, and facilitate the
construction of contigs of transcript sequences. We tested our predictions,
creating a database of 10,000 sequences from human breast tumors. The data
confirmed the central distribution of the sequences, the significant
normalization of the sequence population, the frequent extension of contigs
composed of existing human ESTs, and the identification of a series of
potentially important homologues of known genes. This approach should make a
significant contribution to the early identification of important human
genes, the deciphering of the draft human genome sequence currently being
compiled, and the shotgun sequencing of the human transcriptome.
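
The predicted central bias can be illustrated with a toy model. The sketch below is not the authors' theoretical treatment; it assumes, purely for illustration, that the two arbitrary primers anneal at independent, uniformly distributed positions along a transcript of normalized length 1, and it ignores primer sequence specificity and amplicon length limits. Under that assumption, a position x is covered by the amplicon with probability 1 - x^2 - (1 - x)^2 = 2x(1 - x), which peaks at the center of the transcript. The function name `central_coverage` and its parameters are hypothetical.

```python
import random


def central_coverage(n_amplicons=20000, n_bins=10, seed=0):
    """Estimate how often each relative transcript position lies inside an amplicon."""
    rng = random.Random(seed)
    # Bin midpoints along a transcript of normalized length 1 (0 = 5' end, 1 = 3' end).
    midpoints = [(i + 0.5) / n_bins for i in range(n_bins)]
    counts = [0] * n_bins
    for _ in range(n_amplicons):
        # Assumption: both primer sites are independent and uniform along the transcript.
        a, b = rng.random(), rng.random()
        start, end = min(a, b), max(a, b)
        for i, x in enumerate(midpoints):
            if start <= x <= end:
                counts[i] += 1
    # Fraction of simulated amplicons covering each bin midpoint.
    return [c / n_amplicons for c in counts]


coverage = central_coverage()
for i, frac in enumerate(coverage):
    x = (i + 0.5) / len(coverage)
    # Compare the simulated fraction with the analytic value 2x(1 - x).
    print(f"x={x:.2f}  simulated={frac:.3f}  analytic={2 * x * (1 - x):.3f}")
```

In this simplified model the simulated coverage should rise from the transcript ends toward the middle, tracking the 2x(1 - x) curve. Real primer binding is sequence dependent, so the model indicates only the direction of the bias, not its magnitude.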