A.A. Camargo et al., The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome, P NAS US, 98(21), 2001, pp. 12103-12108
Number of citations
20
Subject Categories
Multidisciplinary
Journal title
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
Open reading frame expressed sequence tags (ORESTES) differ from
conventional ESTs by providing sequence data from the central protein
coding portion of transcripts. We generated a total of 696,745 ORESTES
sequences from 24 human tissues and used a subset of the data that
correspond to a set of 15,095 full-length mRNAs as a means of assessing the
efficiency of the strategy and its potential contribution to the definition
of the human transcriptome. We estimate that ORESTES sampled over 80% of
all highly and moderately expressed, and between 40% and 50% of rarely
expressed, human genes. In our most thoroughly sequenced tissue, the
breast, the 130,000 ORESTES generated are derived from transcripts from an
estimated 70% of all genes expressed in that tissue, with an equally
efficient representation of both highly and poorly expressed genes. In this
respect, we find that the capacity of the ORESTES strategy both for gene
discovery and shotgun transcript sequence generation significantly exceeds
that of conventional ESTs. The distribution of ORESTES is such that many
human transcripts are now represented by a scaffold of partial sequences
distributed along the length of each gene product. The experimental joining
of the scaffold components, by reverse transcription-PCR, represents a
direct route to transcript finishing that may represent a useful
alternative to full-length cDNA cloning.