Elucidation of the structure of DNA by Watson and Crick [Nature 171
(1953) 737-738] has led to many crucial molecular experiments, including
studies on DNA replication, transcription, physical mapping, and most
recently to serious attempts directed toward the sequencing of large
genomes [Watson, Science 248 (1990) 44-49]. I am totally convinced of
the great importance of the Human Genome Project, and toward achieving
this goal I strongly favor `top-down' approaches consisting of the
physical mapping and preparation of contiguous 50-100-kb fragments
directly from the genome, followed by their automated sequencing based
on the rapid assembly of primers by hexamer ligation together with primer
walking. Our `top-down' procedure totally avoids conventional cloning,
subcloning and random sequencing, which are the elements of the present
`bottom-up' procedures. Fragments of 50-100 kb are prepared in
sufficient quantities either by in vitro excision with rare-cutting
restriction systems (including Achilles' heel cleavage [AC] or the
RecA-AC procedures of Koob et al. [Nucleic Acids Res. 20 (1992)
5831-5836]) or
by in vivo excision and amplification using the yeast FRT/Flp system
or the phage lambda att/Int system. Such fragments, when derived
directly from the Escherichia coli genome, are arranged in consecutive
order, so that 50 specially constructed strains of E. coli would supply 50
end-to-end arranged approx. 100-kb fragments, which will cover the
entire approx. 5-Mb E. coli genome. For the 150-Mb Drosophila
melanogaster genome, 1500 such consecutive 100-kb fragments (supplied by 1500
strains) are required to cover the entire genome. The fragments will
be sequenced by the SPEL-6 method involving hexamer ligation
[Szybalski, Gene 90 (1990) 177-178; Fresenius J. Anal. Chem. 4 (1992)
343] and primer walking. The 18-mer primers are synthesized in only a
few minutes from three contiguous hexamers annealed to the DNA strand
to be sequenced, using a more than 100-fold excess of hexamers and T4 DNA ligase
at room temperature, preferably in the presence of the
single-strand-binding (SSB) protein of E. coli. These 18-nt primers are immediately
extended by the DNA polymerase Sequenase 2.0 in the dideoxy sequencing
reaction. Very high-quality sequencing ladders are obtained for
single-stranded DNA or for denatured, double-stranded approx. 50-kb fragments,
as exemplified by phage lambda DNA. When automated and used in
conjunction with fluorescent dyes and ultrathin gels, the method should
permit the sequencing of 500 nucleotides (nt) per 30 min, i.e., 1 kb/h,
so that one sequencing channel could complete 100 kb (about 100 h) in
less than a week. Automation has
to include direct gel readout of over 500 nt, analysis of the terminal
50 nt, computerized selection and robotic assembly of 18-mers from
three hexamers followed by their template-dependent ligation,
sequencing reactions, instantaneous deproteinization, gel loading,
electrophoresis, and again a gel readout followed by the next cycle.
With 50 channels and all approx. 100-kb genomic fragments available
(see above), one could project that automated sequencing of the entire
E. coli genome
should take about one week. Sequencing of both strands and of larger
genomes would require proportionally more time or more automated
sequencing machines. There is little doubt in my mind that automated
`top-down' approaches are the key to the efficient and rapid sequencing
of large genomes.
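The primer-walking cycle described above (read a ladder, analyze the terminal 50 nt, then assemble the next 18-mer from three contiguous hexamers) can be sketched in Python. This is a minimal illustration under stated assumptions, not the SPEL-6 software itself: the window-scoring rule (reject non-ACGT calls, prefer a GC fraction near 0.5, break ties toward the 3' end) and the base-4 ordering of a library of all 4^6 = 4096 hexamers (`hexamer_index`) are hypothetical choices made for the example. The primer is taken to be identical to a stretch of the newly read strand, so it anneals to the template and extends into unread sequence.

```python
from typing import List, Optional, Tuple

BASES = "ACGT"
HEXAMER_LIBRARY_SIZE = 4 ** 6  # every possible hexamer: 4096


def hexamer_index(hexamer: str) -> int:
    """Position of a hexamer in a hypothetical library ordered as base-4 numbers (A=0 ... T=3)."""
    if len(hexamer) != 6 or set(hexamer) - set(BASES):
        raise ValueError(f"not a hexamer: {hexamer!r}")
    index = 0
    for base in hexamer:
        index = index * 4 + BASES.index(base)
    return index


def gc_fraction(seq: str) -> float:
    """Fraction of G and C in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def choose_primer(read: str, terminal: int = 50, length: int = 18) -> Tuple[str, List[str]]:
    """Pick the next walking primer from the last `terminal` nt of a read.

    Hypothetical rule: skip windows containing non-ACGT calls, prefer the
    GC fraction closest to 0.5, and break ties toward the 3' end of the read.
    Returns the primer and the three contiguous hexamers that compose it.
    """
    if length % 6 != 0:
        raise ValueError("primer length must be a multiple of 6")
    best: Optional[Tuple[Tuple[float, int], str]] = None
    for start in range(max(0, len(read) - terminal), len(read) - length + 1):
        window = read[start:start + length]
        if set(window) - set(BASES):
            continue  # ambiguous base call: no hexamer can be chosen for it
        score = (abs(gc_fraction(window) - 0.5), -start)  # lower is better
        if best is None or score < best[0]:
            best = (score, window)
    if best is None:
        raise ValueError("no usable primer window in the terminal region")
    primer = best[1]
    hexamers = [primer[i:i + 6] for i in range(0, length, 6)]
    return primer, hexamers
```

For a 500-nt read, `choose_primer(read)` returns an 18-mer from the last 50 nt together with its three hexamers, whose library positions follow from `hexamer_index`. A practical selector would also have to confirm that the 18-mer does not prime elsewhere in the 50-100-kb fragment, which this sketch does not check.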
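The coverage and throughput figures quoted above follow from simple arithmetic. The Python sketch below uses the abstract's own numbers (100-kb fragments, 500 nt per 30-min cycle, one fragment per channel at a time). The function names are hypothetical, and the model assumes continuous operation with no overlap between successive reads, so it yields a lower bound rather than a prediction.

```python
import math


def fragments_needed(genome_bp: int, fragment_bp: int = 100_000) -> int:
    """End-to-end fragments (one per specially constructed strain) needed to tile a genome."""
    return math.ceil(genome_bp / fragment_bp)


def hours_per_fragment(fragment_bp: int = 100_000, nt_per_cycle: int = 500,
                       minutes_per_cycle: float = 30.0) -> float:
    """Hours for one channel to walk one strand of a fragment.

    Assumes every primer-walking cycle adds nt_per_cycle new nucleotides,
    ignoring the overlap created by placing each new primer inside the
    previous read.
    """
    cycles = math.ceil(fragment_bp / nt_per_cycle)
    return cycles * minutes_per_cycle / 60.0


def genome_days(genome_bp: int, channels: int, strands: int = 1,
                fragment_bp: int = 100_000) -> float:
    """Days to sequence a genome when each channel walks one fragment at a time."""
    rounds = math.ceil(fragments_needed(genome_bp, fragment_bp) / channels)
    return strands * rounds * hours_per_fragment(fragment_bp) / 24.0


if __name__ == "__main__":
    print(fragments_needed(5_000_000), fragments_needed(150_000_000))  # 50 1500
    print(hours_per_fragment())  # 100.0
    print(round(genome_days(5_000_000, channels=50), 2))  # 4.17
```

Under these assumptions, one strand of the E. coli genome takes about 4.2 days on 50 channels, consistent with the estimate of about one week, and both strands take twice as long. The 1500 Drosophila fragments would occupy the same 50 channels for 30 rounds, about 125 days per strand.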