FROM THE DOUBLE-HELIX TO NOVEL APPROACHES TO THE SEQUENCING OF LARGE GENOMES

Authors
W. Szybalski
Citation
W. Szybalski, FROM THE DOUBLE-HELIX TO NOVEL APPROACHES TO THE SEQUENCING OF LARGE GENOMES, Gene, 135(1-2), 1993, pp. 279-290
Citations number
40
Subject categories
Genetics & Heredity
Journal title
Gene
ISSN journal
0378-1119
Volume
135
Issue
1-2
Year of publication
1993
Pages
279 - 290
Database
ISI
SICI code
0378-1119(1993)135:1-2<279:FTDTNA>2.0.ZU;2-T
Abstract
Elucidation of the structure of DNA by Watson and Crick [Nature 171 (1953) 737-738] has led to many crucial molecular experiments, including studies on DNA replication, transcription, physical mapping, and most recently to serious attempts directed toward the sequencing of large genomes [Watson, Science 248 (1990) 44-49]. I am totally convinced of the great importance of the Human Genome Project, and toward achieving this goal I strongly favor `top-down' approaches consisting of the physical mapping and preparation of contiguous 50-100-kb fragments directly from the genome, followed by their automated sequencing based on the rapid assembly of primers by hexamer ligation together with primer walking. Our `top-down' procedure totally avoids conventional cloning, subcloning and random sequencing, which are the elements of the present `bottom-up' procedures. Fragments of 50-100 kb are prepared in sufficient quantities either by in vitro excision with rare-cutting restriction systems (including Achilles' heel cleavage [AC] or the RecA-AC procedures of Koob et al. [Nucleic Acids Res. 20 (1992) 5831-5836]) or by in vivo excision and amplification using the yeast FRT/Flp system or the phage lambda att/Int system. Such fragments, when derived directly from the Escherichia coli genome, are arranged in consecutive order, so that 50 specially constructed strains of E. coli would supply 50 end-to-end arranged approx. 100-kb fragments, which will cover the entire approx. 5-Mb E. coli genome. For the 150-Mb Drosophila melanogaster genome, 1500 of such consecutive 100-kb fragments (supplied by 1500 strains) are required to cover the entire genome. The fragments will be sequenced by the SPEL-6 method involving hexamer ligation [Szybalski, Gene 90 (1990) 177-178; Fresenius J. Anal. Chem. 4 (1992) 343] and primer walking.
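The fragment counts quoted above follow from dividing genome size by fragment size. A minimal sketch of that arithmetic, assuming fragments tile the genome end to end with no overlap (the function name `fragments_needed` is illustrative and not from the paper):

```python
import math


def fragments_needed(genome_bp: int, fragment_bp: int = 100_000) -> int:
    """Number of end-to-end fragments needed to tile a genome once.

    Assumes non-overlapping, equal-sized fragments; real fragments
    would vary in length.
    """
    return math.ceil(genome_bp / fragment_bp)


print(fragments_needed(5_000_000))    # approx. 5-Mb E. coli genome -> 50
print(fragments_needed(150_000_000))  # 150-Mb D. melanogaster genome -> 1500
```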
The 18-mer primers are synthesized in only a few minutes from three contiguous hexamers annealed to the DNA strand to be sequenced when using an over 100-fold excess of hexamers and T4 DNA ligase at room temperature, preferably in the presence of the single-strand-binding (SSB) protein of E. coli. These 18-nt primers are immediately extended by the DNA polymerase, Sequenase 2.0, in the dideoxy sequencing reaction. Very high quality sequencing ladders are obtained for single-stranded DNA or denatured double-stranded approx. 50-kb fragments, as exemplified by phage lambda DNA. When automated and used in conjunction with fluorescent dyes and ultrathin gels, the method should permit the sequencing of 500 nucleotides (nt) per 30 min, i.e., 1 kb/h and 100 kb in less than a week per one sequencing channel. Automation has to include direct gel readout of over 500 nt, analysis of the terminal 50 nt, computerized selection and robotic assembly of 18-mers from three hexamers followed by their template-dependent ligation, sequencing reactions, instantaneous deproteinization, gel loading, electrophoresis, and again a gel readout followed by the next cycle. With 50 channels and all approx. 100-kb genomic fragments available (see above), one could project that automated sequencing of the entire E. coli genome should take about one week. Sequencing of both strands and larger genomes would require proportionally more time or more of the automated sequencing machines. There is little doubt in my mind that automated `top-down' approaches are the key to the efficient and rapid sequencing of large genomes.
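One cycle of the primer-walking loop described above, and the throughput estimate, can be sketched as follows. This is an illustration under stated assumptions, not the SPEL-6 software: the abstract does not say how the 18-mer is chosen within the terminal 50 nt, so the GC-content criterion below is a hypothetical stand-in, and all function names are invented for this example. The timing estimate also ignores the overlap between successive reads.

```python
import math


def split_into_hexamers(primer: str) -> list[str]:
    """Split an 18-nt primer into the three contiguous hexamers to be ligated."""
    if len(primer) != 18:
        raise ValueError("primer must be exactly 18 nt")
    return [primer[i:i + 6] for i in range(0, 18, 6)]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def pick_next_primer(read: str, window: int = 50) -> str:
    """Pick an 18-mer from the terminal `window` nt of the latest read.

    Assumption: the candidate with GC content closest to 50% is chosen.
    The actual selection criteria are not given in the abstract.
    """
    tail = read[-window:]
    if len(tail) < 18:
        raise ValueError("read is too short to yield an 18-mer")
    candidates = [tail[i:i + 18] for i in range(len(tail) - 18 + 1)]
    return min(candidates, key=lambda p: abs(gc_fraction(p) - 0.5))


def hours_to_sequence(fragment_nt: int, nt_per_cycle: int = 500,
                      minutes_per_cycle: int = 30) -> float:
    """Hours for one channel to walk along one strand of a fragment."""
    cycles = math.ceil(fragment_nt / nt_per_cycle)
    return cycles * minutes_per_cycle / 60


print(split_into_hexamers("ACGTACGTACGTACGTAC"))  # ['ACGTAC', 'GTACGT', 'ACGTAC']
print(hours_to_sequence(100_000))                 # 100.0 hours, i.e. just over 4 days
```

At 1 kb/h a single channel needs about 100 h per 100-kb fragment, so 50 channels working on 50 fragments in parallel would cover one strand of the approx. 5-Mb E. coli genome in under a week, which matches the projection in the abstract.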