The random (shotgun) DNA sequencing strategy is used for most
large-scale sequencing projects, including the identification of human disease
genes after positional cloning. The principle of the method, sequence
assembly from overlap, requires the candidate gene region to be
partitioned into 15- to 20-kb pieces (usually lambda inserts), which are
themselves randomly subcloned into M13 before being sequenced at 6- to
8-fold redundancy. Most often, a time-consuming directed strategy must
then be invoked
to close the remaining gaps. Ultimately, computer-based methods are
used to locate putative coding exons within the finished genomic
sequence. Given the small average size of vertebrate exons, I show here
that they can be detected by computer analysis of the individual
sequencing runs, well before contiguity is complete. However, the
successful assessment of coding potential from the raw data depends on
a combination
of new sequence masking techniques. When the identification of coding
exons is the primary goal, the usual random sequencing strategy can
thus be greatly optimized. The streamlined approach requires only a 2-
to 2.5-fold sequencing redundancy, can dispense with the subcloning in
lambda and the closure of gaps, and can be fully automated. The
feasibility of this strategy is demonstrated using data from the X-linked
Kallmann syndrome gene region. © 1994 Academic Press, Inc.
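
As a rough illustration of the redundancy figures quoted above, the sketch below uses the standard Poisson approximation for random sequencing: at redundancy c, the expected fraction of a region covered by at least one read is 1 - exp(-c). It is not the method described in the abstract, only a back-of-the-envelope model. The function names, the 140-bp exon length, and the 400-bp read length are hypothetical values chosen for illustration, and the model ignores edge effects and cloning bias.

```python
import math


def expected_coverage(redundancy: float) -> float:
    """Expected fraction of a region covered by at least one read.

    Assumes read start positions are uniform and independent
    (Poisson approximation), ignoring edge effects and cloning bias.
    """
    return 1.0 - math.exp(-redundancy)


def prob_exon_in_single_read(redundancy: float, exon_len: int, read_len: int) -> float:
    """Probability that at least one read spans an entire exon.

    Under the same model, the expected number of reads that fully contain
    the exon is redundancy * (read_len - exon_len) / read_len.
    """
    if exon_len >= read_len:
        return 0.0  # a single read cannot contain an exon this long
    return 1.0 - math.exp(-redundancy * (read_len - exon_len) / read_len)


if __name__ == "__main__":
    # Hypothetical lengths, for illustration only (not taken from the source).
    EXON_LEN, READ_LEN = 140, 400
    for c in (2.0, 2.5, 6.0, 8.0):
        cov = expected_coverage(c)
        hit = prob_exon_in_single_read(c, EXON_LEN, READ_LEN)
        print(f"{c:>4.1f}-fold  covered={cov:.3f}  exon within one read={hit:.3f}")
```

Under this model, 2- to 2.5-fold redundancy already covers roughly 86% to 92% of the region, compared with more than 99% at 6- to 8-fold, which is consistent with the idea that most small exons are sampled well before contiguity is reached.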