J. Bouck et al., Analysis of the quality and utility of random shotgun sequencing at low redundancies, Genome Research, 8(10), 1998, pp. 1074-1084
The currently favored approach for sequencing the human genome involves selecting representative large-insert clones (100-200 kb), randomly shearing this DNA to construct shotgun libraries, and then sequencing many different isolates from the library. This method, termed directed random shotgun sequencing, requires highly redundant sequencing to obtain a complete and accurate finished consensus sequence. Recently it has been suggested that a rapidly generated lower-redundancy sequence might be of use to the scientific community. Low-redundancy sequencing has been examined previously using simulated data sets. Here we utilize trace data from a number of projects submitted to GenBank to perform reconstruction experiments that mimic low-redundancy sequencing. These low-redundancy sequences have been examined for completeness and quality of the consensus product, information content, and usefulness for interspecies comparisons. The data presented here suggest three different sequencing strategies, each with different utilities. (1) Nearly complete sequence data can be obtained by sequencing a random shotgun library at sixfold redundancy; this may therefore represent a good point at which to switch from a random to a directed approach. (2) Sequencing can be performed at as little as twofold redundancy to recover most of the information about exons, EST hits, and putative exon similarity matches. (3) To obtain contiguity of coding regions, sequencing at three- to fourfold redundancy would be appropriate. From these results, we suggest that a useful intermediate product for genome sequencing might be obtained at three- to fourfold redundancy. Such a product would allow a large amount of biologically useful data to be extracted while postponing the majority of the work involved in producing a high-quality consensus sequence.
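
As a rough, back-of-the-envelope illustration of the redundancy figures above, the expected fraction of a target covered at c-fold redundancy can be approximated with the standard Lander-Waterman expectation, 1 - e^(-c). This model is an assumption of this sketch only; it is not the reconstruction-from-real-trace-data method used in the study, and it ignores cloning bias, repeats, and quality trimming. A minimal Python sketch:

    import math

    def expected_coverage(redundancy: float) -> float:
        # Lander-Waterman approximation: fraction of bases covered at
        # c-fold redundancy is about 1 - e^(-c), assuming random,
        # uniformly distributed reads.
        return 1.0 - math.exp(-redundancy)

    for c in (2, 3, 4, 6):
        print(f"{c}-fold redundancy: ~{expected_coverage(c):.1%} of bases covered")

Under these idealized assumptions, twofold redundancy covers roughly 86% of bases, three- to fourfold covers about 95-98%, and sixfold covers about 99.8%, which is broadly consistent with the abstract's characterization of sixfold data as "nearly complete" and of three- to fourfold data as a useful intermediate product.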