ANALYSIS OF THE QUALITY AND UTILITY OF RANDOM SHOTGUN SEQUENCING AT LOW REDUNDANCIES

Citation
J. Bouck et al., ANALYSIS OF THE QUALITY AND UTILITY OF RANDOM SHOTGUN SEQUENCING AT LOW REDUNDANCIES, PCR methods and applications, 8(10), 1998, pp. 1074-1084
Number of citations
23
Subject Categories
Biotechnology & Applied Microbiology; Biology; Genetics & Heredity
ISSN journal
10549803
Volume
8
Issue
10
Year of publication
1998
Pages
1074 - 1084
Database
ISI
SICI code
1054-9803(1998)8:10<1074:AOTQAU>2.0.ZU;2-Z
Abstract
The currently favored approach for sequencing the human genome involves selecting representative large-insert clones (100-200 kb), randomly shearing this DNA to construct shotgun libraries, and then sequencing many different isolates from the library. This method, entitled directed random shotgun sequencing, requires highly redundant sequencing to obtain a complete and accurate finished consensus sequence. Recently it has been suggested that a rapidly generated lower-redundancy sequence might be of use to the scientific community. Low-redundancy sequencing has been examined previously using simulated data sets. Here we utilize trace data from a number of projects submitted to GenBank to perform reconstruction experiments that mimic low-redundancy sequencing. These low-redundancy sequences have been examined for the completeness and quality of the consensus product, information content, and usefulness for interspecies comparisons. The data presented here suggest three different sequencing strategies, each with different utilities. (1) Nearly complete sequence data can be obtained by sequencing a random shotgun library at sixfold redundancy. This may therefore represent a good point to switch from a random to directed approach. (2) Sequencing can be performed with as little as twofold redundancy to find most of the information about exons, EST hits, and putative exon similarity matches. (3) To obtain contiguity of coding regions, sequencing at three- to fourfold redundancy would be appropriate. From these results, we suggest that a useful intermediate product for genome sequencing might be obtained by three- to fourfold redundancy. Such a product would allow a large amount of biologically useful data to be extracted while postponing the majority of work involved in producing a high-quality consensus sequence.
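
Illustrative note (not part of the abstract): the redundancy levels discussed above can be put in rough perspective with the standard Poisson (Lander-Waterman style) idealization, in which the expected fraction of bases covered by at least one read at redundancy c is 1 - exp(-c). This is a minimal sketch under that assumption only; the paper itself works from real trace data, so its measured completeness may differ. The function names, the 150 kb clone size (chosen from within the 100-200 kb range given above), and the 500 bp read length are hypothetical values used for illustration.

```python
import math


def expected_coverage(redundancy: float) -> float:
    """Expected fraction of bases covered by at least one read.

    Assumes reads land uniformly at random (idealized Poisson model);
    real shotgun libraries deviate from this.
    """
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return 1.0 - math.exp(-redundancy)


def reads_needed(clone_length_bp: int, read_length_bp: int, redundancy: float) -> int:
    """Number of reads required to reach a given average redundancy."""
    if clone_length_bp <= 0 or read_length_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(redundancy * clone_length_bp / read_length_bp)


if __name__ == "__main__":
    clone_bp = 150_000  # hypothetical clone size within the 100-200 kb range
    read_bp = 500       # hypothetical average read length (assumption)
    for c in (2, 3, 4, 6):  # redundancy levels named in the abstract
        frac = expected_coverage(c)
        n = reads_needed(clone_bp, read_bp, c)
        print(f"{c}x redundancy: ~{frac:.1%} of bases covered, {n} reads")
```

Under this idealized model, expected coverage rises steeply between twofold and fourfold redundancy and then flattens, which is consistent in spirit with the abstract's suggestion that three- to fourfold redundancy is a useful intermediate point.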