The sequence-tagged connector (STC) strategy proposes to generate sequence
tags densely scattered (every 3.3 kilobases) across the human genome by
arraying 450,000 bacterial artificial chromosomes (BACs) with randomly cleaved
inserts, sequencing both ends of each, and preparing a restriction enzyme
fingerprint of each. The STC resource, containing end sequences,
fingerprints, and arrayed BACs, creates a map where the interrelationships of
the individual BAC clones are resolved through their STCs as overlapping BAC
clones are sequenced. Once a seed or initiation BAC clone is sequenced, the
minimum overlapping 5' and 3' BAC clones can be identified computationally and
sequenced. By reiterating this "sequence-then-map by computer analysis
against the STC database" strategy, a minimum tiling path of clones can be
sequenced at a rate that is primarily limited by the sequencing throughput of
individual genome centers. As of February 1999, we and The Institute for
Genomic Research (TIGR) had together deposited 314,000 STCs (approximately
135 megabases, or 4.5% of human genomic DNA) in GenBank. This genome survey
reveals numerous genes, genome-wide repeats, simple sequence repeats
(potential genetic markers), and CpG islands (potential gene initiation
sites). It also illustrates the power of the STC strategy for creating minimum
tiling paths of BAC clones for large-scale genomic sequencing. Because the STC
resource permits the easy integration of genetic, physical, gene, and
sequence maps for chromosomes, it will be a powerful tool for the initial
analysis of the human genome and other complex genomes.
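
The "sequence-then-map" iteration described above can be illustrated with a small toy model. The following Python sketch is not the authors' software; it is a minimal illustration under simplifying assumptions. The names `Bac`, `pick_3prime_extension`, and `tile_3prime` are hypothetical. The toy genome uses 52 unique characters so that every tag matches at exactly one place, all tags are read on the same strand as the seed, and the restriction fingerprint check is omitted. Only the 3' direction is shown; the 5' direction would be symmetric.

```python
from __future__ import annotations

import string
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bac:
    """Toy STC record: a clone name plus its two end sequences."""
    name: str
    left_stc: str   # end sequence at the clone's 5' side
    right_stc: str  # end sequence at the clone's 3' side


def pick_3prime_extension(seed_seq: str, library: list[Bac]) -> Optional[Bac]:
    """Return the clone that extends past the 3' end of seed_seq with the
    smallest overlap, or None if no clone qualifies.

    A candidate must have its left STC inside the sequenced seed and its
    right STC outside it, so that the clone runs beyond the seed's 3' end.
    """
    best, best_overlap = None, None
    for bac in library:
        start = seed_seq.rfind(bac.left_stc)
        if start < 0 or bac.right_stc in seed_seq:
            continue  # no hit, or clone lies entirely within the seed
        overlap = len(seed_seq) - start
        if best_overlap is None or overlap < best_overlap:
            best, best_overlap = bac, overlap
    return best


def tile_3prime(seed_name: str, inserts: dict[str, str],
                library: list[Bac]) -> list[str]:
    """Walk in the 3' direction from the seed clone.

    `inserts` stands in for the sequencing step: it maps a clone name to
    its full insert sequence, which is only looked up once the clone has
    been chosen.
    """
    path = [seed_name]
    while True:
        nxt = pick_3prime_extension(inserts[path[-1]], library)
        if nxt is None or nxt.name in path:
            return path
        path.append(nxt.name)


# Toy data (all hypothetical): clone coordinates on a 52-character genome.
GENOME = string.ascii_letters
CLONES = {"B0": (0, 20), "B1": (5, 30), "B2": (16, 40),
          "B3": (2, 18), "B4": (25, 50), "B5": (19, 45)}
inserts = {n: GENOME[s:e] for n, (s, e) in CLONES.items()}
library = [Bac(n, GENOME[s:s + 2], GENOME[e - 2:e])
           for n, (s, e) in CLONES.items()]

print(tile_3prime("B0", inserts, library))  # ['B0', 'B2', 'B4']
```

In this toy run, B1 and B2 both extend past the seed B0, but B2 is chosen because it overlaps the seed by only 4 characters rather than 15. B3 is skipped because both of its tags fall inside the seed. A real implementation would replace the exact string search with sequence alignment, handle both strands, and confirm each candidate overlap against the clone's fingerprint.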