End sequences from bacterial artificial chromosomes (BACs) provide highly
specific sequence markers in large-scale sequencing projects. To date, we
have generated >300,000 end sequences from >186,000 human BAC clones with an
average read length of >460 bp, for a total of 141 Mb covering ~4.7% of the
genome. Over 60% of the clones have BAC end sequences (BESs) from both ends,
representing more than fivefold coverage of the human genome by the
paired-end clones. Our quality assessments and sequence analyses indicate
that BESs from human BAC libraries developed at the California Institute of
Technology (CalTech) and Roswell Park Cancer Institute have similar
properties. The analyses have highlighted differences in insert size for
different segments of the CalTech library. Problems with the fidelity of
tracking sequence data back to physical clones have been observed in some
subsets of the overall BES dataset. Annotation of the BESs for their content
of available genomic sequences, sequence tagged sites, expressed sequence
tags, protein-encoding regions, and repeats indicates that this resource
will be valuable in many areas of genome research. (C) 2000 Academic Press.
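
The coverage figures quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it treats the stated lower bounds (>300,000 reads, >186,000 clones, >460 bp, over 60%, more than fivefold) as point values, and the function and constant names are hypothetical, not taken from the paper. The genome size and mean insert size it returns are values implied by those figures, not numbers reported in the abstract.

```python
# Figures quoted in the abstract, treated as point values (assumption:
# the abstract gives them as lower bounds, so results are approximate).
END_SEQUENCES = 300_000      # number of BAC end sequences
CLONES = 186_000             # number of human BAC clones
MEAN_READ_BP = 460           # average read length in bp
TOTAL_BES_MB = 141.0         # total end sequence, in Mb
GENOME_FRACTION = 0.047      # ~4.7% of the genome
PAIRED_FRACTION = 0.60       # share of clones with BESs from both ends
PAIRED_COVERAGE = 5.0        # fold coverage by paired-end clones


def read_total_mb(n_reads=END_SEQUENCES, mean_bp=MEAN_READ_BP):
    """Total sequence (Mb) implied by read count times mean read length."""
    return n_reads * mean_bp / 1e6


def implied_genome_size_mb(total_mb=TOTAL_BES_MB, fraction=GENOME_FRACTION):
    """Genome size (Mb) implied by 141 Mb being ~4.7% of the genome."""
    return total_mb / fraction


def implied_mean_insert_kb(clones=CLONES,
                           paired_fraction=PAIRED_FRACTION,
                           coverage=PAIRED_COVERAGE):
    """Mean insert size (kb) needed for the paired-end clones alone
    to reach the stated fold coverage of the implied genome size."""
    paired_clones = clones * paired_fraction
    genome_kb = implied_genome_size_mb() * 1000
    return coverage * genome_kb / paired_clones


if __name__ == "__main__":
    print(f"reads x length : {read_total_mb():.0f} Mb")
    print(f"implied genome : {implied_genome_size_mb():.0f} Mb")
    print(f"implied insert : {implied_mean_insert_kb():.1f} kb")
```

Under these assumptions, the read count and read length alone account for slightly less than the stated 141 Mb total, which is consistent with both being quoted as lower bounds.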