A large-scale BAC end-sequencing project at The Institute for Genomic
Research (TIGR) has generated one of the most extensive sets of sequence markers
for the mouse genome to date. With a sequencing success rate of > 80%, an
average read length of 485 bp, and ABI3700 capillary sequencers, we have
generated 449,234 nonredundant mouse BAC end sequences (mBESs) totaling
218 Mb from 257,318 clones of libraries RPCI-23 and RPCI-24, representing
15x clone coverage, 7% sequence coverage, and a marker every 7 kb across the
genome. A total of 191,916 BACs have sequences from both ends, providing 12x
genome coverage. The average Q20 length is 406 bp, and 84% of the bases
have phred quality scores ≥ 20. RPCI-24 mBESs have more
Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700
sequencers and the sample tracking system ensure that > 95% of mBESs are
associated with the right clone identifiers. We have found that a significant
fraction of mBESs contains L1 repeats and ~48% of the clones have both
ends with ≥ 100 bp of contiguous unique Q20 bases. About 3% of mBESs match
ESTs, and > 70% of these matches were conserved between the mouse and the
human or the rat. Approximately 0.1% of mBESs contain STSs. About
0.2% of mBESs match human finished sequences, and > 70% of these sequences have
EST hits. The analyses indicate that our high-quality mouse BAC end
sequences will be a valuable resource to the community.
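
The summary figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not the analysis pipeline used in the project: the genome size `ASSUMED_GENOME_BP` (about 3 Gb) is an assumption that is not stated in the text, and all function names are hypothetical. The ratio of average Q20 length to average read length is used only as a rough approximation of the fraction of bases with phred score ≥ 20.

```python
# Figures quoted in the abstract.
N_MBES = 449_234        # nonredundant mouse BAC end sequences
MEAN_READ_BP = 485      # average read length (bp)
MEAN_Q20_BP = 406       # average Q20 length (bp)

# ASSUMPTION: approximate haploid mouse genome size; not given in the text.
ASSUMED_GENOME_BP = 3.0e9


def total_sequence_mb(n_reads, mean_read_bp):
    """Total sequence generated, in megabases."""
    return n_reads * mean_read_bp / 1e6


def sequence_coverage(n_reads, mean_read_bp, genome_bp):
    """Fraction of the genome represented by the end sequences."""
    return n_reads * mean_read_bp / genome_bp


def marker_spacing_kb(n_reads, genome_bp):
    """Average distance between consecutive markers, in kilobases."""
    return genome_bp / n_reads / 1e3


def phred_error_probability(q):
    """Base-call error probability for phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


total_mb = total_sequence_mb(N_MBES, MEAN_READ_BP)
coverage = sequence_coverage(N_MBES, MEAN_READ_BP, ASSUMED_GENOME_BP)
spacing_kb = marker_spacing_kb(N_MBES, ASSUMED_GENOME_BP)
q20_fraction = MEAN_Q20_BP / MEAN_READ_BP  # rough proxy only

if __name__ == "__main__":
    print(f"total sequence:    {total_mb:.1f} Mb")
    print(f"sequence coverage: {coverage:.1%}")
    print(f"marker spacing:    {spacing_kb:.1f} kb")
    print(f"Q20 fraction:      {q20_fraction:.1%}")
    print(f"error rate at Q20: {phred_error_probability(20):.2%}")
```

Under the assumed genome size, these values round to the 218 Mb total, 7% sequence coverage, 7 kb marker spacing, and 84% Q20 bases reported above. The clone coverage figures (15x and 12x) are not reproduced here because they depend on the average BAC insert size, which is not given in the text.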