Arabidopsis thaliana has emerged as a model system for studies of plant
genetics and development, and its genome has been targeted for
sequencing(1) by an international consortium (the Arabidopsis Genome
Initiative; http://genome-www.stanford.edu/Arabidopsis/agi.html). To
support the genome-sequencing effort, we fingerprinted more than 20,000
BACs (ref. 2) from two high-quality publicly available libraries(3-5),
generating an estimated 17-fold redundant coverage of the genome, and used
the fingerprints to nucleate assembly of the data by computer. Subsequent
manual revision of the assemblies
resulted in the incorporation of 19,661 fingerprinted BACs into 169 ordered
sets of overlapping clones ('contigs'), each containing at least 3 clones.
These contigs are ideal for parallel selection of BACs for large-scale
sequencing and have supported the generation of more than 5.8 Mb of
finished genome sequence submitted to GenBank; analysis of the sequence
has confirmed
the integrity of contigs constructed using these fingerprint data.
Placement of contigs onto chromosomes can now be performed, and is being
pursued by
groups involved in both sequencing and positional cloning studies. To our
knowledge, these data provide the first example of whole-genome random BAC
fingerprint analysis of a eucaryote, and have provided a model essential to
efforts aimed at generating similar databases of fingerprint contigs to
support sequencing of other complex genomes, including that of human.