Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in
having a small genome (130-140 Mb), excellent physical and genetic maps,
and little repetitive DNA. Here we report the sequence of chromosome 2 from
the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16
megabases (Mb). The latter represents the longest published stretch of
uninterrupted DNA sequence assembled from any organism to date. Chromosome 2
represents 15% of the genome and encodes 4,037 genes, 49% of which have no
predicted function. Roughly 250 tandem gene duplications were found in addition
to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2
and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2
Mb within the genetically defined centromere revealed a low density of
recognizable genes, and a high density and diverse range of vestigial and
presumably inactive mobile elements. More unexpected is what appears to be a
recent insertion of a continuous stretch of 75% of the mitochondrial genome
into chromosome 2.