We describe an integrated system for the analysis of DNA sequence motifs within complete bacterial genome sequences. This system is based around ACeDB, a genome database with an integrated graphical user interface; we identify and display motifs in the context of genetic, sequence and bibliographic data. Tomb et al. (1997) previously reported the identification of contingency genes in Helicobacter pylori through their association with homopolymeric tracts and dinucleotide repeats. With this as a starting point, we validated the system by searching for these types of repeat and used the contextual information to assess the likelihood that they mediate phase variation in the associated open reading frames (ORFs). We found all of the repeats previously described, and identified 27 putative phase-variable genes (including 17 previously described). These could be divided into three groups: lipopolysaccharide (LPS) biosynthesis, cell-surface-associated proteins and DNA restriction/modification systems. Five of the putative genes did not have obvious homologues in any of the public domain sequence databases. The reading frame of some ORFs was disrupted by the presence of the repeats, including the alpha(1-2) fucosyltransferase gene, necessary for the synthesis of the Lewis Y epitope. An additional benefit of this approach is that the results of each search can be analysed further and compared with those from other genomes. This comparison revealed that H. pylori has an unusually high frequency of homopurine:homopyrimidine repeats, suggesting mechanistic biases that favour their presence and instability.
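
The kind of repeat search described above can be illustrated with a minimal sketch. It is not the ACeDB-based system itself: the function name `find_simple_repeats` and both length thresholds are hypothetical choices made for illustration, and the sketch scans only one strand of a plain sequence string.

```python
import re


def find_simple_repeats(seq, min_homopolymer=8, min_dinucleotide_units=5):
    """Return sorted (start, end, unit, copies) tuples for simple repeats.

    Coordinates are 0-based and end-exclusive. Both thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    seq = seq.upper()
    hits = []

    # Homopolymeric tracts: a single base repeated at least
    # min_homopolymer times in a row.
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    for m in homopolymer.finditer(seq):
        hits.append((m.start(), m.end(), m.group(1), m.end() - m.start()))

    # Dinucleotide repeats: a unit of two different bases repeated at
    # least min_dinucleotide_units times. The lookahead (?!\2) rejects
    # units such as "AA", which are homopolymers rather than dinucleotides.
    dinucleotide = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_dinucleotide_units - 1)
    )
    for m in dinucleotide.finditer(seq):
        hits.append((m.start(), m.end(), m.group(1), (m.end() - m.start()) // 2))

    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence: a poly-C tract followed by an (AG)n repeat.
    example = "ATG" + "C" * 9 + "TTGA" + "AG" * 6 + "TAA"
    for start, end, unit, copies in find_simple_repeats(example):
        print(start, end, unit, copies)
```

Each hit would then need to be placed in the context of the surrounding ORF, since a change in copy number alters the reading frame only when the repeat lies within or immediately upstream of a coding region.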