The severity of Helicobacter pylori-related disease is correlated with a
pathogenicity island (the Cag region of about 26 genes) whose presence is
associated with the up-regulation of an IL-8 cytokine inflammatory response in
gastric epithelial cells. Statistical analysis of the Cag gene sequences
calculated from the complete genome of strain 26695 revealed several unusual
features. The Cag7 sequence (1,927 aa) has two repeat regions. Repeat
region I runs 317 aa in a form of AAA proximal to the protein N terminus;
repeat region II extends 907 aa in the middle of the protein sequence,
consists of 74 contiguous segments composed from selections among six
consensus sequences, and includes regularly distributed cysteine residues,
with consecutive cysteines mostly 12, 18, or 24 aa apart. This "regular"
cysteine arrangement may provide a scaffolding of linker elements stabilized
by disulfide
bridges. When Cag7 homologues from different strains are compared,
differences are found almost exclusively in the repeat regions, resulting
from deletion and/or insertion of repeating units. These observations suggest
that
the anomalous repetitive structure of the sequence plays an important role
in the conformation of the Cag7 gene product and potentially in the function
of the pathogenicity island. The Cag7 sequence also shows significant charge
clusters, a high multiplet count, and extremes of amino acid usage.
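
The cysteine-spacing observation can be illustrated with a minimal sketch. It is not the authors' analysis pipeline. It assumes that "apart" means the difference between the sequence positions of consecutive cysteines, and the function names and the toy sequence are hypothetical (the toy sequence is not Cag7).

```python
from collections import Counter
from typing import List


def cysteine_spacings(seq: str) -> List[int]:
    """Return the distances, in residues, between consecutive cysteines.

    Assumption: distance is the difference between the 0-based positions
    of neighbouring 'C' residues in a one-letter amino acid string.
    """
    positions = [i for i, aa in enumerate(seq.upper()) if aa == "C"]
    return [b - a for a, b in zip(positions, positions[1:])]


def spacing_histogram(seq: str) -> Counter:
    """Count how often each cysteine-to-cysteine spacing occurs."""
    return Counter(cysteine_spacings(seq))


# Hypothetical toy sequence (NOT Cag7): cysteines separated by alanine runs
# so that consecutive cysteines are 12, 18, 12, and 24 residues apart.
toy = "C" + "A" * 11 + "C" + "A" * 17 + "C" + "A" * 11 + "C" + "A" * 23 + "C"

if __name__ == "__main__":
    for spacing, count in sorted(spacing_histogram(toy).items()):
        print(f"spacing {spacing:>3} aa: {count}")
```

Applied to a real protein sequence, a histogram dominated by a few spacings (for example multiples of 6) would be consistent with the kind of regular arrangement described above.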