Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori sigma(80) family of promoter signals

Citation
A. Vanet et al., Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori sigma(80) family of promoter signals, J MOL BIOL, 297(2), 2000, pp. 335-353
Number of citations
45
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
00222836
Volume
297
Issue
2
Year of publication
2000
Pages
335 - 353
Database
ISI
SICI code
0022-2836(20000324)297:2<335:IREFAW>2.0.ZU;2-7
Abstract
Helicobacter pylori is adapted to life in a unique niche, the gastric epithelium of primates. Its promoters may therefore be different from those of other bacteria. Here, we determine motifs possibly involved in the recognition of such promoter sequences by the RNA polymerase using a new motif identification method. An important feature of this method is that the motifs are sought with the least possible assumptions about what they may look like. The method starts by considering the whole genome of H. pylori and attempts to infer directly from it a description for a family of promoters. Thus, this approach differs from searching for such promoters with a previously established description. The two algorithms are based on the idea of inferring motifs by flexibly comparing words in the sequences with an external object, instead of between themselves. The first algorithm infers single motifs, the second a combination of two motifs separated from one another by strictly defined, sterically constrained distances. Besides independently finding motifs known to be present in other bacteria, such as the Shine-Dalgarno sequence and the TATA-box, this approach suggests the existence in H. pylori of a new, combined motif, TTAAGC, followed optimally 21 bp downstream by TATAAT. Between these two motifs, there is in some cases another, TTTTAA or, less frequently, a repetition of TTAAGC separated optimally from the TATA-box by 12 bp. The combined motif TTAAGC x (21 +/- 2)TATAAT is present with no errors immediately upstream from the only two copies of the ribosomal 23S-5S RNA genes in H. pylori, and with one error upstream from the only two copies of the ribosomal 16S RNA genes. The operons of both ribosomal RNA molecules are strongly expressed, representing an encouraging sign of the pertinence of the motifs found by the algorithms.
In 25 cases out of a possible 30, the combined motif is found with no more than three substitutions immediately upstream from ribosomal proteins, or operons containing a ribosomal protein. This is roughly the same frequency of occurrence as for TTGACA x (15-19)TATAAT (with the same maximum number of substitutions allowed) described as being the sigma(70) promoter sequence consensus in Bacillus subtilis and Escherichia coli. The frequency of occurrence of the new motif obtained, TTAAGC x (19-23)TATAAT, remains high when all protein genes in H. pylori are considered, as is the case for the TTGACA x (15-19)TATAAT motif in B. subtilis but not in E. coli. (C) 2000 Academic Press.
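Illustrative sketch (not part of the original record, and not the inference algorithms described in the abstract): the Python snippet below only shows what scanning a sequence for an already known combined motif such as TTAAGC x (19-23)TATAAT, with a bounded number of substitutions, might look like. The function names and the return format are hypothetical. It also assumes that the spacer is counted as the number of bases between the end of the first motif and the start of the second, and that the substitution limit applies to both motifs together; the abstract does not state either convention.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_combined_motif(seq, left="TTAAGC", right="TATAAT",
                        min_gap=19, max_gap=23, max_subs=3):
    """Return (start, gap, substitutions) for each match of
    left x (min_gap..max_gap) right in seq.

    Assumptions (not stated in the source): the gap is the number of
    bases between the two motifs, and max_subs is the total number of
    substitutions allowed over both motifs together.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(left) + 1):
        left_err = hamming(seq[i:i + len(left)], left)
        if left_err > max_subs:
            continue  # first motif alone already exceeds the budget
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            if j + len(right) > len(seq):
                break  # second motif would run past the sequence end
            total = left_err + hamming(seq[j:j + len(right)], right)
            if total <= max_subs:
                hits.append((i, gap, total))
    return hits


if __name__ == "__main__":
    # Synthetic sequence: exact motifs separated by a 21-base spacer.
    demo = "GG" + "TTAAGC" + "A" * 21 + "TATAAT" + "CC"
    print(find_combined_motif(demo, max_subs=0))
```

A scan of this kind presupposes the consensus; the method summarised in the abstract works in the opposite direction, inferring the motif description from the whole genome.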