Recognition of transcription regulation sites (operators) is a hard problem
in computational molecular biology. In most cases, small sample size and a
low degree of sequence conservation preclude the construction of reliable
recognition rules. We suggest an approach to this problem based on
simultaneous analysis of several related genomes. It appears that as long as
a gene coding for a transcription regulator is conserved in the compared
bacterial
genomes, the regulation of the respective group of genes (regulons) also
tends to be maintained. Thus a gene can be confidently predicted to belong to
a particular regulon if not only the gene itself but also its orthologs in
other genomes have candidate operators in their regulatory regions. This
provides greater sensitivity of operator identification, as even relatively
weak signals are likely to be functionally relevant when conserved. We use
this approach to analyze the purine (PurR), arginine (ArgR) and aromatic
amino acid (TrpR and TyrR) regulons of Escherichia coli and Haemophilus
influenzae. Candidate binding sites in regulatory regions of the respective
H.influenzae genes are identified, a new family of purine transport proteins
predicted to belong to the PurR regulon is described, and probable regulation
of arginine transport by ArgR is demonstrated. Differences in the
regulation of some orthologous genes in E.coli and H.influenzae, in
particular the
apparent lack of the autoregulation of the purine repressor gene in
H.influenzae, are demonstrated.
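
The cross-genome consistency check described above can be illustrated with
a minimal Python sketch. It is not the procedure used in this work: the
function names, the data layout, and the use of a simple position weight
matrix with a fixed score threshold as the recognition rule are all
assumptions made for illustration. A gene is retained only if its own
upstream region and the upstream regions of its orthologs each contain a
candidate site.

```python
from typing import Dict, List

# Hypothetical types, introduced only for this sketch:
# a PWM is one {base: score} dict per site position;
# an ortholog group maps genome name -> gene name.
PWM = List[Dict[str, float]]
OrthologGroup = Dict[str, str]


def best_site_score(upstream: str, pwm: PWM) -> float:
    """Return the highest PWM score over all windows of an upstream region."""
    width = len(pwm)
    best = float("-inf")  # stays -inf if the region is shorter than the site
    for start in range(len(upstream) - width + 1):
        window = upstream[start:start + width]
        # Bases absent from a PWM column contribute 0.0 (an assumption).
        score = sum(pwm[i].get(base, 0.0) for i, base in enumerate(window))
        best = max(best, score)
    return best


def conserved_members(
    groups: List[OrthologGroup],
    upstream: Dict[str, Dict[str, str]],
    pwm: PWM,
    threshold: float,
) -> List[OrthologGroup]:
    """Keep ortholog groups in which every member has a candidate site."""
    kept = []
    for group in groups:
        if len(group) < 2:
            continue  # a single genome gives no cross-genome support
        if all(
            best_site_score(upstream[genome][gene], pwm) >= threshold
            for genome, gene in group.items()
        ):
            kept.append(group)
    return kept
```

Because support from a second genome is required, the threshold in such a
sketch can be set lower than would be safe for a single genome, which
mirrors the gain in sensitivity for weak but conserved signals.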