Identifying the complete transcriptional regulatory network for an organism
is a major challenge. For each regulatory protein, we want to know all the
genes it regulates, that is, its regulon. Examples of known binding sites
can be used to estimate the binding specificity of the protein and to predict
other binding sites. However, such predictions can be unreliable, because the
considerable variability among binding sites makes the true specificity of the
protein difficult to determine. Because regulatory systems tend to be conserved
through evolution, comparisons between species can increase the reliability of
binding site predictions. In this article, an approach is presented for
evaluating computational predictions of regulatory sites. We combine the
prediction of transcription units having orthologous genes with the prediction
of transcription factor binding sites based on probabilistic models. Through a
comparison with the Haemophilus influenzae genome, we augment the sets of genes
in Escherichia coli that are expected to be regulated by two transcription
factors: the cAMP receptor protein and the fumarate and nitrate reduction
regulatory protein. At the same time, we learn more about the regulatory
networks of H. influenzae, a species for which far less experimental knowledge
is available than for E. coli. By studying orthologous genes subject to
regulation by the same transcription factor, we also gain an understanding of
how entire regulatory systems evolve.
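The central idea can be sketched in a few lines. Known sites are turned into a probabilistic model of binding specificity, candidate sites are scored with it, and a prediction is trusted more when the orthologous gene in the second genome also carries a high-scoring site. This is a minimal illustration only, not the method used in this article. It assumes a log-odds position weight matrix with a pseudocount of 0.5, a uniform background base composition, a single score threshold, upstream regions given as uppercase ACGT strings, and a precomputed list of ortholog pairs. The function names, gene names, toy sites, and threshold are all hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def weight_matrix(sites, pseudocount=0.5):
    """Log-odds (bits) position weight matrix from aligned, equal-length sites.

    Assumption: uniform background (0.25 per base) and a fixed pseudocount.
    """
    background = 0.25
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def best_site(matrix, sequence):
    """Return (score, position, strand) of the best window on either strand."""
    width = len(matrix)
    revcomp = "".join(COMPLEMENT[b] for b in reversed(sequence))
    best = (float("-inf"), -1, "+")
    for strand, seq in (("+", sequence), ("-", revcomp)):
        for pos in range(len(seq) - width + 1):
            window = seq[pos:pos + width]
            s = sum(col[base] for col, base in zip(matrix, window))
            if s > best[0]:
                best = (s, pos, strand)
    return best


def conserved_predictions(matrix, upstream_a, upstream_b, orthologs, threshold):
    """Keep ortholog pairs whose upstream regions BOTH have a site >= threshold.

    Hypothetical filter: requiring agreement between genomes is one simple way
    to use evolutionary conservation to discard unreliable single-genome hits.
    """
    kept = []
    for gene_a, gene_b in orthologs:
        score_a = best_site(matrix, upstream_a[gene_a])[0]
        score_b = best_site(matrix, upstream_b[gene_b])[0]
        if score_a >= threshold and score_b >= threshold:
            kept.append((gene_a, gene_b, score_a, score_b))
    return kept


# Usage with made-up toy data (not real CRP or FNR sites).
toy_sites = ["TGTGACGT", "TGTGATGT", "TGTGAAGT", "TGTCACGT"]
pwm = weight_matrix(toy_sites)
genome_a = {"geneA1": "CCCCTGTGACGTCCCC", "geneA2": "CCCCCCCCCCCCCCCC"}
genome_b = {"geneB1": "CCCCACGTCACACCCC", "geneB2": "CCCCTGTGACGTCCCC"}
pairs = [("geneA1", "geneB1"), ("geneA2", "geneB2")]
print(conserved_predictions(pwm, genome_a, genome_b, pairs, threshold=8.0))
```

In this toy run only the first ortholog pair is kept: geneB2 has a strong site, but its ortholog geneA2 does not, so the single-genome hit is not supported by the comparison.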