Da. Rosenblueth et al., SYNTACTIC RECOGNITION OF REGULATORY REGIONS IN ESCHERICHIA-COLI, Computer applications in the biosciences, 12(5), 1996, pp. 415-422
Motivation: One of the most common methods for identifying cis-regulatory sites in regulatory regions of DNA is the use of weight matrices, as testified by several articles in this issue. One way to strengthen computational predictions in regulatory regions is to develop methods that incorporate more of the biological properties present in such DNA regions. The grammatical implementation presented in this paper provides a concrete example in this direction.
Results: On the basis of the analysis of an exhaustive collection of regulatory regions in Escherichia coli, a grammatical model for the regulatory regions of sigma(70) promoters has been developed. The terminal symbols of the grammar represent individual sites for the binding of activator and repressor proteins, and include the precise position of sites in relation to transcription initiation. Combining these symbols, the grammar generates a large number of different sentences, each of which can be matched against a collection of regulatory regions by means of weight matrices specific to the set of sites for each individual protein. On the basis of this grammatical model, a Prolog syntactic recognizer is presented here. Specific sub-grammars for ArgR, LexA and TyrR were implemented. When parsing a collection of 128 sigma(70) promoter regions, the syntactic recognizer produces a much lower number of false-positive sites than the standard search using weight matrices.
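The contrast between a plain weight-matrix search and a search that also constrains site position relative to transcription initiation can be illustrated with a minimal Python sketch. This is not the Prolog recognizer described above: the matrix values, the sequence, the threshold, the allowed position window and the names `TOY_PWM`, `scan` and `scan_with_position` are all hypothetical, chosen only to show how a positional constraint can discard hits that a matrix alone would report.

```python
# Toy weight matrix for a hypothetical 4-bp site. The scores are invented
# for illustration and do not correspond to ArgR, LexA, TyrR or any real protein.
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]


def scan(seq, pwm, threshold):
    """Plain weight-matrix search: return (start, score) for every window
    whose summed score reaches the threshold. Assumes uppercase A/C/G/T."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def scan_with_position(seq, pwm, threshold, tss_index, allowed):
    """Matrix search plus a positional constraint: keep only hits whose
    start, measured relative to the transcription start index, falls
    inside the inclusive range `allowed` = (lowest, highest)."""
    lowest, highest = allowed
    kept = []
    for start, score in scan(seq, pwm, threshold):
        relative = start - tss_index
        if lowest <= relative <= highest:
            kept.append((relative, score))
    return kept


# Hypothetical promoter fragment; index 10 is assumed to be the
# transcription start.
sequence = "ATGCCCCCATGCGG"
matrix_hits = scan(sequence, TOY_PWM, threshold=4.0)
positional_hits = scan_with_position(
    sequence, TOY_PWM, threshold=4.0, tss_index=10, allowed=(-5, 0)
)
print(matrix_hits)      # both occurrences of the toy site
print(positional_hits)  # only the occurrence near the assumed start
```

In this toy case the matrix alone reports two matches, while the positional filter keeps only the one close to the assumed transcription start. That is the general idea behind encoding position in the terminal symbols, though the actual grammar combines many such symbols into sentences.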