In addition to genes, chromosomal DNA contains sequences that serve as
signals for turning gene expression on and off. These signals are
thought to be distributed as clusters in the regulatory regions of genes.
We develop a Bayesian model that views locating regulatory regions in
genomic DNA as a change-point problem, with the beginnings of regulatory
and non-regulatory regions corresponding to the change points. The
model is based on a hidden Markov chain. The data consist of nucleotide
positions of protein-binding elements in a genomic DNA sequence. These
positions are identified using a reference catalogue containing elements
that interact with transcription factors implicated in controlling
the expression of protein-encoding genes. Among the protein-binding
elements in a genomic DNA sequence, the statistical model automatically
selects those that tend to predict regulatory regions. We test the
model using viral sequences that include known regulatory regions and
provide the results obtained for human genomic DNA corresponding to the
beta globin locus on chromosome 11. (C) 1997 Academic Press Limited.
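
The change-point view described above can be illustrated with a minimal
sketch. It is not the Bayesian model of the paper: it assumes a plain
two-state hidden Markov chain (0 = non-regulatory, 1 = regulatory) over
fixed-width sequence windows, Poisson-distributed counts of binding
elements per window, and Viterbi decoding in place of posterior inference.
The rates, the persistence probability `stay`, and the toy counts are all
hypothetical values chosen for illustration.

```python
import math


def poisson_logpmf(k, rate):
    """Log-probability of observing k events under a Poisson(rate) law."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def viterbi_two_state(counts, rates=(0.5, 4.0), stay=0.9):
    """Most likely state path for per-window element counts.

    State 0 = non-regulatory, state 1 = regulatory. `rates` are the
    assumed mean element counts per window in each state and `stay` is
    the assumed probability of remaining in the current state.
    """
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform prior over the state of the first window.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best predecessor for state s at this window.
            cand = [score[p] + (log_stay if p == s else log_switch)
                    for p in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            pointers.append(best)
            new_score.append(cand[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the final window.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def change_points(path):
    """Indices of windows at which the decoded state switches."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy data: binding-element counts in twelve consecutive windows.
counts = [0, 1, 0, 0, 5, 4, 6, 3, 0, 0, 1, 0]
path = viterbi_two_state(counts)
print(path)
print(change_points(path))
```

On this toy input the decoded path enters the regulatory state at window 4
and leaves it at window 8, so the two change points bracket the cluster of
high counts. A fully Bayesian treatment would place priors on the rates and
transition probabilities instead of fixing them as done here.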