The identification and interpretation of the regulatory signals within the
human genome remain among the greatest goals and most difficult challenges
in genome analysis. The ability to predict the temporal and spatial control of
transcription is likely to require a combination of methods to address the
contribution of sequence-specific signals, protein-protein interactions and
chromatin structure. We present here a new procedure to identify clusters
of transcription factor binding sites characteristic of sequence modules
experimentally verified to direct transcription selectively to liver cells.
This algorithm is sufficiently specific to identify known regulatory sequences
in genes selectively expressed in liver, promising acceleration of experimental
promoter analysis. In combination with phylogenetic footprinting,
this improvement in the specificity of predictions is sufficient to motivate
a scan of the human genome. Potential regulatory modules were identified
in orthologous human and rodent genomic sequences containing both known and
uncharacterized genes.