Enrichment of regulatory signals in conserved non-coding genomic sequence

Citation
S. Levy et al., Enrichment of regulatory signals in conserved non-coding genomic sequence, BIOINFORMAT, 17(10), 2001, pp. 871-877
Number of citations
23
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
Journal ISSN
1367-4803
Volume
17
Issue
10
Year of publication
2001
Pages
871 - 877
Database
ISI
SICI code
1367-4803(200110)17:10<871:EORSIC>2.0.ZU;2-E
Abstract
Motivation: Whole genome shotgun sequencing strategies generate sequence data prior to the application of assembly methodologies that result in contiguous sequence. Sequence reads can be employed to indicate regions of conservation between closely related species for which only one genome has been assembled. Consequently, by using pairwise sequence alignment methods it is possible to identify novel, non-repetitive, conserved segments in non-coding sequence that exist between the assembled human genome and mouse whole genome shotgun sequencing fragments. Conserved non-coding regions identify potentially functional DNA that could be involved in transcriptional regulation.
Results: Local sequence alignment methods were applied employing mouse fragments and the assembled human genome. In addition, transcription factor binding sites were detected by aligning their corresponding positional weight matrices to the sequence regions. These methods were applied to a set of transcripts corresponding to 502 genes associated with a variety of different human diseases taken from the Online Mendelian Inheritance in Man database. Using statistical arguments we have shown that conserved non-coding segments contain an enrichment of transcription factor binding sites when compared to the sequence background in which the conserved segments are located. This enrichment of binding sites was not observed in coding sequence. Conserved non-coding segments are not extensively repeated in the genome and therefore their identification provides a rapid means of finding genes with related conserved regions, and consequently potentially related regulatory mechanisms. Conserved segments in upstream regions are found to contain binding sites that are co-localized in a manner consistent with experimentally known transcription factor pairwise co-occurrences and afford the identification of novel co-occurring Transcription Factor (TF) pairs. This study provides a methodology and more evidence to suggest that conserved non-coding regions are biologically significant since they contain a statistical enrichment of regulatory signals and pairs of signals that enable the construction of regulatory models for human genes.