Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification

Citation
L. Marsan et Mf. Sagot, Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification, J COMPUT BI, 7(3-4), 2000, pp. 345-362
Citations number
37
Categorie Soggetti
Biochemistry & Biophysics
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
ISSN journal
10665277 → ACNP
Volume
7
Issue
3-4
Year of publication
2000
Pages
345 - 362
Database
ISI
SICI code
1066-5277(2000)7:3-4<345:AFESMU>2.0.ZU;2-W
Abstract
This paper introduces two exact algorithms for extracting conserved structu red motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p greater than or equal to 1 "boxes" (each box corresponding to one part of the structured moth), p substitution rates (on e for each box) and p - 1 intervals of distance (one for each pair of succe ssive boxes in the collection). The contents of the boxes - that is, the mo tifs themselves - are unknown at the start of the algorithm. This is precis ely what the algorithms are meant to find. A suffix tree is used for findin g such moths, The algorithms are efficient enough to be able to infer site consensi, such as, for instance, promoter sequences or regulatory sites, fr om a set of unaligned sequences corresponding to the noncoding regions upst ream from all genes of a genome. In particular, both algorithms time comple xity scales linearly with N(2)n where n is the average length of the sequen ces and N their number. An application to the identification of promoter an d regulatory consensus sequences in bacterial genomes is shown.