ITA
ENG

Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification

Authors

Marsan, L Sagot, MF

Citation

L. Marsan et Mf. Sagot, Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification, J COMPUT BI, 7(3-4), 2000, pp. 345-362

Citations number

Categorie Soggetti

Biochemistry & Biophysics

Journal title

JOURNAL OF COMPUTATIONAL BIOLOGY

ISSN journal

10665277 → ACNP

Volume

Issue

3-4

Year of publication

2000

Pages

345 - 362

Database

ISI

SICI code

1066-5277(2000)7:3-4<345:AFESMU>2.0.ZU;2-W

Abstract

This paper introduces two exact algorithms for extracting conserved structu red motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p greater than or equal to 1 "boxes" (each box corresponding to one part of the structured moth), p substitution rates (on e for each box) and p - 1 intervals of distance (one for each pair of succe ssive boxes in the collection). The contents of the boxes - that is, the mo tifs themselves - are unknown at the start of the algorithm. This is precis ely what the algorithms are meant to find. A suffix tree is used for findin g such moths, The algorithms are efficient enough to be able to infer site consensi, such as, for instance, promoter sequences or regulatory sites, fr om a set of unaligned sequences corresponding to the noncoding regions upst ream from all genes of a genome. In particular, both algorithms time comple xity scales linearly with N(2)n where n is the average length of the sequen ces and N their number. An application to the identification of promoter an d regulatory consensus sequences in bacterial genomes is shown.