Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions

Authors

Stojanovic, N Florea, L Riemer, C Gumucio, D Slightom, J Goodman, M Miller, W Hardison, R

Citation

N. Stojanovic et al., Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions, NUCL ACID R, 27(19), 1999, pp. 3899-3910

Citations number

Subject categories

Biochemistry & Biophysics

Journal title

NUCLEIC ACIDS RESEARCH

ISSN journal

03051048

Volume

27

Issue

19

Year of publication

1999

Pages

3899 - 3910

Database

ISI

SICI code

0305-1048(19991001)27:19<3899:COFMFF>2.0.ZU;2-#

Abstract

Conserved segments in DNA or protein sequences are strong candidates for functional elements and thus appropriate methods for computing them need to be developed and compared. We describe five methods and computer programs for finding highly conserved blocks within previously computed multiple alignments, primarily for DNA sequences. Two of the methods are already in common use; these are based on good column agreement and high information content. Three additional methods find blocks with minimal evolutionary change, blocks that differ in at most k positions per row from a known center sequence and blocks that differ in at most k positions per row from a center sequence that is unknown a priori. The center sequence in the latter two methods is a way to model potential binding sites for known or unknown proteins in DNA sequences. The efficacy of each method was evaluated by analysis of three extensively analyzed regulatory regions in mammalian beta-globin gene clusters and the control region of bacterial arabinose operons. Although all five methods have quite different theoretical underpinnings, they produce rather similar results on these data sets when their parameters are adjusted to best approximate the experimental data. The optimal parameters for the method based on information content varied little for different regulatory regions of the beta-globin gene cluster and hence may be extrapolated to many other regulatory regions. The programs based on maximum allowed mismatches per row have simple parameters whose values can be chosen a priori and thus they may be more useful than the other methods when calibration against known functional sites is not available.
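
Illustrative sketch

Two of the ideas named in the abstract can be illustrated with a short Python sketch. It is not the authors' software: the function names `column_information` and `blocks_near_center` are hypothetical, the information content assumes a uniform background over A, C, G, T with gap characters ignored, and the block search assumes an ungapped center sequence compared against every row of an alignment whose rows have equal length.

```python
from collections import Counter
from math import log2


def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column.

    Assumption: uniform background over `alphabet`; any other
    character (for example a gap '-') is ignored.
    """
    symbols = [c for c in column if c in alphabet]
    if not symbols:
        return 0.0
    n = len(symbols)
    counts = Counter(symbols)
    entropy = -sum((m / n) * log2(m / n) for m in counts.values())
    return log2(len(alphabet)) - entropy


def blocks_near_center(alignment, center, k):
    """Start columns of blocks in which every row differs from the
    known `center` sequence in at most `k` positions.

    Assumption: `alignment` is a list of equal-length strings.
    """
    width = len(center)
    length = len(alignment[0])
    hits = []
    for start in range(length - width + 1):
        if all(
            sum(a != b for a, b in zip(row[start:start + width], center)) <= k
            for row in alignment
        ):
            hits.append(start)
    return hits


# Toy alignment (invented for illustration, not data from the paper).
toy_alignment = [
    "ACGTGATAAG",
    "ACGAGATAAG",
    "TCGTGATTAG",
]
toy_hits = blocks_near_center(toy_alignment, "GATAAG", k=1)
```

On this toy input the third row deviates from the center in one position, so the block is reported with `k=1` but not with `k=0`. This shows why the mismatch bound is a parameter that can be chosen a priori, as the abstract notes.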