Automatic discovery of sub-molecular sequence domains in multi-aligned sequences: A dynamic programming algorithm for multiple alignment segmentation

Citation
E.P. Xing et al., Automatic discovery of sub-molecular sequence domains in multi-aligned sequences: A dynamic programming algorithm for multiple alignment segmentation, J. Theor. Biol., 212(2), 2001, pp. 129-139
Number of citations
20
Subject categories
Multidisciplinary
Journal title
JOURNAL OF THEORETICAL BIOLOGY
Journal ISSN
0022-5193
Volume
212
Issue
2
Year of publication
2001
Pages
129 - 139
Database
ISI
SICI code
0022-5193(20010921)212:2<129:ADOSSD>2.0.ZU;2-K
Abstract
Automatic identification of sub-structures in multi-aligned sequences is of great importance for effective and objective structural/functional domain annotation, phylogenetic treeing and other molecular analyses. We present a segmentation algorithm that optimally partitions a given multi-alignment into a set of potentially biologically significant blocks, or segments. This algorithm applies dynamic programming and progressive optimization to the statistical profile of a multi-alignment in order to optimally demarcate relatively homogenous subregions. Using this algorithm, a large multi-alignment of eukaryotic 16S rRNA was analyzed. Three types of sequence patterns were identified automatically and efficiently: shared conserved domain, shared variable motif, and rare signature sequence. Results were consistent with the patterns identified through independent phylogenetic and structural approaches. This algorithm facilitates the automation of sequence-based molecular structural and evolutionary analyses through statistical modeling and high performance computation. (C) 2001 Academic Press.
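The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of segmenting an alignment by dynamic programming, not the authors' method. It assumes that the "statistical profile" can be stood in for by per-column Shannon entropy, that homogeneity is measured by the within-segment sum of squared deviations, and that the number of segments `k` is fixed in advance. The function names `column_entropy` and `segment` are hypothetical, and the progressive optimization step mentioned in the abstract is not modeled.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Shannon entropy (bits) of every column of an equal-length alignment.

    Assumption: entropy stands in for the paper's statistical profile.
    """
    entropies = []
    for j in range(len(alignment[0])):
        counts = Counter(seq[j] for seq in alignment)
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


def segment(values, k):
    """Split `values` into k contiguous segments minimizing the total
    within-segment sum of squared deviations from the segment mean.

    Returns (list of half-open (start, end) index pairs, total cost).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # Sum of squared deviations of values[i:j], with i < j.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of covering values[:j] with m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers from the last column to recover boundaries.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return bounds, best[k][n]


if __name__ == "__main__":
    # Toy alignment: conserved block, variable block, conserved block.
    toy = ["AAAAACGTTT", "AAAACGTTTT", "AAAAGTATTT", "AAAATACTTT"]
    print(segment(column_entropy(toy), 3))
```

This runs in O(k n^2) time for n columns, which is adequate for illustration. Choosing the number of segments automatically, as a practical tool would need to, is outside the scope of the sketch.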