Ep. Xing et al., Automatic discovery of sub-molecular sequence domains in multi-aligned sequences: A dynamic programming algorithm for multiple alignment segmentation, J THEOR BIO, 212(2), 2001, pp. 129-139
Automatic identification of sub-structures in multi-aligned sequences is of
great importance for effective and objective structural/functional domain
annotation, phylogenetic treeing and other molecular analyses. We present a
segmentation algorithm that optimally partitions a given multi-alignment
into a set of potentially biologically significant blocks, or segments. This
algorithm applies dynamic programming and progressive optimization to the
statistical profile of a multi-alignment in order to optimally demarcate
relatively homogeneous subregions. Using this algorithm, a large
multi-alignment of eukaryotic 16S rRNA was analyzed. Three types of sequence
patterns were identified automatically and efficiently: shared conserved
domain, shared variable motif, and rare signature sequence. Results were
consistent with the patterns identified through independent phylogenetic and
structural approaches. This algorithm facilitates the automation of
sequence-based molecular structural and evolutionary analyses through
statistical modeling and
high performance computation. (C) 2001 Academic Press.
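
The abstract does not give the objective function or the profile statistic, so the following is only a minimal sketch of the general idea of segmenting an alignment profile by dynamic programming. It assumes the profile has already been reduced to one score per alignment column (for example a column entropy), assumes the number of segments k is fixed in advance, and uses the within-segment sum of squared deviations as a stand-in homogeneity measure. The function name segment_profile and all parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def segment_profile(scores: List[float], k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Split per-column scores into k contiguous segments.

    Minimizes the total within-segment sum of squared deviations from the
    segment mean (an assumed homogeneity measure, not the paper's).
    Returns (total_cost, segments) with segments as half-open (start, end).
    """
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(scores)")

    # Prefix sums of x and x^2 give any segment's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(scores):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def cost(i: int, j: int) -> float:
        """Squared deviation of scores[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting scores[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    back[m][j] = i

    # Walk the back-pointers from the end to recover the boundaries.
    segments: List[Tuple[int, int]] = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return best[k][n], segments


if __name__ == "__main__":
    # Toy profile: a low-score block, a high-score block, a mid-score block.
    toy = [0.1, 0.1, 0.1, 1.9, 2.0, 1.9, 0.5, 0.5]
    total, segs = segment_profile(toy, 3)
    print(segs)  # [(0, 3), (3, 6), (6, 8)]
```

This sketch runs in O(k n^2) time for n columns. Choosing k, or replacing the squared-deviation cost with a likelihood-based score over full column distributions, would be needed before applying anything like it to real alignments.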