PARAMETRIC OPTIMIZATION OF SEQUENCE ALIGNMENT

Citation
D. Gusfield et al., PARAMETRIC OPTIMIZATION OF SEQUENCE ALIGNMENT, Algorithmica, 12(4-5), 1994, pp. 312-326
Citations number
20
Categorie Soggetti
Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming
Journal title
ISSN journal
01784617
Volume
12
Issue
4-5
Year of publication
1994
Pages
312 - 326
Database
ISI
SICI code
0178-4617(1994)12:4-5<312:POOSA>2.0.ZU;2-#
Abstract
The optimal alignment or the weighted minimum edit distance between tw o DNA or amino acid sequences for a given set of weights is computed b y classical dynamic programming techniques, and is widely used in mole cular biology. However, in DNA and amino acid sequences there is consi derable disagreement about how to weight matches, mismatches, insertio ns/deletions (indels or spaces), and gaps. Parametric sequence alignme nt is the problem of computing the optimal-valued alignment between tw o sequences as a function of variable weights for matches, mismatches, spaces, and gaps. The goal is to partition the parameter space into r egions (which are necessarily convex) such that in each region one ali gnment is optimal throughout and such that the regions are maximal for this property. In this paper we are primarily concerned with the stru cture of this convex decomposition, and secondarily with the complexit y of computing the decomposition. The most striking results are the fo llowing: For the special case where only matches, mismatches, and spac es are counted, and where spaces are counted throughout the alignment, we show that the decomposition is surprisingly simple: all regions ar e infinite; there are at most n2/3 regions; the lines that bound the r egions are all of the form beta = c + (c + 0.5)alpha; and the entire d ecomposition can be found in 0(knm) time, where k is the actual number of regions, and n < m are the lengths of the two strings. These resul ts were found while implementing a large software package for parametr ic sequence analysis, and in turn have led to faster algorithms for th ose tasks. A conference version of this paper first appeared in [10].