ITA
ENG

PARAMETRIC OPTIMIZATION OF SEQUENCE ALIGNMENT

Authors

GUSFIELD D BALASUBRAMANIAN K NAOR D

Citation

D. Gusfield et al., PARAMETRIC OPTIMIZATION OF SEQUENCE ALIGNMENT, Algorithmica, 12(4-5), 1994, pp. 312-326

Citations number

Categorie Soggetti

Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming

Journal title

Algorithmica → ACNP

ISSN journal

01784617

Volume

Issue

4-5

Year of publication

1994

Pages

312 - 326

Database

ISI

SICI code

0178-4617(1994)12:4-5<312:POOSA>2.0.ZU;2-#

Abstract

The optimal alignment or the weighted minimum edit distance between tw o DNA or amino acid sequences for a given set of weights is computed b y classical dynamic programming techniques, and is widely used in mole cular biology. However, in DNA and amino acid sequences there is consi derable disagreement about how to weight matches, mismatches, insertio ns/deletions (indels or spaces), and gaps. Parametric sequence alignme nt is the problem of computing the optimal-valued alignment between tw o sequences as a function of variable weights for matches, mismatches, spaces, and gaps. The goal is to partition the parameter space into r egions (which are necessarily convex) such that in each region one ali gnment is optimal throughout and such that the regions are maximal for this property. In this paper we are primarily concerned with the stru cture of this convex decomposition, and secondarily with the complexit y of computing the decomposition. The most striking results are the fo llowing: For the special case where only matches, mismatches, and spac es are counted, and where spaces are counted throughout the alignment, we show that the decomposition is surprisingly simple: all regions ar e infinite; there are at most n2/3 regions; the lines that bound the r egions are all of the form beta = c + (c + 0.5)alpha; and the entire d ecomposition can be found in 0(knm) time, where k is the actual number of regions, and n < m are the lengths of the two strings. These resul ts were found while implementing a large software package for parametr ic sequence analysis, and in turn have led to faster algorithms for th ose tasks. A conference version of this paper first appeared in [10].