Y. Xu et Ec. Uberbacher, A POLYNOMIAL-TIME ALGORITHM FOR A CLASS OF PROTEIN THREADING PROBLEMS, Computer applications in the biosciences, 12(6), 1996, pp. 511-517
This paper presents an algorithm for constructing an optimal alignment
between a three-dimensional protein structure template and an amino a
cid sequence. A protein structure template is given as a sequence of a
mino acid residue positions in three-dimensional space, along with an
array of physical properties attached to each position; these residue
positions are sequentially grouped into a series of core secondary str
uctures (central helices and beta sheets). In addition to match scores
and gap penalties, as in a traditional sequence-sequence alignment pr
oblem, the quality of a structure-sequence alignment is also detemined
by interaction preferences among amino acids aligned with structure p
ositions that are spatially close (we call these 'long-range interacti
ons'). Although it is known that constructing such a structure-sequenc
e alignment in the most general form is NP-hard, our algorithm runs in
polynomial time when restricted to structures with a 'modest' number
of long-range amino acid interactions. In the current work, long-range
interactions are limited to interactions between amino acids from dif
ferent cove secondary structures. Dividing the series of core secondar
y structures into two subseries creates a cut set of long-range intera
ctions. If we use N, M and C to represent the size of an amino acid se
quence, the size of a structure template, and the maximum cut size of
long-range interactions respectively, the algorithm finds an optimal s
tructure-sequence alignment in O(21(C)NM) time, a polynomial function
of N and M when C = O(log(N + M)). When running on structure-sequence
alignment problems without long-range intersections, i.e. C = O, the a
lgorithm achieves the same asymptotic computational complexity of the
Smith-Waterman sequence-sequence alignment algorithm.