ITA
ENG

AN EFFICIENT COMPUTATIONAL METHOD FOR GLOBALLY OPTIMAL THREADING

Authors

XU Y XU D UBERBACHER EC

Citation

Y. Xu et al., AN EFFICIENT COMPUTATIONAL METHOD FOR GLOBALLY OPTIMAL THREADING, Journal of computational biology, 5(3), 1998, pp. 597-614

Citations number

Categorie Soggetti

Mathematics,Biology,"Biochemical Research Methods",Mathematics,"Biothechnology & Applied Migrobiology

Journal title

Journal of computational biology → ACNP

ISSN journal

10665277

Volume

Issue

Year of publication

1998

Pages

597 - 614

Database

ISI

SICI code

1066-5277(1998)5:3<597:AECMFG>2.0.ZU;2-A

Abstract

Computational recognition of native-like folds of an anonymous amino a cid sequence from a protein fold database is considered to be a promis ing approach to the three-dimensional (3D) fold prediction of the amin o acid sequence. We present a new method for protein fold recognition through optimally aligning an amino acid sequence and a protein fold t emplate (protein threading). The fitness of aligning an amino acid seq uence with a fold template is measured by (1) the singleton fitness, r epresenting the compatibility of substituting one amino acid by anothe r and the combined preference of secondary structure and solvent acces sibility for a particular amino acid, (2) the pair,vise interaction, r epresenting the contact preference between a pair of amino acids, and (3) alignment gap penalties. Though a protein threading problem so def ined is known to be NP-hard in the most general sense, our algorithm r uns efficiently if we place a cutoff distance on the pairwise interact ions, as many of the existing threading programs do. For an amino acid sequence of size n and a fold template of size m with M core secondar y structures, the algorithm finds an optimal alignment in O (Mn1.5C+1 + mn(C+1)) time and O (MnC+1) space, where C is a (small) nonnegative integer, determined by a particular mathematical property of the pairw ise interactions. As a case study, we have demonstrated that C is less than or equal to 4 for about 75% of the 293 unique folds in our prote in database, when pairwise interactions are restricted to amino acids less than or equal to 7 Angstrom apart (measured between their beta ca rbon atoms). An approximation scheme is developed for fold templates w ith C > 4, when threading requires too much memory and time to be prac tical on a typical workstation.