FASTA-SWAP AND FASTA-PAT - PATTERN DATABASE SEARCHES USING COMBINATIONS OF ALIGNED AMINO-ACIDS, AND A NOVEL SCORING THEORY

Citation
I. Ladunga et al., FASTA-SWAP AND FASTA-PAT - PATTERN DATABASE SEARCHES USING COMBINATIONS OF ALIGNED AMINO-ACIDS, AND A NOVEL SCORING THEORY, Journal of Molecular Biology, 259(4), 1996, pp. 840-854
Number of citations
50
Subject categories
Biology
ISSN journal
00222836
Volume
259
Issue
4
Year of publication
1996
Pages
840 - 854
Database
ISI
SICI code
0022-2836(1996)259:4<840:FAF-PD>2.0.ZU;2-I
Abstract
We introduce two new pattern database search tools that utilize statistical significance and information theory to improve protein function identification. Both the general pattern scoring theory with the specific matrices introduced here and the low redundancy of pattern databases increase search sensitivity and selectivity. Pattern scoring preferentially rewards matches at conserved positions in a pattern with higher scores than matches at variable positions, and assigns more negative scores to mismatches at conserved positions than to mismatches at variable positions. The theory of pattern scoring can be used to create log-odds pattern scores for patterns derived from any set of multiple alignments. This theoretical framework can be used to adapt existing sequence database search tools to pattern analysis. Our FASTA-SWAP and FASTA-PAT tools are extensions of the FASTA program that search a sequence query against a pattern database. In the first step, FASTA-SWAP searches the diagonals of the query sequence and the library pattern for high-scoring segments, while FASTA-PAT performs an extended version of hashing. In the second step, both methods refine the alignments and the scores using dynamic programming. The tools utilize an extremely compact binary representation of all possible combinations of amino acid residues in aligned positions. Our FASTA-SWAP and FASTA-PAT tools are well suited for functional identification of distant relatives that may be missed by sequence database search methods. FASTA-SWAP and FASTA-PAT searches can be performed using our World-Wide Web server (cm.tmc.edu:9331/seq-search/Options/fastapat.htm1). (C) 1996 Academic Press Limited
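
The two ideas the abstract names, a compact binary encoding of the residue combinations in an aligned column and a log-odds score that depends on how conserved the column is, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 20-bit mask layout, the function names (column_mask, pattern_from_alignment, position_score, segment_score), the uniform background frequency of 1/20, and the mismatch probability eps. The paper's own matrices and data structures are not reproduced here.

```python
import math

# Hypothetical encoding: one bit per standard amino acid (not the paper's layout).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}


def column_mask(residues):
    """Encode the set of residues seen in one aligned column as a 20-bit integer."""
    mask = 0
    for aa in residues:
        mask |= BIT[aa]
    return mask


def pattern_from_alignment(rows):
    """Turn equal-length, gap-free aligned sequences into a list of column masks."""
    return [column_mask(column) for column in zip(*rows)]


def position_score(mask, residue, eps=0.05):
    """Illustrative log-odds score (base 2) of a residue against one column.

    Assumed model: the k residues allowed by the column share probability
    1 - eps, the 20 - k others share eps, and the background is uniform (1/20).
    """
    k = bin(mask).count("1")
    if k == len(AMINO_ACIDS):
        return 0.0  # a column that allows every residue carries no information
    if mask & BIT[residue]:
        p = (1.0 - eps) / k
    else:
        p = eps / (len(AMINO_ACIDS) - k)
    return math.log2(p * len(AMINO_ACIDS))


def segment_score(pattern, segment):
    """Sum the position scores of an ungapped segment aligned to the pattern."""
    return sum(position_score(m, aa) for m, aa in zip(pattern, segment))


if __name__ == "__main__":
    pattern = pattern_from_alignment(["ACD", "ACE", "AGD"])
    for segment in ("ACD", "WCD", "AWD"):
        print(segment, round(segment_score(pattern, segment), 2))
```

Under these assumptions, a match at a fully conserved column scores higher than a match at a column that allows several residues, and a mismatch at the conserved column scores lower than a mismatch at the variable one, which is the qualitative behaviour the abstract describes. The hashing, diagonal search, and dynamic programming refinement steps are not sketched.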