I. Ladunga et al., FASTA-SWAP and FASTA-PAT: Pattern Database Searches Using Combinations of Aligned Amino Acids, and a Novel Scoring Theory, Journal of Molecular Biology, 259(4), 1996, pp. 840-854
We introduce two new pattern database search tools that utilize statistical significance and information theory to improve protein function identification. Both the general pattern scoring theory, with the specific matrices introduced here, and the low redundancy of pattern databases increase search sensitivity and selectivity. Pattern scoring preferentially rewards matches at conserved positions in a pattern with higher scores than matches at variable positions, and assigns more negative scores to mismatches at conserved positions than to mismatches at variable positions. The theory of pattern scoring can be used to create log-odds pattern scores for patterns derived from any set of multiple alignments. This theoretical framework can be used to adapt existing sequence database search tools to pattern analysis. Our FASTA-SWAP and FASTA-PAT tools are extensions of the FASTA program that search a sequence query against a pattern database. In the first step, FASTA-SWAP searches the diagonals of the query sequence and the library pattern for high-scoring segments, while FASTA-PAT performs an extended version of hashing. In the second step, both methods refine the alignments and the scores using dynamic programming. The tools use an extremely compact binary representation of all possible combinations of amino acid residues in aligned positions. FASTA-SWAP and FASTA-PAT are well suited for the functional identification of distant relatives that may be missed by sequence database search methods. FASTA-SWAP and FASTA-PAT searches can be performed using our World-Wide Web server (cm.tmc.edu:9331/seq-search/Options/fastapat.html).
(C) 1996 Academic Press Limited
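The scoring behavior described above (larger rewards for matches at conserved positions, larger penalties for mismatches there) can be illustrated with a simple log-odds model. This is a minimal sketch, not the authors' scoring theory or matrices, which the abstract does not spell out. It assumes a uniform background residue distribution, a hypothetical mixing parameter `epsilon` (the probability that a position emits a residue outside its allowed set), and a 20-bit mask per aligned position as one possible "compact binary representation" of residue combinations. The names `encode_position`, `allows` and `position_scores` are illustrative only.

```python
import math

# One possible fixed ordering of the 20 standard amino acids (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}
FULL_MASK = (1 << len(AMINO_ACIDS)) - 1


def encode_position(residues: str) -> int:
    """Pack the set of residues observed at one aligned position into a 20-bit mask."""
    mask = 0
    for aa in residues:
        mask |= BIT[aa]
    return mask


def allows(mask: int, aa: str) -> bool:
    """True if residue `aa` belongs to the set encoded by `mask`."""
    return bool(mask & BIT[aa])


def position_scores(mask: int, epsilon: float = 0.1) -> tuple[float, float]:
    """Return (match, mismatch) log-odds scores in bits for one pattern position.

    Hypothetical model: with probability 1 - epsilon the position emits a residue
    from its allowed set, otherwise one from outside it; within each group residues
    follow a uniform background distribution.
    """
    if mask == 0 or mask == FULL_MASK:
        raise ValueError("position must allow some, but not all, residues")
    background = 1 / len(AMINO_ACIDS)
    p_set = background * bin(mask).count("1")  # background probability of the allowed set
    match = math.log2((1 - epsilon) / p_set)
    mismatch = math.log2(epsilon / (1 - p_set))
    return match, mismatch


conserved = encode_position("C")      # invariant cysteine
variable = encode_position("ILVM")    # variable hydrophobic position
print([round(s, 2) for s in position_scores(conserved)])  # [4.17, -3.25]
print([round(s, 2) for s in position_scores(variable)])   # [2.17, -3.0]
```

Because each allowed set is a single integer, a membership test is one bitwise AND, and every combination of the 20 standard residues fits in a 32-bit word.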
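The first FASTA-SWAP step, scanning the diagonals of the query-versus-pattern comparison for high-scoring segments, can be sketched as an ungapped maximum-segment (Kadane) scan along every diagonal. This is a brute-force illustration under assumed inputs, not FASTA-SWAP itself: programs in the FASTA family locate promising diagonals with k-tuple lookups rather than scoring every cell, and the per-position match and mismatch values below are placeholders for whatever log-odds scores a pattern carries. `Position`, `Segment` and `best_diagonal_segments` are hypothetical names.

```python
from typing import NamedTuple


class Position(NamedTuple):
    allowed: frozenset  # residues permitted at this aligned position
    match: float        # score when the query residue is allowed here
    mismatch: float     # score otherwise


class Segment(NamedTuple):
    score: float
    diagonal: int  # query index minus pattern index
    q_start: int   # first query index of the segment (inclusive)
    q_end: int     # last query index of the segment (inclusive)


def best_diagonal_segments(query: str, pattern: list[Position], top: int = 3) -> list[Segment]:
    """Best ungapped segment on every diagonal, ranked by score (brute-force scan)."""
    n, m = len(query), len(pattern)
    found = []
    for d in range(-(m - 1), n):  # every diagonal i - j = d with at least one cell
        j = max(0, -d)
        i = j + d
        run, run_start, best = 0.0, i, None
        while i < n and j < m:
            pos = pattern[j]
            s = pos.match if query[i] in pos.allowed else pos.mismatch
            if run > 0:           # extend the current segment (Kadane's rule)
                run += s
            else:                 # restart the segment at this cell
                run, run_start = s, i
            if best is None or run > best.score:
                best = Segment(run, d, run_start, i)
            i += 1
            j += 1
        if best is not None and best.score > 0:
            found.append(best)
    found.sort(key=lambda seg: seg.score, reverse=True)
    return found[:top]


pattern = [
    Position(frozenset("C"), 4.0, -3.0),
    Position(frozenset("ILVM"), 2.0, -3.0),
    Position(frozenset("G"), 4.0, -3.0),
]
print(best_diagonal_segments("AACLGAA", pattern, top=1))
# [Segment(score=10.0, diagonal=2, q_start=2, q_end=4)]
```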
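The abstract says only that FASTA-PAT performs "an extended version of hashing". One plausible reading, offered as an assumption rather than the published algorithm, is that every k-residue word a pattern can match is expanded into an ordinary lookup table, so query words can be looked up exactly as in FASTA and hits accumulated per diagonal. Pattern positions are given as strings of allowed residues; `build_pattern_index`, `diagonal_hits` and the default `k = 2` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_pattern_index(pattern: list[str], k: int = 2) -> dict[str, list[int]]:
    """Map every k-residue word the pattern can match to the offsets where it starts.

    Each pattern position is a string of allowed residues, e.g. "ILVM".
    """
    index = defaultdict(list)
    for j in range(len(pattern) - k + 1):
        # Expand the window into every concrete word it accepts.
        for word in product(*pattern[j:j + k]):
            index["".join(word)].append(j)
    return dict(index)


def diagonal_hits(query: str, index: dict[str, list[int]], k: int = 2) -> dict[int, int]:
    """Count exact word hits per diagonal (query offset minus pattern offset)."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits[i - j] += 1
    return dict(hits)


index = build_pattern_index(["C", "ILVM", "G"])
print(len(index))                        # 8 words: CI CL CV CM IG LG VG MG
print(diagonal_hits("AACLGAA", index))   # {2: 2}
```

The expansion costs the product of the allowed-set sizes over each k-residue window, so highly variable positions inflate the table quickly; a practical tool would need to bound or avoid that growth.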
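The second step, refining alignments and scores with dynamic programming, can be illustrated by a Smith-Waterman local alignment in which the substitution score depends on the pattern position rather than on a fixed residue-pair matrix. This sketch assumes a linear gap penalty `gap` and per-position (allowed residues, match score, mismatch score) triples with placeholder values; FASTA's own optimization step uses affine gap penalties within a band around the initial diagonal, and the authors' exact refinement is not described in the abstract. `local_align_score` is a hypothetical name.

```python
def local_align_score(query: str, pattern: list[tuple[str, float, float]], gap: float = 2.0) -> float:
    """Smith-Waterman local alignment score of a query sequence against a pattern.

    Each pattern position is (allowed_residues, match_score, mismatch_score); a gap
    in either sequence costs a flat `gap` per residue (a simplifying assumption).
    """
    m = len(pattern)
    prev = [0.0] * (m + 1)  # DP row for the previous query residue
    best = 0.0
    for aa in query:
        cur = [0.0] * (m + 1)
        for j in range(1, m + 1):
            allowed, match, mismatch = pattern[j - 1]
            diag = prev[j - 1] + (match if aa in allowed else mismatch)
            # Local alignment: a cell never drops below 0 (start a fresh alignment).
            cur[j] = max(0.0, diag, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


pattern = [("C", 4.0, -3.0), ("ILVM", 2.0, -3.0), ("G", 4.0, -3.0)]
print(local_align_score("AACLGAA", pattern))   # 10.0 (ungapped match C-L-G)
print(local_align_score("AACLAGAA", pattern))  # 8.0 (one extra query residue gapped)
```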