E. G. Shpaer et al., Sensitivity and selectivity in protein similarity searches - a comparison of Smith-Waterman in hardware to BLAST and FASTA, Genomics, 38(2), 1996, pp. 179-191
To predict the functions of a possible protein product of any new or uncharacterized DNA sequence, it is important first to detect all significant similarities between the encoded amino acid sequence and any accumulated protein sequence data. We have implemented a set of queries and database sequences and proceeded to test and compare various similarity search methods and their parameterizations. We demonstrate here that the Smith-Waterman (S-W) dynamic programming method and the optimized version of FASTA are significantly better able to distinguish true similarities from statistical noise than is the popular database search tool BLAST. Also, a simple "log-length normalization" of S-W scores based on the query and target sequence lengths greatly increased the selectivity of the S-W searches, exceeding the default normalization method of FASTA. An implementation of the modified S-W algorithm in hardware (the Fast Data Finder) is able to match the accuracy of software versions while greatly speeding up its execution. We present here the selectivity and sensitivity data from these tests as well as results for various scoring matrices. We present data that will help users to choose threshold score values for evaluation of database search results. We also illustrate the impact of using simple-sequence masking tools such as SEG or XNU. (C) 1996 Academic Press, Inc.
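
For readers unfamiliar with the method, the following is a minimal Python sketch of a Smith-Waterman local alignment score, followed by a length-based normalization of that score. It is illustrative only. The abstract does not state the normalization formula, so dividing by the natural log of the product of the query and target lengths is an assumption, as are the function names and the match, mismatch, and linear gap values. Real protein searches would use a substitution matrix such as PAM or BLOSUM, usually with affine gap penalties.

```python
import math


def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (linear gap penalty).

    The scoring values are illustrative placeholders, not those of the paper.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def log_length_normalized(score, query_len, target_len):
    """Hypothetical normalization: score / ln(query_len * target_len).

    This specific form is an assumption; the abstract only says the
    normalization is based on the query and target sequence lengths.
    """
    product = query_len * target_len
    if product <= 1:
        raise ValueError("length product must exceed 1 for a positive log")
    return score / math.log(product)


raw = smith_waterman_score("ACGT", "ACGT")
normalized = log_length_normalized(raw, 4, 4)
```

The idea behind any such normalization is that longer sequences reach high raw scores by chance more easily, so scaling by a slowly growing function of length makes scores more comparable across database entries of different sizes.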