SENSITIVITY AND SELECTIVITY IN PROTEIN SIMILARITY SEARCHES - A COMPARISON OF SMITH-WATERMAN IN HARDWARE TO BLAST AND FASTA

Citation
E.G. Shpaer et al., SENSITIVITY AND SELECTIVITY IN PROTEIN SIMILARITY SEARCHES - A COMPARISON OF SMITH-WATERMAN IN HARDWARE TO BLAST AND FASTA, Genomics, 38(2), 1996, pp. 179-191
Number of citations
26
Subject categories
Genetics & Heredity
Journal title
Genomics
Journal ISSN
0888-7543
Volume
38
Issue
2
Year of publication
1996
Pages
179 - 191
Database
ISI
SICI code
0888-7543(1996)38:2<179:SASIPS>2.0.ZU;2-L
Abstract
To predict the functions of a possible protein product of any new or uncharacterized DNA sequence, it is important first to detect all significant similarities between the encoded amino acid sequence and any accumulated protein sequence data. We have implemented a set of queries and database sequences and proceeded to test and compare various similarity search methods and their parameterizations. We demonstrate here that the Smith-Waterman (S-W) dynamic programming method and the optimized version of FASTA are significantly better able to distinguish true similarities from statistical noise than is the popular database search tool BLAST. Also, a simple "log-length normalization" of S-W scores based on the query and target sequence lengths greatly increased the selectivity of the S-W searches, exceeding the default normalization method of FASTA. An implementation of the modified S-W algorithm in hardware (the Fast Data Finder) is able to match the accuracy of software versions while greatly speeding up its execution. We present here the selectivity and sensitivity data from these tests as well as results for various scoring matrices. We present data that will help users to choose threshold score values for evaluation of database search results. We also illustrate the impact of using simple-sequence masking tools such as SEG or XNU. (C) 1996 Academic Press, Inc.
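Illustrative sketch (not from the paper): the abstract names Smith-Waterman dynamic programming and a "log-length normalization" of its scores but gives no formulas. The Python below is a minimal sketch under stated assumptions. The function names, the match/mismatch/linear-gap scoring (the study itself used amino acid scoring matrices), and the normalization form score / ln(query length x target length) are all hypothetical choices made only to show the general idea, not the authors' method.

```python
import math


def sw_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score via Smith-Waterman dynamic programming.

    Assumption: simple match/mismatch scores and a linear gap penalty,
    chosen for brevity instead of a real scoring matrix.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def log_length_normalized(score, query_len, target_len):
    """Hypothetical normalization: score / ln(query_len * target_len).

    Assumption: the abstract does not state the formula; this form only
    illustrates down-weighting scores obtained against longer sequences.
    """
    product = query_len * target_len
    if product <= 1:
        raise ValueError("length product must exceed 1")
    return score / math.log(product)
```

Under this assumed form, the same raw score counts for less against a longer target: `log_length_normalized(8, 4, 4)` is larger than `log_length_normalized(8, 4, 400)`, which reflects the intuition that high scores arise by chance more easily in longer sequences.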