SENSITIVITY AND SELECTIVITY IN PROTEIN SIMILARITY SEARCHES - A COMPARISON OF SMITH-WATERMAN IN HARDWARE TO BLAST AND FASTA

Authors

SHPAER EG; ROBINSON M; YEE D; CANDLIN JD; MINES R; HUNKAPILLER T

Citation

E.G. Shpaer et al., SENSITIVITY AND SELECTIVITY IN PROTEIN SIMILARITY SEARCHES - A COMPARISON OF SMITH-WATERMAN IN HARDWARE TO BLAST AND FASTA, Genomics, 38(2), 1996, pp. 179-191

Citations number

Subject Categories

Genetics & Heredity

Journal title

Genomics

ISSN journal

0888-7543

Volume

38

Issue

2

Year of publication

1996

Pages

179 - 191

Database

ISI

SICI code

0888-7543(1996)38:2<179:SASIPS>2.0.ZU;2-L

Abstract

To predict the functions of a possible protein product of any new or uncharacterized DNA sequence, it is important first to detect all significant similarities between the encoded amino acid sequence and any accumulated protein sequence data. We have implemented a set of queries and database sequences and proceeded to test and compare various similarity search methods and their parameterizations. We demonstrate here that the Smith-Waterman (S-W) dynamic programming method and the optimized version of FASTA are significantly better able to distinguish true similarities from statistical noise than is the popular database search tool BLAST. Also, a simple "log-length normalization" of S-W scores based on the query and target sequence lengths greatly increased the selectivity of the S-W searches, exceeding the default normalization method of FASTA. An implementation of the modified S-W algorithm in hardware (the Fast Data Finder) is able to match the accuracy of software versions while greatly speeding up its execution. We present here the selectivity and sensitivity data from these tests as well as results for various scoring matrices. We present data that will help users to choose threshold score values for evaluation of database search results. We also illustrate the impact of using simple-sequence masking tools such as SEG or XNU. (C) 1996 Academic Press, Inc.