J. Park et al., Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods, J MOL BIOL, 284(4), 1998, pp. 1201-1210
The sequences of related proteins can diverge beyond the point where their
relationship can be recognised by pairwise sequence comparisons. In attempts
to overcome this limitation, methods have been developed that use as a
query, not a single sequence, but sets of related sequences or a
representation of the characteristics shared by related sequences. Here we
describe an assessment of three of these methods: the SAM-T98 implementation
of a hidden Markov model procedure; PSI-BLAST; and the intermediate sequence
search (ISS) procedure. We determined the extent to which these procedures
can detect evolutionary relationships between the members of the sequence
database PDBD40-J. This database, derived from the structural classification
of proteins (SCOP), contains the sequences of proteins of known structure
whose sequence identities with each other are 40% or less. The evolutionary
relationships that exist between those that have low sequence identities
were found by the examination of their structural details and, in many
cases, their functional features. For nine false positive predictions out of
a possible 432,680, i.e. at a false positive rate of about 1/50,000, SAM-T98
found 35% of the true homologous relationships in PDBD40-J, whilst PSI-BLAST
found 30% and ISS found 25%. Overall, this is about twice the number of
PDBD40-J relations that can be detected by the pairwise comparison
procedures FASTA (17%) and GAP-BLAST (15%). For distantly related sequences
in PDBD40-J, those pairs whose sequence identity is less than 30%, SAM-T98
and PSI-BLAST detect three times the number of relationships found by the
pairwise methods.
(C) 1998 Academic Press.
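
The arithmetic behind the quoted figures can be checked with a short Python sketch. It uses only the numbers stated in the abstract; the function names are illustrative, and averaging the coverage within each group is an assumption about how "about twice" might be read, not a calculation described by the authors.

```python
# Figures quoted in the abstract.
FALSE_POSITIVES = 9
POSSIBLE_FALSE_PAIRS = 432_680

# Percentage of true homologous relationships in PDBD40-J detected
# by each method at that false positive rate.
COVERAGE = {
    "SAM-T98": 35,
    "PSI-BLAST": 30,
    "ISS": 25,
    "FASTA": 17,
    "GAP-BLAST": 15,
}
MULTIPLE_SEQUENCE = ("SAM-T98", "PSI-BLAST", "ISS")
PAIRWISE = ("FASTA", "GAP-BLAST")


def false_positive_rate(false_positives, possible_pairs):
    """Fraction of possible false pairs that were reported as hits."""
    return false_positives / possible_pairs


def mean_coverage(methods):
    """Average detected percentage over a group of methods."""
    return sum(COVERAGE[m] for m in methods) / len(methods)


def fold_improvement():
    """Mean multiple-sequence coverage relative to mean pairwise coverage
    (assumption: a simple average within each group)."""
    return mean_coverage(MULTIPLE_SEQUENCE) / mean_coverage(PAIRWISE)


if __name__ == "__main__":
    rate = false_positive_rate(FALSE_POSITIVES, POSSIBLE_FALSE_PAIRS)
    print(f"false positive rate: 1 in {1 / rate:,.0f}")
    print(f"multiple-sequence vs pairwise coverage: {fold_improvement():.3f}x")
```

Under these assumptions the rate comes out close to the stated 1/50,000, and the ratio of group averages is a little under 2, consistent with "about twice".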