Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods

Citation
J. Park et al., Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods, J MOL BIOL, 284(4), 1998, pp. 1201-1210
Number of citations
26
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
0022-2836
Volume
284
Issue
4
Year of publication
1998
Pages
1201 - 1210
Database
ISI
SICI code
0022-2836(199812)284:4<1201:SCUMSD>2.0.ZU;2-I
Abstract
The sequences of related proteins can diverge beyond the point where their relationship can be recognised by pairwise sequence comparisons. In attempts to overcome this limitation, methods have been developed that use as a query, not a single sequence, but sets of related sequences or a representation of the characteristics shared by related sequences. Here we describe an assessment of three of these methods: the SAM-T98 implementation of a hidden Markov model procedure; PSI-BLAST; and the intermediate sequence search (ISS) procedure. We determined the extent to which these procedures can detect evolutionary relationships between the members of the sequence database PDBD40-J. This database, derived from the structural classification of proteins (SCOP), contains the sequences of proteins of known structure whose sequence identities with each other are 40% or less. The evolutionary relationships that exist between those that have low sequence identities were found by the examination of their structural details and, in many cases, their functional features. For nine false positive predictions out of a possible 432,680, i.e. at a false positive rate of about 1/50,000, SAM-T98 found 35% of the true homologous relationships in PDBD40-J, whilst PSI-BLAST found 30% and ISS found 25%. Overall, this is about twice the number of PDBD40-J relations that can be detected by the pairwise comparison procedures FASTA (17%) and GAP-BLAST (15%). For distantly related sequences in PDBD40-J, those pairs whose sequence identity is less than 30%, SAM-T98 and PSI-BLAST detect three times the number of relationships found by the pairwise methods. (C) 1998 Academic Press.
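
Note on the figures
The arithmetic behind the rounded figures in the abstract can be checked with a short sketch. The numbers are taken from the abstract; the variable names are illustrative, and averaging the coverage within each group of methods is an assumption made here for the comparison, not a procedure described in the paper.

```python
# Figures quoted in the abstract.
false_positives = 9
possible_false_pairs = 432_680

# False positive rate, expressed as "one in N".
fp_rate = false_positives / possible_false_pairs
one_in = possible_false_pairs / false_positives

# Percentage of true homologous relationships found at that error rate.
coverage = {
    "SAM-T98": 35,
    "PSI-BLAST": 30,
    "ISS": 25,
    "FASTA": 17,
    "GAP-BLAST": 15,
}
multi_sequence = ["SAM-T98", "PSI-BLAST", "ISS"]
pairwise = ["FASTA", "GAP-BLAST"]

# Assumption: compare the plain average of each group of methods.
mean_multi = sum(coverage[m] for m in multi_sequence) / len(multi_sequence)
mean_pairwise = sum(coverage[m] for m in pairwise) / len(pairwise)
ratio = mean_multi / mean_pairwise

print(f"false positive rate: {fp_rate:.2e} (about 1 in {one_in:,.0f})")
print(f"mean coverage, multiple-sequence methods: {mean_multi:.1f}%")
print(f"mean coverage, pairwise methods: {mean_pairwise:.1f}%")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions the rate works out to roughly 1 in 48,000, consistent with the quoted "about 1/50,000", and the ratio of average coverage is a little under 2, consistent with "about twice". The threefold figure for pairs below 30% identity cannot be rechecked this way, because the abstract gives no per-method percentages for that subset.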