Comparing protein sequence-based and predicted secondary structure-based methods for identification of remote homologs

Citation
V. Geetha et al., Comparing protein sequence-based and predicted secondary structure-based methods for identification of remote homologs, PROTEIN ENG, 12(7), 1999, pp. 527-534
Number of citations
44
Subject categories
Biochemistry & Biophysics
Journal title
PROTEIN ENGINEERING
ISSN journal
0269-2139
Volume
12
Issue
7
Year of publication
1999
Pages
527 - 534
Database
ISI
SICI code
0269-2139(199907)12:7<527:CPSAPS>2.0.ZU;2-2
Abstract
We have compared a novel sequence-structure matching technique, FORESST, for detecting remote homologs to three existing sequence-based methods: local amino acid sequence similarity by BLASTP, hidden Markov models (HMMs) of sequences of protein families using SAM, and HMMs based on sequence motifs identified using meta-MEME. FORESST compares predicted secondary structures to a library of structural families of proteins, using HMMs. Altogether 45 proteins from nine structural families in the database CATH were used in a cross-validated test of the fold assignment accuracy of each method. Local sequence similarity of a query sequence to a protein family is measured by the highest segment pair (HSP) score. Each of the HMM-based approaches (FORESST, MEME, amino acid sequence-based HMM) yielded a log-odds score for the query sequence. In order to make a fair comparison among these methods, the scores for each method were converted to Z-scores in a uniform way by comparing the raw scores of a query protein with the corresponding scores for a set of unrelated proteins. Z-scores were analyzed as a function of the maximum pairwise sequence identity (MPSID) of the query sequence to sequences used in training the model. For MPSID above 20%, the Z-scores increase linearly with MPSID for the sequence-based methods but remain roughly constant for FORESST. Below 15%, average Z-scores are close to zero for the sequence-based methods, whereas the FORESST method yielded average Z-scores of 1.8 and 1.1, using observed and predicted secondary structures, respectively. This demonstrates the advantage of the sequence-structure method for detecting remote homologs.
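The Z-score conversion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this assumes the conventional definition: the query's raw score minus the mean of the scores for the unrelated proteins, divided by their sample standard deviation. The function name `z_score` and the example numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def z_score(query_score, background_scores):
    """Convert a raw score to a Z-score against a background distribution.

    Assumption: Z = (raw - mean) / sample standard deviation, where the
    background is the set of raw scores obtained for unrelated proteins
    with the same method (HSP score or HMM log-odds score).
    """
    if len(background_scores) < 2:
        raise ValueError("need at least two background scores")
    mu = mean(background_scores)
    sigma = stdev(background_scores)  # sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (query_score - mu) / sigma


# Hypothetical raw scores for unrelated proteins, for illustration only.
background = [10.0, 12.0, 14.0, 16.0, 18.0]
z = z_score(20.0, background)
```

Because each method's raw score is normalized against its own background distribution, HSP scores and log-odds scores end up on a common scale, which is what allows the methods to be compared directly.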