V. Geetha et al., Comparing protein sequence-based and predicted secondary structure-based methods for identification of remote homologs, PROTEIN ENG, 12(7), 1999, pp. 527-534
We have compared a novel sequence-structure matching technique, FORESST,
for detecting remote homologs to three existing sequence-based methods,
including local amino acid sequence similarity by BLASTP, hidden Markov
models (HMMs) of sequences of protein families using SAM, and HMMs based
on sequence motifs identified using meta-MEME. FORESST compares predicted
secondary structures to a library of structural families of proteins,
using HMMs. Altogether, 45 proteins from nine structural families in the
CATH database were used in a cross-validated test of the fold assignment
accuracy of each method.
Local sequence similarity of a query sequence to a protein family is
measured by the highest-scoring segment pair (HSP) score. Each of the
HMM-based approaches (FORESST, meta-MEME, amino acid sequence-based HMM)
yielded a log-odds score for the query sequence. In order to make a fair
comparison among these methods, the scores for each method were converted
to Z-scores in a uniform way by comparing the raw scores of a query
protein with the corresponding scores for a set of unrelated proteins.
Z-scores were analyzed as a function of the maximum pairwise sequence
identity (MPSID) of the query sequence to sequences used in training the
model. For MPSID above 20%, the Z-scores increase
linearly with MPSID for the sequence-based methods but remain roughly
constant for FORESST. Below 15%, average Z-scores are close to zero for
the sequence-based methods, whereas the FORESST method yielded average
Z-scores of 1.8 and 1.1, using observed and predicted secondary
structures, respectively. This demonstrates the advantage of the
sequence-structure method for detecting remote homologs.
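
The score normalization described in the abstract can be illustrated with a
minimal sketch. The abstract does not give the exact formula, so this assumes
the conventional definition: the query's raw score minus the mean of the
scores of the unrelated proteins, divided by their sample standard deviation.
The function name `z_score` and the background scores are hypothetical and
are not taken from the paper.

```python
from statistics import mean, stdev


def z_score(raw_score, background_scores):
    """Convert a raw score to a Z-score against a background set.

    Assumption: Z = (raw - mean) / sample standard deviation, where the
    mean and deviation come from scores of unrelated proteins under the
    same model. The paper's exact procedure may differ.
    """
    if len(background_scores) < 2:
        raise ValueError("need at least two background scores")
    mu = mean(background_scores)
    sigma = stdev(background_scores)  # sample (n - 1) standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (raw_score - mu) / sigma


# Hypothetical raw scores of unrelated proteins against one family model.
background = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# A query scoring at the background mean maps to Z = 0; higher raw scores
# map to positive Z-scores, whatever the raw scale of the method.
print(z_score(5.0, background))
print(z_score(9.0, background))
```

Because each method's raw scores (HSP scores or log-odds scores) are
normalized against that same method's own background distribution, the
resulting Z-scores share a common scale and can be compared across methods.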