C. Geourjon et al., Identification of related proteins with weak sequence identity using secondary structure information, PROTEIN SCI, 10(4), 2001, pp. 788-797
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Even with the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. Because protein structures are more conserved than sequences, we investigated whether comparing protein secondary structures (observed or predicted) can discriminate between related and unrelated protein sequences in the 10%-30% sequence identity range. Pairwise similarity of secondary structures was measured using the structural overlap (Sov) parameter. In this article, we show that when the secondary structure similarity is >50%, most pairs are structurally related. By taking into account the secondary structures of proteins detected by BLAST, FASTA, or SSEARCH in the noisy region (high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates for homology modeling by finding unexpectedly related proteins, to select proteins for experimental investigation in a structural genomics approach, and for genome annotation.
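The abstract does not spell out how Sov is computed. The following is a minimal sketch that assumes the Sov99 definition of Zemla et al. (1999). It also assumes two secondary-structure strings in the three-state H/E/C alphabet that have already been projected onto a pairwise alignment of equal length. The function names (`segments`, `sov99`, `rescue_noisy_hits`), the `hits` tuple layout, and the handling of gap characters are illustrative assumptions, not the authors' implementation. Sov99 is asymmetric because the first string is treated as the reference, and the abstract does not say how the authors handled this.

```python
from itertools import groupby


def segments(ss, state):
    """Return (start, end) pairs (end exclusive) of maximal runs of `state` in `ss`."""
    out = []
    pos = 0
    for ch, run in groupby(ss):
        n = sum(1 for _ in run)
        if ch == state:
            out.append((pos, pos + n))
        pos += n
    return out


def sov99(observed, predicted, states="HEC"):
    """Sov99 score (0-100) of `predicted` against the reference `observed`.

    Assumption: both strings are aligned to the same length; characters outside
    `states` (e.g. a hypothetical '-' gap symbol) belong to no segment and are ignored.
    """
    if len(observed) != len(predicted):
        raise ValueError("secondary-structure strings must have the same aligned length")
    total = 0.0
    norm = 0
    for state in states:
        pred_segs = segments(predicted, state)
        for s1, e1 in segments(observed, state):
            len1 = e1 - s1
            # Predicted segments of the same state sharing at least one residue.
            overlapping = [(s2, e2) for s2, e2 in pred_segs if s2 < e1 and s1 < e2]
            if not overlapping:
                norm += len1  # observed segment with no overlapping partner
                continue
            for s2, e2 in overlapping:
                len2 = e2 - s2
                minov = min(e1, e2) - max(s1, s2)  # length of the actual overlap
                maxov = max(e1, e2) - min(s1, s2)  # total extent of both segments
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                total += (minov + delta) / maxov * len1
                norm += len1  # each overlapping pair adds len1 to the normaliser
    return 100.0 * total / norm if norm else 0.0


def rescue_noisy_hits(hits, threshold=50.0):
    """Keep IDs of hits whose Sov exceeds `threshold` (the >50% rule).

    `hits` is a hypothetical list of (hit_id, query_ss, hit_ss) tuples taken from
    the high-E-value region of a BLAST, FASTA or SSEARCH search.
    """
    return [hit_id for hit_id, q_ss, h_ss in hits if sov99(q_ss, h_ss) > threshold]


if __name__ == "__main__":
    print(sov99("CCHHHHCC", "CCCHHHHC"))  # 87.5
    print(rescue_noisy_hits([("a", "CCHHHHCC", "CCCHHHHC"), ("b", "HHHH", "EEEE")]))  # ['a']
```

In this example the helix is shifted by one residue, yet the pair still scores 87.5 because the overlapping segments receive partial credit through the `delta` term. A pair with no same-state overlap scores 0 and would be discarded under the >50% rule.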