Identification of related proteins with weak sequence identity using secondary structure information

Citation
C. Geourjon et al., Identification of related proteins with weak sequence identity using secondary structure information, PROTEIN SCI, 10(4), 2001, pp. 788-797
Number of citations
27
Subject categories
Biochemistry & Biophysics
Journal title
PROTEIN SCIENCE
Journal ISSN
0961-8368
Volume
10
Issue
4
Year of publication
2001
Pages
788 - 797
Database
ISI
SICI code
0961-8368(200104)10:4<788:IORPWW>2.0.ZU;2-0
Abstract
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Even with the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures have been measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.
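The Sov comparison and the >50% cutoff mentioned in the abstract can be illustrated with a minimal sketch. It assumes the widely used segment-overlap formulation of Sov (the 1999 variant with a per-pair tolerance term); the abstract does not say which variant the authors used. It also assumes the two secondary structure strings are already aligned, of equal length, gap-free, and written in a three-state alphabet (H, E, C). The names `segments`, `sov`, and `likely_related` are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def segments(ss, state):
    """Return inclusive (start, end) index pairs for each run of `state` in `ss`."""
    segs = []
    pos = 0
    for ch, run in groupby(ss):
        n = len(list(run))
        if ch == state:
            segs.append((pos, pos + n - 1))
        pos += n
    return segs


def sov(ss1, ss2, states="HEC"):
    """Segment overlap score (0-100) of ss2 against the reference ss1.

    Assumption: both strings are pre-aligned and have the same length.
    The score is not symmetric; ss1 is treated as the reference.
    """
    if len(ss1) != len(ss2):
        raise ValueError("secondary structure strings must have equal length")
    total = 0.0
    norm = 0
    for state in states:
        segs2 = segments(ss2, state)
        for b1, e1 in segments(ss1, state):
            len1 = e1 - b1 + 1
            overlapped = False
            for b2, e2 in segs2:
                # Length of the region where both segments are in `state`.
                minov = min(e1, e2) - max(b1, b2) + 1
                if minov <= 0:
                    continue
                overlapped = True
                len2 = e2 - b2 + 1
                # Total extent covered by either segment.
                maxov = max(e1, e2) - min(b1, b2) + 1
                # Tolerance for small shifts at segment ends.
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                total += (minov + delta) / maxov * len1
                norm += len1
            if not overlapped:
                # Reference segments with no partner still count in the normalizer.
                norm += len1
    return 100.0 * total / norm if norm else 0.0


def likely_related(sov_value, threshold=50.0):
    """Apply the >50% likeness cutoff quoted in the abstract."""
    return sov_value > threshold


if __name__ == "__main__":
    a = "CCHHHHHHCC"
    b = "CCCHHHHCCC"
    score = sov(a, b)
    print(score, likely_related(score))
```

In this toy pair the helix is shortened by one residue at each end; the tolerance term absorbs that shift, so the segment-based score stays higher than a per-residue identity count would. In practice the strings would come from observed or predicted secondary structures of the hits returned by BLAST, FASTA, or SSEARCH, after alignment.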