V. Difrancesco et al., PROTEIN TOPOLOGY RECOGNITION FROM SECONDARY STRUCTURE SEQUENCES - APPLICATION OF THE HIDDEN MARKOV-MODELS TO THE ALPHA-CLASS PROTEINS, Journal of Molecular Biology, 267(2), 1997, pp. 446-463
The three-dimensional fold of a protein is described by the organization of its secondary structure elements in 3D space, i.e. its "topology". We find that the protein topology can be recognized from the 1D sequence of secondary structure states of the residues alone. Automated recognition is facilitated by the use of hidden Markov models (HMMs) to represent topology families of proteins. Such models can be trained on the experimentally observed secondary structure sequences of family members using well-established algorithms. Here, we model various topology groups in the alpha class of proteins and identify, from a large database, those proteins having the topology described by each model. The correct topology family for protein secondary structure sequences could be recognized 12 out of 14 times. When the observed secondary structure sequences are replaced with predicted sequences, recognition is still achievable 8 out of 14 times. The success rate for observed sequences indicates that our approach will become increasingly useful as the accuracy of secondary structure prediction algorithms improves. Our study indicates that HMMs are useful for protein topology recognition even when no detectable primary amino acid sequence similarity is present. To illustrate the potential utility of our method, protein topology recognition is attempted on leptin, the obese gene product, and the human interleukin-6 sequence, for which fold predictions have been previously published. (C) 1997 Academic Press Limited.
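The recognition step described above scores a secondary structure string against one HMM per topology family and keeps the best-scoring family. The abstract gives no model architectures or parameters, so the following is only a minimal sketch under stated assumptions. It uses a three-letter alphabet (H = helix, E = strand, C = coil) and two hypothetical, hand-set two-state models, built by the illustrative helper `two_state_model` and named "long-helix" and "short-helix". These stand in for trained family models. Each query is scored with the scaled forward algorithm. The training step is omitted, as are the larger, topology-specific state layouts that real family models would need.

```python
import math
from typing import Dict, List, Tuple

# Assumed secondary-structure alphabet: H = helix, E = strand, C = coil/loop.
ALPHABET = "HEC"


class DiscreteHMM:
    """Minimal discrete HMM over secondary-structure symbols (illustrative only)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[str, float]]) -> None:
        self.start = start   # start[i]    = P(first state is i)
        self.trans = trans   # trans[i][j] = P(next state j | current state i)
        self.emit = emit     # emit[i][s]  = P(symbol s | state i)

    def log_likelihood(self, seq: str) -> float:
        """Scaled forward algorithm: returns log P(seq | model)."""
        if not seq:
            return 0.0
        n = len(self.start)
        log_p = 0.0
        alpha: List[float] = []
        for t, sym in enumerate(seq):
            if t == 0:
                # Initialization: start probability times emission of first symbol.
                alpha = [self.start[j] * self.emit[j].get(sym, 0.0) for j in range(n)]
            else:
                # Induction: sum over predecessor states, then emit the current symbol.
                alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n))
                         * self.emit[j].get(sym, 0.0) for j in range(n)]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence impossible under this model
            log_p += math.log(scale)             # accumulate log of scaling factors
            alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        return log_p


def two_state_model(helix_stay: float) -> DiscreteHMM:
    """Hypothetical helix/loop model; helix_stay controls expected helix length."""
    return DiscreteHMM(
        start=[0.2, 0.8],  # state 0 = helix, state 1 = loop
        trans=[[helix_stay, 1.0 - helix_stay], [0.25, 0.75]],
        emit=[{"H": 0.90, "E": 0.02, "C": 0.08},
              {"H": 0.10, "E": 0.05, "C": 0.85}],
    )


def classify(seq: str, models: Dict[str, DiscreteHMM]) -> Tuple[str, Dict[str, float]]:
    """Assign seq to the family whose model gives the highest log-likelihood."""
    scores = {name: m.log_likelihood(seq) for name, m in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    families = {"long-helix": two_state_model(0.95),
                "short-helix": two_state_model(0.70)}
    query = "CCC" + "H" * 20 + "CCCC" + "H" * 20 + "CCC"
    best, scores = classify(query, families)
    print(best, {k: round(v, 2) for k, v in scores.items()})
```

In this toy setup the two models differ only in the helix self-transition probability, which sets the expected helix length. A query made of two 20-residue helices is therefore assigned to the "long-helix" model. In practice, family models would be trained on observed family members, and a query would be either an observed or a predicted secondary structure string.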