A.C. Camproux et al., Hidden Markov model approach for identifying the modular framework of the protein backbone, PROTEIN ENG, 12(12), 1999, pp. 1063-1073
The hidden Markov model (HMM) was used to identify recurrent short 3D
structural building blocks (SBBs) describing protein backbones, independently of
any a priori knowledge. Polypeptide chains are decomposed into a series of
short segments defined by their inter-α-carbon distances. Basically, the
model takes into account the sequential order of the observed segments and
assumes that each one corresponds to one of several possible SBBs. Fitting the
model to a database of non-redundant proteins allowed us to decode proteins
in terms of 12 distinct SBBs with different roles in protein structure.
Some SBBs correspond to classical regular secondary structures. Others
correspond to a significant subdivision of their bounding regions previously
considered to be a single pattern. The major contribution of the HMM is that
this model implicitly takes into account the sequential connections between
SBBs and thus describes the most probable pathways by which the blocks are
connected to form the framework of the protein structures. Validation of
the SBB code was performed by extracting SBB series repeated in recoded
proteins and examining their structural similarities. Preliminary results on
the sequence specificity of SBBs suggest promising perspectives for the
prediction of SBBs or series of SBBs from the protein sequences.
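
The segment description and decoding steps summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the four-residue window, the use of the three non-adjacent inter-α-carbon distances as the descriptor, the isotropic Gaussian emission model, and the function names `segment_distances` and `viterbi` are all assumptions made for illustration. Given already fitted parameters, the Viterbi algorithm returns the most probable SBB label for each segment.

```python
import numpy as np

def segment_distances(ca, width=4):
    """Describe each window of `width` consecutive alpha-carbons by its
    non-adjacent inter-alpha-carbon distances (hypothetical descriptor)."""
    ca = np.asarray(ca, dtype=float)
    # For width=4 this yields the pairs (0, 2), (0, 3), (1, 3).
    pairs = [(i, j) for i in range(width) for j in range(i + 2, width)]
    rows = []
    for start in range(len(ca) - width + 1):
        w = ca[start:start + width]
        rows.append([np.linalg.norm(w[i] - w[j]) for i, j in pairs])
    return np.array(rows)

def viterbi(obs, log_start, log_trans, means, sigma=1.0):
    """Most probable hidden-state path, assuming isotropic Gaussian emissions.

    obs:       (n_obs, n_dims) segment descriptors
    log_start: (n_states,) log initial probabilities
    log_trans: (n_states, n_states) log transition matrix, rows = from-state
    means:     (n_states, n_dims) emission means, one row per SBB
    """
    # Log emission density up to a constant shared by all states.
    diff = obs[:, None, :] - means[None, :, :]
    log_emit = -0.5 * np.sum(diff ** 2, axis=2) / sigma ** 2
    n_obs, n_states = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((n_obs, n_states), dtype=int)
    for t in range(1, n_obs):
        cand = score[:, None] + log_trans  # cand[i, j]: move from state i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(n_obs - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In this sketch, a chain of N α-carbons yields N - 3 segments, and `viterbi(segment_distances(ca), log_start, log_trans, means)` would recode it as a series of state indices. The transition matrix is what captures the sequential connections between blocks: a low transition probability penalizes implausible successions of SBBs even when the local geometry fits.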