P. Baldi et al., Hidden Markov Models of Biological Primary Sequence Information, Proceedings of the National Academy of Sciences of the United States of America, 91(3), 1994, pp. 1059-1063
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences.
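The abstract does not give the update rule for the "smooth and convergent" parameter adaptation. The Python below is a minimal sketch under an assumption not stated in the abstract: each state's emission distribution is written as normalized exponentials (a softmax) of unconstrained weights, and the weights follow gradient ascent on the log-likelihood of observed emission counts. The names `softmax` and `smooth_update`, the learning rate, and the counts are hypothetical and for illustration only. In a full HMM the counts would come from aligning the family's sequences to the model; here they are fixed so the convergence is easy to see.

```python
import math


def softmax(weights):
    """Map unconstrained weights to a probability vector (normalized exponentials)."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_update(weights, counts, lr=0.05):
    """One gradient-ascent step on sum_X n_X * log p_X, with p = softmax(weights).

    Assumed (hypothetical) update: dL/dw_X = n_X - n * p_X, where n = sum_Y n_Y.
    """
    probs = softmax(weights)
    n = sum(counts)
    return [w + lr * (c - n * p) for w, c, p in zip(weights, counts, probs)]


# Hypothetical emission counts for one state over a 4-letter alphabet (A, C, G, T).
counts = [6, 2, 1, 1]
weights = [0.0, 0.0, 0.0, 0.0]  # equal weights: start from the uniform distribution
for _ in range(500):
    weights = smooth_update(weights, counts)

probs = softmax(weights)
print([round(p, 3) for p in probs])  # → [0.6, 0.2, 0.1, 0.1]
```

Because every probability is computed through the softmax, the parameters stay positive and sum to one after every step, so no clipping or renormalization is needed, and a small learning rate makes each change gradual. This is one way to obtain smooth updates; it should not be read as the paper's exact procedure.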