Hidden Markov models for detecting remote protein homologies

Citation
K. Karplus et al., Hidden Markov models for detecting remote protein homologies, BIOINFORMAT, 14(10), 1998, pp. 846-856
Number of citations
35
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
14
Issue
10
Year of publication
1998
Pages
846 - 856
Database
ISI
SICI code
1367-4803(1998)14:10<846:HMMFDR>2.0.ZU;2-Y
Abstract
Motivation: A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and the homologs found by using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases.

Methods: We evaluate the SAM-T98 method with four datasets. Three of the test sets are fold-recognition tests, in which the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against WU-BLASTP and against DOUBLE-BLAST, a two-step method similar to ISS but using BLAST instead of FASTA.

Results: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP (Structural Classification of Proteins) domains test, SAM-T98 got 880 true positives and 68 false positives, DOUBLE-BLAST got 533 true positives with 71 false positives, and WU-BLASTP got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model.
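The reversed-model normalization described in the abstract can be illustrated with a minimal Python sketch. It is not SAM-T98's implementation: it assumes a toy ungapped profile (no insert or delete states, one emission distribution per column) in place of a real profile HMM, a uniform background over the 20 amino acids, and hypothetical function names (`make_profile`, `uniform_null_score`, `reverse_null_score`). The point it shows is the one the abstract relies on: a reversed model has the same residue composition and length as the forward model, so a composition-biased sequence such as a poly-leucine run scores well against a uniform null model but close to zero against the reversed model, while a genuine ordered motif match still scores well.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
BACKGROUND = 1.0 / len(ALPHABET)   # uniform null model: probability per residue


def make_profile(consensus, match_prob=0.6):
    """Hypothetical toy profile: column i emits consensus[i] with probability
    match_prob and spreads the remainder evenly over the other 19 residues."""
    other = (1.0 - match_prob) / (len(ALPHABET) - 1)
    return [{a: (match_prob if a == c else other) for a in ALPHABET}
            for c in consensus]


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def window_log_odds(seq, profile):
    """Log-odds (vs. the uniform background) of every ungapped placement."""
    width = len(profile)
    return [
        sum(math.log(col[seq[start + i]] / BACKGROUND)
            for i, col in enumerate(profile))
        for start in range(len(seq) - width + 1)
    ]


def uniform_null_score(seq, profile):
    """log P(seq | model) - log P(seq | uniform null), in nats.

    The model places the profile at a uniformly chosen start and emits all
    other residues from the background, so background terms outside the
    window cancel and only the window log-odds remain."""
    scores = window_log_odds(seq, profile)
    return logsumexp(scores) - math.log(len(scores))


def reverse_null_score(seq, profile):
    """log P(seq | model) - log P(seq | reversed model), in nats.

    The reversed model is the same profile with its columns in reverse
    order; the shared background terms cancel in the difference."""
    return (uniform_null_score(seq, profile)
            - uniform_null_score(seq, profile[::-1]))


if __name__ == "__main__":
    profile = make_profile("LLKDWL")  # toy consensus, chosen for illustration
    for name, seq in [("motif", "AGSLLKDWLTPE"), ("poly-L", "LLLLLLLLLL")]:
        print(name,
              round(uniform_null_score(seq, profile), 2),
              round(reverse_null_score(seq, profile), 2))
```

For the poly-L sequence every window contains the same residues in both orders, so the forward and reversed models give the same likelihood and the reversed-model score is zero up to rounding, even though the uniform-null score is clearly positive. A full profile HMM would compute each likelihood with the forward algorithm over gapped alignments; the cancellation argument is the same.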