Optimizing reduced-space sequence analysis

Citation
R. Wheeler et R. Hughey, Optimizing reduced-space sequence analysis, BIOINFORMAT, 16(12), 2000, pp. 1082-1090
Citations number
16
Categorie Soggetti
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
13674803 → ACNP
Volume
16
Issue
12
Year of publication
2000
Pages
1082 - 1090
Database
ISI
SICI code
1367-4803(200012)16:12<1082:ORSA>2.0.ZU;2-8
Abstract
Motivation Dynamic programming is the core algorithm of sequence comparison , alignment and linear hidden Markov model (HMM) training. For a pair of se quence lengths m and n, the problem cart be solved readily in O (mn) time a nd O (mn) space. The checkpoint algorithm introduced by Grice et al. (CABIO S, 13, 45-53, 1997) runs in O(Lmn) time and O (Lm L rootn) space, where L i s a positive integer determined by m, n, and the amount of available worksp ace. The algorithm is appropriate for many string comparison problems, incl uding all-paths and single-best-path hidden Markov model training, and is r eadily parallelizable, The checkpoint algorithm has a diagonal version that can solve the single-best-path alignment problem in O (mn) time and O (m n) space. Results: In this work, Mle improve performance by analyzlizg optimal checkp oint placement The improved row checkpoint algorithm performs up to one hal f the computation of the original algorithm. The improved diagonal checkpoi nt algorithm performs up to 35% fever computational steps than the original . We modified the SAM hidden Markov modeling package to use the improved row checkpoint algorithm. For a fixed sequence length, the new version is up to 33% faster for all-paths and 56% faster for single-best-path HMM training, depending on sequence length and allocated memory. Over a typical set of p rotein sequence lengths, the improvement is similar to 10%.