Numerical comparison of several approximations of the word count distribution in random sequences

Citation
S. Robin and S. Schbath, Numerical comparison of several approximations of the word count distribution in random sequences, J COMPUT BI, 8(4), 2001, pp. 349-359
Number of citations
16
Subject Categories
Biochemistry & Biophysics
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
ISSN journal
1066-5277
Volume
8
Issue
4
Year of publication
2001
Pages
349 - 359
Database
ISI
SICI code
1066-5277(2001)8:4<349:NCOSAO>2.0.ZU;2-5
Abstract
The exact distribution of word counts in random sequences and several approximations have been proposed in the past few years. The exact distribution has no theoretical limit but may require prohibitive computation time. On the other hand, approximate distributions can be rapidly calculated but, in practice, are only accurate under specific conditions. After making a survey of these distributions, we compare them according to both their accuracy and computational cost. Rules are suggested for choosing between Gaussian approximations, compound Poisson approximation, and exact distribution. This work is illustrated with the detection of exceptional words in the phage Lambda genome.
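
To make the idea of an "exceptional word" concrete, the following is a minimal sketch and not the authors' method. It assumes an i.i.d. letter model with frequencies estimated from the sequence itself (the paper's own sequence model is not stated in this record), and it uses a plain Poisson tail, which is a reasonable stand-in for the compound Poisson approximation only for rare words that cannot overlap themselves. All function names are hypothetical.

```python
import math
from collections import Counter


def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count_iid(seq, word):
    """Expected count of `word` under an i.i.d. letter model.

    Assumption: letter probabilities are the empirical frequencies in `seq`.
    """
    n = len(seq)
    freqs = Counter(seq)
    p_word = math.prod(freqs[c] / n for c in word)
    return (n - len(word) + 1) * p_word


def poisson_upper_tail(observed, mean):
    """P(N >= observed) for N ~ Poisson(mean).

    Assumption: only sensible for rare, non-self-overlapping words.
    """
    cdf = sum(math.exp(-mean) * mean ** j / math.factorial(j)
              for j in range(observed))
    return 1.0 - cdf


if __name__ == "__main__":
    seq = "ACGTACGTTTGACGATCGATCGGCTA"  # toy sequence, not real genome data
    word = "ACG"
    obs = count_overlapping(seq, word)
    mean = expected_count_iid(seq, word)
    print(obs, mean, poisson_upper_tail(obs, mean))
```

A small tail probability flags the word as over-represented relative to the assumed model. For self-overlapping words such as "AA", occurrences cluster and the plain Poisson tail is no longer appropriate, which is the situation the compound Poisson approximation addresses.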