ON PATTERN FREQUENCY OCCURRENCES IN A MARKOVIAN SEQUENCE

Citation
M. Regnier et W. Szpankowski, ON PATTERN FREQUENCY OCCURRENCES IN A MARKOVIAN SEQUENCE, Algorithmica, 22(4), 1998, pp. 631-649
Citations number
41
Categorie Soggetti
Mathematics,"Computer Science Software Graphycs Programming",Mathematics,"Computer Science Software Graphycs Programming
Journal title
ISSN journal
01784617
Volume
22
Issue
4
Year of publication
1998
Pages
631 - 649
Database
ISI
SICI code
0178-4617(1998)22:4<631:OPFOIA>2.0.ZU;2-D
Abstract
Consider a given pattern H and a random text T generated by a Markovia n source. We study the frequency of pattern occurrences in a random te xt when overlapping copies of the pattern are counted separately. We p resent exact and asymptotic formulae for moments (including the varian ce), and probability of r pattern occurrences for three different regi ons of r, namely: (i) r = O(1), (ii) central limit regime, and (iii) l arge deviations regime. In order to derive these results, we first con struct certain language expressions that characterize pattern occurren ces which are later translated into generating functions. We then use analytical methods to extract asymptotic behaviors of the pattern freq uency from the generating functions. These findings are of particular interest to molecular biology problems (e.g., finding patterns with un expectedly high or low frequencies, and gene recognition), information theory (e.g., second-order properties of the relative frequency), and pattern matching algorithms (e.g., q-gram algorithms).