Distribution of the number of words with a prescribed frequency and tests of randomness

Citation
Rukhin, Andrew L., Distribution of the number of words with a prescribed frequency and tests of randomness, Advances in applied probability, 34(4), 2002, pp. 775-797
ISSN journal
0001-8678
Volume
34
Issue
4
Year of publication
2002
Pages
775 - 797
Database
ACNP
Abstract
The goal of this paper is to investigate properties of statistical procedures based on the numbers of different patterns, using generating functions for the probabilities of a prescribed number of occurrences of given patterns in a random text. Asymptotic formulae are derived for the expected value of the number of words occurring a given number of times and for the covariance matrix. The form of the optimal linear test based on these statistics is established. These problems arise in testing the randomness of a string of binary bits, DNA sequencing, source coding, synchronization, quality control protocols, etc. Indeed, the probabilities of repeated (overlapping) patterns are important in information theory (the second-order properties of relative frequencies) and in molecular biology (finding patterns with unexpectedly low or high frequencies).
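To make the statistic in the abstract concrete, the following minimal sketch counts, for a binary string, how many of the 2^m possible words of length m occur exactly r times when occurrences may overlap. The function name `frequency_spectrum` and the restriction to a binary alphabet are assumptions made for illustration; the sketch only computes the empirical counts and does not reproduce the paper's asymptotic formulae or its optimal linear test.

```python
from collections import Counter


def frequency_spectrum(bits: str, m: int) -> dict[int, int]:
    """Map r -> number of length-m binary words occurring exactly r times in bits.

    Occurrences are counted over all overlapping windows of length m.
    (Hypothetical helper, not taken from the paper.)
    """
    # Count every overlapping window of length m.
    occurrences = Counter(bits[i:i + m] for i in range(len(bits) - m + 1))
    # Tally how many distinct words share each occurrence count.
    spectrum = Counter(occurrences.values())
    # Words that never appear: all 2**m binary words minus those observed.
    spectrum[0] = 2 ** m - len(occurrences)
    return dict(spectrum)


if __name__ == "__main__":
    # Windows of length 2 in "0010110": 00, 01, 10, 01, 11, 10
    print(frequency_spectrum("0010110", 2))
```

For a truly random string, each entry of this spectrum would fluctuate around an expected value, which is the kind of quantity the abstract says is studied asymptotically.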