In the following, an overview is given of the statistical and probabilistic properties of words as they occur in the analysis of biological sequences. Counts of occurrences, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take into account the error made by estimating the Markovian transition probabilities. The main tools involved are moment generating functions, martingales, Stein's method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns, and, as an example, the problem of unique recoverability of a sequence from SBH (sequencing by hybridization) chip data is discussed. Special emphasis lies on disentangling the complicated dependence structure between word occurrences that arises from self-overlap as well as from overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests.
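To make the three kinds of counts concrete, here is a minimal sketch, assuming the usual definitions. An occurrence count tallies every start position of the word, so overlaps are allowed. A clump is a maximal run of occurrences in which consecutive occurrences overlap, meaning their start positions differ by less than the word length. A renewal count is a greedy left-to-right count of non-overlapping occurrences. The function names are hypothetical, and the naive O(nm) scan is only illustrative; it is not the method of the overview.

```python
def occurrence_positions(seq: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word in seq."""
    m = len(word)
    return [i for i in range(len(seq) - m + 1) if seq[i:i + m] == word]


def count_occurrences(seq: str, word: str) -> int:
    """Number of occurrences, overlapping ones included."""
    return len(occurrence_positions(seq, word))


def count_clumps(seq: str, word: str) -> int:
    """Number of clumps (maximal runs of mutually chained overlapping occurrences).

    An occurrence starts a new clump when the previous occurrence began at
    least len(word) positions earlier, i.e. the two do not overlap.
    """
    m = len(word)
    clumps = 0
    prev = None
    for i in occurrence_positions(seq, word):
        if prev is None or i - prev >= m:
            clumps += 1
        prev = i
    return clumps


def count_renewals(seq: str, word: str) -> int:
    """Renewal count: greedy left-to-right count of non-overlapping occurrences."""
    m = len(word)
    count = 0
    next_free = 0  # first position not covered by the last counted occurrence
    for i in occurrence_positions(seq, word):
        if i >= next_free:
            count += 1
            next_free = i + m
    return count


if __name__ == "__main__":
    s, w = "AAAAA", "AA"
    print(count_occurrences(s, w), count_clumps(s, w), count_renewals(s, w))
```

For the self-overlapping word `AA` in `AAAAA`, this prints `4 1 2`: four overlapping occurrences form a single clump, and only two of them can be counted without overlap. For a word that cannot overlap itself, such as `ACG`, all three counts coincide. This is why self-overlap is what drives the differences between the corresponding distributions.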