The Shannon entropy is a standard measure for the degree of order in symbol sequences, such as DNA sequences. To incorporate correlations between symbols, the entropy of n-mers (strings of n consecutive symbols) has to be determined. Here, a method is presented for estimating such higher-order entropies (block entropies) of DNA sequences when the actual number of observations is small compared with the number of possible outcomes. The n-mer probability distribution underlying the dynamical process is reconstructed using elementary statistical principles: the theorem of asymptotic equidistribution and the Maximum Entropy Principle. Constraints are imposed to force the constructed distributions to adopt features characteristic of the real probability distribution. Of the many solutions compatible with these constraints, the one with the highest entropy is the most likely according to the Maximum Entropy Principle. An algorithm performing this procedure is expounded. It is tested by applying it to various DNA model sequences whose exact entropies are known. Finally, results for a real DNA sequence, the complete genome of the Epstein-Barr virus, are presented and compared with those of other information carriers (texts, computer source code, music). DNA sequences appear to possess much more freedom in combining the symbols of their alphabet than written language or computer source code. (C) 1997 Academic Press Limited.
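The block entropies discussed above can be illustrated with a minimal sketch. The function below is a naive plug-in estimator of the n-mer entropy H_n (empirical n-mer frequencies plugged into the Shannon formula), not the maximum-entropy reconstruction the paper develops; the sequence and function name are hypothetical, chosen only for illustration. It also exhibits the undersampling problem the paper addresses: for large n the number of observations is small compared with the 4^n possible n-mers, and the plug-in estimate is biased downward.

```python
from collections import Counter
from math import log2

def block_entropy(seq, n):
    """Naive plug-in estimate of the n-mer (block) entropy H_n in bits.

    Counts all overlapping n-mers in seq and applies the Shannon formula
    to their empirical frequencies. Biased downward whenever the number
    of observed n-mers is small relative to the 4**n possible outcomes.
    """
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Toy periodic DNA sequence (hypothetical, for illustration only).
seq = "ACGTACGTACGTACGT"
h1 = block_entropy(seq, 1)  # all four symbols equally frequent: 2 bits
h2 = block_entropy(seq, 2)  # only 4 of the 16 possible dimers occur
```

For the periodic toy sequence, H_2 stays near 2 bits rather than growing toward 4, reflecting the strong correlations between consecutive symbols; for real DNA the paper's method is needed precisely because such estimates degrade as n grows.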