A. Czirok et al., POSSIBLE ORIGIN OF POWER-LAW BEHAVIOR IN N-TUPLE ZIPF ANALYSIS, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics, 53(6), 1996, pp. 6371-6375
In n-tuple Zipf analysis, "words" are defined as strings of n digits,
and their normalized frequency of occurrence omega is measured for a
given "text" (sequence of digits). In the case of various
non-Markovian sequences, the probability density of the frequencies
P(omega) has a power-law tail. Here we argue that a broad class of
unbiased binary texts exhibiting a nonexponential distribution of
cluster sizes can indeed yield a power-law behavior of P(omega), where
we define clusters to be strings of identical digits. We support this
result by numerical studies of long-range correlated sequences
generated by three different methods that result in nonexponential
cluster-size distribution: inverse Fourier transformation, Levy walks,
and the expansion-modification system. Our calculations shed light on
the possible connection between the Zipf plot and the non-Markovian
nature of the text: as the long-range correlations become dominant,
the probability of the appearance of long clusters is increased,
leading to the observed "scaling" in the Zipf plot.
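
The two quantities the abstract relies on, the normalized n-tuple
frequencies omega and the cluster sizes, can be illustrated with a
minimal Python sketch. It rests on assumptions not stated in the
abstract: words are read from overlapping windows shifted by one digit,
and the function names (ntuple_frequencies, zipf_ranking,
cluster_sizes) are hypothetical, chosen only for this illustration. The
uncorrelated random text in the usage lines is a baseline, not one of
the three generators studied in the paper.

```python
import random
from collections import Counter
from itertools import groupby


def ntuple_frequencies(text, n):
    """Normalized frequency omega of every n-digit word in `text`.

    Assumption: words are taken from overlapping windows (shift of one
    digit); the abstract does not say which windowing is used.
    """
    total = len(text) - n + 1
    if n <= 0 or total <= 0:
        raise ValueError("need 1 <= n <= len(text)")
    counts = Counter(text[i:i + n] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def zipf_ranking(freqs):
    """Frequencies sorted from most to least common (the Zipf plot's y-values)."""
    return sorted(freqs.values(), reverse=True)


def cluster_sizes(text):
    """Lengths of maximal runs of identical digits, in order of appearance."""
    return [len(list(run)) for _, run in groupby(text)]


if __name__ == "__main__":
    # Baseline only: an unbiased, uncorrelated binary text.
    rng = random.Random(0)
    text = "".join(rng.choice("01") for _ in range(100_000))

    ranking = zipf_ranking(ntuple_frequencies(text, n=8))
    sizes = Counter(cluster_sizes(text))

    print("top five omega values:", ranking[:5])
    print("cluster-size counts:", sorted(sizes.items())[:5])
```

For an uncorrelated text such as this baseline, cluster sizes are
geometrically (that is, exponentially) distributed; the abstract's
argument concerns texts whose cluster-size distribution departs from
that form.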