POSSIBLE ORIGIN OF POWER-LAW BEHAVIOR IN N-TUPLE ZIPF ANALYSIS

Citation
A. Czirok et al., POSSIBLE ORIGIN OF POWER-LAW BEHAVIOR IN N-TUPLE ZIPF ANALYSIS, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics, 53(6), 1996, pp. 6371-6375
Number of citations
25
Subject Categories
Physics, Mathematical; Physics, Fluids & Plasmas
ISSN journal
1063-651X
Volume
53
Issue
6
Year of publication
1996
Part
B
Pages
6371 - 6375
Database
ISI
SICI code
1063-651X(1996)53:6<6371:POOPBI>2.0.ZU;2-I
Abstract
In n-tuple Zipf analysis, "words" are defined as strings of n digits, and their normalized frequency of occurrence omega is measured for a given "text" (sequence of digits). In the case of various non-Markovian sequences, the probability density of the frequencies P(omega) has a power-law tail. Here we argue that a broad class of unbiased binary texts exhibiting a nonexponential distribution of cluster sizes can indeed yield a power-law behavior of P(omega), where we define clusters to be strings of identical digits. We support this result by numerical studies of long-range correlated sequences generated by three different methods that result in nonexponential cluster-size distribution: inverse Fourier transformation, Levy walks, and the expansion-modification system. Our calculations shed light on the possible connection between the Zipf plot and the non-Markovian nature of the text: as the long-range correlations become dominant, the probability of the appearance of long clusters is increased, leading to the observed "scaling" in the Zipf plot.
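
Illustrative sketch (not taken from the paper): the two quantities defined in the abstract, the normalized frequency of n-digit words and the sizes of clusters of identical digits, could be measured for a binary text roughly as follows. The function names, the use of overlapping windows, and normalization by the number of windows are assumptions made for illustration only; the abstract does not say how the authors extract words or normalize omega.

```python
from collections import Counter
from itertools import groupby


def ntuple_frequencies(text, n):
    """Normalized frequency of every n-digit word in `text`.

    Assumption: words are read from overlapping windows and the counts
    are divided by the number of windows.
    """
    total = len(text) - n + 1
    if n <= 0 or total <= 0:
        raise ValueError("need 0 < n <= len(text)")
    counts = Counter(text[i:i + n] for i in range(total))
    return {word: count / total for word, count in counts.items()}


def zipf_ranking(freqs):
    """Words sorted by decreasing frequency; ties broken alphabetically."""
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))


def cluster_sizes(text):
    """Lengths of maximal runs of identical digits (the 'clusters')."""
    return [len(list(run)) for _, run in groupby(text)]


if __name__ == "__main__":
    sample = "0011100101"  # toy text, far too short for real statistics
    for rank, (word, omega) in enumerate(zipf_ranking(ntuple_frequencies(sample, 2)), 1):
        print(rank, word, round(omega, 3))
    print(cluster_sizes(sample))
```

Estimating P(omega) would additionally require a histogram of the frequency values over a much longer text and a larger n; the toy string above only demonstrates the bookkeeping.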