Motivation: Chaos Game Representation (CGR) is an iterative mapping
technique that processes sequences of units, such as nucleotides in a DNA
sequence or amino acids in a protein, to find the coordinates of their
positions in a continuous space. This distribution of positions has two
properties: it is unique, and the source sequence can be recovered from the
coordinates, such that the distance between positions measures similarity
between the corresponding sequences. The possibility of using the latter
property to identify succession schemes has been entirely overlooked in
previous studies, which raises the possibility that CGR may be upgraded
from a mere representation technique to a sequence modeling tool.
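
The iterative mapping and the recovery property can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes the common convention of placing the four nucleotides at the corners of the unit square and moving halfway toward the corner of each successive symbol. The corner assignment, the starting point (0.5, 0.5) and the names cgr_positions and recover_suffix are assumptions made for the example.

```python
# Assumed corner assignment; any fixed one-to-one mapping of the four
# nucleotides onto the corners of the unit square would serve.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_positions(seq, start=(0.5, 0.5)):
    """Return the CGR coordinates visited by seq, one point per symbol."""
    x, y = start
    points = []
    for symbol in seq.upper():
        cx, cy = CORNERS[symbol]
        # Move halfway from the current position toward the symbol's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def recover_suffix(point, length):
    """Read back the last `length` symbols encoded in a CGR point."""
    by_corner = {corner: symbol for symbol, corner in CORNERS.items()}
    x, y = point
    symbols = []
    for _ in range(length):
        # The quadrant containing the point identifies the latest symbol.
        cx = 1.0 if x >= 0.5 else 0.0
        cy = 1.0 if y >= 0.5 else 0.0
        symbols.append(by_corner[(cx, cy)])
        # Undo one halving step to expose the previous symbol.
        x, y = 2.0 * x - cx, 2.0 * y - cy
    return "".join(reversed(symbols))


if __name__ == "__main__":
    points = cgr_positions("ACG")
    print(points[-1])
    print(recover_suffix(points[-1], 3))
```

Two sequences that end in the same symbols land close together, because each step halves the distance inherited from earlier history. With double-precision floats the recovery is only reliable for a few dozen trailing symbols; exact arithmetic would be needed to go further.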
Results: The distribution of positions in the CGR plane was shown to be a
generalization of Markov chain probability tables that accommodates
non-integer orders. Therefore, Markov models are particular cases of CGR
models rather than the reverse, which is the currently accepted view. In
addition, the CGR generalization has both practical (computational
efficiency) and fundamental (scale independence) advantages. These results
are illustrated using Escherichia coli K-12 as a test data set, in
particular the genes thrA, thrB and thrC of the threonine operon.
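
The link between CGR positions and Markov chain tables can be illustrated for the integer-order case only. The sketch below assumes the same corner convention as is common for nucleotide CGR and divides the unit square into a 2^k by 2^k grid; counting CGR points per cell then reproduces the k-mer counts from which an order k-1 Markov transition table is normally built. The function names and the toy sequence are hypothetical, and the non-integer orders mentioned above are not covered by this example.

```python
from collections import Counter

# Assumed corner assignment for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_grid_counts(seq, k):
    """Count CGR points per cell of a 2**k by 2**k grid."""
    n = 2 ** k
    x, y = 0.5, 0.5
    counts = Counter()
    for i, symbol in enumerate(seq.upper()):
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        if i >= k - 1:  # from here on, each point encodes a full k-mer
            # Floats suffice for this toy input; runs of 50 or more
            # identical symbols would call for exact arithmetic.
            counts[(int(x * n), int(y * n))] += 1
    return counts


def kmer_cell(kmer):
    """Grid cell that CGR assigns to a k-mer, whatever precedes it."""
    col = row = 0
    # The last symbol sets the most significant bit of each coordinate.
    for symbol in reversed(kmer):
        cx, cy = CORNERS[symbol]
        col = 2 * col + int(cx)
        row = 2 * row + int(cy)
    return col, row


def kmer_counts_by_cell(seq, k):
    """Sliding-window k-mer counts, keyed by the matching CGR grid cell."""
    seq = seq.upper()
    return Counter(kmer_cell(seq[i:i + k]) for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    toy = "ACGTACGGTCAATG"  # made-up sequence, not E. coli data
    for k in (1, 2, 3):
        print(k, cgr_grid_counts(toy, k) == kmer_counts_by_cell(toy, k))
```

Refining the grid from 2^k to 2^(k+1) cells per side corresponds to raising the Markov order by one, so a single set of CGR coordinates serves every integer order without recomputation.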