Analysis of genomic sequences by Chaos Game Representation

Citation
J.S. Almeida et al., Analysis of genomic sequences by Chaos Game Representation, Bioinformatics, 17(5), 2001, pp. 429-437
Number of citations
31
Subject Categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
17
Issue
5
Year of publication
2001
Pages
429 - 437
Database
ISI
SICI code
1367-4803(200105)17:5<429:AOGSBC>2.0.ZU;2-I
Abstract
Motivation: Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to find the coordinates for their position in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates such that distance between positions measures similarity between the corresponding sequences. The possibility of using the latter property to identify succession schemes has been entirely overlooked in previous studies, which raises the possibility that CGR may be upgraded from a mere representation technique to a sequence modeling tool.

Results: The distribution of positions in the CGR plane was shown to be a generalization of Markov chain probability tables that accommodates non-integer orders. Therefore, Markov models are particular cases of CGR models rather than the reverse, as currently accepted. In addition, the CGR generalization has both practical (computational efficiency) and fundamental (scale independence) advantages. These results are illustrated by using Escherichia coli K-12 as a test data-set, in particular the genes thrA, thrB and thrC of the threonine operon.
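
The iterative mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the unit square with a commonly used corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)), a starting point at the centre, and a step that moves halfway toward the corner of each successive nucleotide; none of these details are stated in the abstract. The function names cgr_points, decode and grid_counts are hypothetical. Exact fractions are used so that the recovery of the source sequence from a single coordinate is not limited by floating-point precision.

```python
from collections import Counter
from fractions import Fraction

# Assumed corner assignment (a common convention, not taken from the abstract).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
HALF = Fraction(1, 2)


def cgr_points(seq):
    """Return the CGR position reached after each symbol of seq."""
    x, y = HALF, HALF  # assumed starting point: centre of the square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current position toward the symbol's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def decode(point, length):
    """Recover the last `length` symbols from a single CGR position."""
    by_corner = {corner: base for base, corner in CORNERS.items()}
    x, y = point
    symbols = []
    for _ in range(length):
        # The half of the square containing the point identifies the corner.
        cx, cy = int(x >= HALF), int(y >= HALF)
        symbols.append(by_corner[(cx, cy)])
        # Undo one halving step to reach the previous position.
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(symbols))


def grid_counts(seq, k):
    """Count CGR positions per cell of a 2**k by 2**k grid.

    Each cell collects the positions whose last k symbols are identical,
    so the counts match the k-mer counts of seq.
    """
    n = 2 ** k
    points = cgr_points(seq)[k - 1:]  # positions preceded by at least k symbols
    return Counter((int(x * n), int(y * n)) for x, y in points)


if __name__ == "__main__":
    sequence = "ACGTACGT"
    final_point = cgr_points(sequence)[-1]
    print(decode(final_point, len(sequence)))
    print(grid_counts(sequence, 2))
```

In this sketch, two positions that fall in the same cell of the 2**k grid share their last k symbols, which is one way to read the abstract's statement that distance between positions measures similarity between sequences. Counting positions per cell reproduces the k-mer frequency table that underlies an integer-order Markov model. The non-integer orders mentioned in the abstract would correspond to grid resolutions that are not powers of two, which this sketch does not cover.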