Measure representation and multifractal analysis of complete genomes - art. no. 031903

Citation
Zg. Yu et al., Measure representation and multifractal analysis of complete genomes - art. no. 031903, PHYS REV E, 6403(3), 2001, pp. 1903
Number of citations
44
Subject categories
Physics
Journal title
PHYSICAL REVIEW E
ISSN journal
1063651X
Volume
6403
Issue
3
Year of publication
2001
Part
1
Database
ISI
SICI code
1063-651X(200109)6403:3<1903:MRAMAO>2.0.ZU;2-Y
Abstract
This paper introduces the notion of a measure representation of DNA sequences. Spectral analysis and multifractal analysis are then performed on the measure representations of a large number of complete genomes. The main aim of the paper is to discuss the multifractal properties of the measure representation and the classification of bacteria. From the measure representations and the values of the $D_q$ spectra and the related $C_q$ curves, it is concluded that these complete genomes are not random sequences. In fact, the spectral analyses indicate that these measure representations, considered as time series, exhibit strong long-range correlation. Here the long-range correlation is for the K-strings in dictionary ordering, and it differs from the base-pair correlations introduced by other authors. For substrings of length K = 8, the $D_q$ spectra of all organisms studied are multifractal-like and sufficiently smooth for the $C_q$ curves to be meaningful. As K decreases, the multifractality lessens. The $C_q$ curves of all bacteria resemble a classical phase transition at a critical point, but the "analogous" phase transitions of chromosomes of non-bacterial organisms are different: apart from chromosome 1 of C. elegans, they exhibit the shape of a double-peaked specific heat function. A classification of bacterial genomes is given by assigning to each sequence a point in the two-dimensional space $(D_{-1}, D_1)$ and in the three-dimensional space $(D_{-1}, D_1, D_{-2})$. Bacteria that are phylogenetically close generally lie close together in both the $(D_{-1}, D_1)$ and $(D_{-1}, D_1, D_{-2})$ spaces.
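The two ideas in the abstract, a measure built from K-string frequencies in dictionary order and the generalized dimensions $D_q$ derived from it, can be illustrated with a short Python sketch. It is not the authors' code. It assumes the letter order A < C < G < T, so that each K-string maps to one of the $4^K$ equal subintervals of [0, 1), and it skips windows that contain other symbols. It also estimates $D_q$ with a plain box-counting fit: a least-squares slope of $\frac{1}{q-1}\log Z(\varepsilon, q)$ against $\log \varepsilon$, where $Z(\varepsilon, q) = \sum \mu(B)^q$ over nonempty boxes of size $\varepsilon = 4^{-k}$, with the information-dimension limit used at $q = 1$. The paper's exact fitting procedure may differ. The function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed dictionary order: A < C < G < T


def measure_representation(seq, K):
    """Relative frequency of each K-string, listed in dictionary order.

    Entry i corresponds to the subinterval [i/4**K, (i+1)/4**K) of [0, 1).
    Windows containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    total = 0
    for start in range(len(seq) - K + 1):
        word = seq[start:start + K]
        if all(c in ALPHABET for c in word):
            index = 0
            for c in word:  # base-4 digits give the dictionary rank
                index = 4 * index + ALPHABET.index(c)
            counts[index] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid K-strings")
    return [counts[i] / total for i in range(4 ** K)]


def coarse_grain(mu, k, K):
    """Sum the measure over boxes of size 4**-k (1 <= k <= K)."""
    width = 4 ** (K - k)
    return [sum(mu[j:j + width]) for j in range(0, len(mu), width)]


def generalized_dimension(mu, K, q):
    """Box-counting estimate of D_q from the slope over eps = 4**-k."""
    if K < 2:
        raise ValueError("need K >= 2 to fit a slope")
    xs, ys = [], []
    for k in range(1, K + 1):
        boxes = [p for p in coarse_grain(mu, k, K) if p > 0]  # drop empty boxes
        if q == 1:
            y = sum(p * math.log(p) for p in boxes)  # information-dimension limit
        else:
            y = math.log(sum(p ** q for p in boxes)) / (q - 1)
        xs.append(-k * math.log(4))  # log eps
        ys.append(y)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    mu = measure_representation("ACGTTGCAACGGTACCA", 2)
    for q in (-2, -1, 0, 1, 2):
        print(q, round(generalized_dimension(mu, 2, q), 3))
```

As a sanity check, a uniform measure gives $D_q = 1$ for every $q$, and a measure concentrated on a single K-string gives $D_q = 0$. A real genome with K = 8 would produce a list of 65,536 frequencies, and plotting $D_q$ over a range of $q$ shows the kind of spectrum described above.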