Statistical analysis of the DNA sequence of human chromosome 22 - art. no.041917

Citation
D. Holste et al., Statistical analysis of the DNA sequence of human chromosome 22 - art. no.041917, PHYS REV E, 6404(4), 2001, pp. 1917
Number of citations
72
Subject categories
Physics
Journal title
PHYSICAL REVIEW E
ISSN journal
1063-651X
Volume
6404
Issue
4
Year of publication
2001
Part
1
Database
ISI
SICI code
1063-651X(200110)6404:4<1917:SAOTDS>2.0.ZU;2-M
Abstract
We study statistical patterns in the DNA sequence of human chromosome 22, the first completely sequenced human chromosome. We find that (i) the 33.4 × 10^6 nucleotide long human chromosome exhibits long-range power-law correlations over more than four orders of magnitude, (ii) the entropies H_n of the frequency distribution of oligonucleotides of length n (n-mers) grow sublinearly with increasing n, indicating the presence of higher-order correlations for all of the studied lengths 1 ≤ n ≤ 10, and (iii) the generalized entropies H_n(q) of n-mers decrease monotonically with increasing q and the decay of H_n(q) with q becomes steeper with increasing n ≤ 10, indicating that the frequency distribution of oligonucleotides becomes increasingly nonuniform as the length n increases. We investigate to what degree known biological features may explain the observed statistical patterns. We find that (iv) the presence of interspersed repeats may cause the sublinear increase of H_n with n, and that (v) the presence of monomeric tandem repeats as well as the suppression of CG dinucleotides may cause the observed decay of H_n(q) with q.
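
The entropies named in the abstract can be illustrated with a minimal sketch. It assumes that H_n is the Shannon entropy of the observed n-mer frequencies and that H_n(q) takes the Rényi form, which is one common definition of a generalized entropy; the record itself does not state the formula. The function names, the overlapping-window counting, and the plug-in (frequency-count) estimator are illustrative choices, not the authors' method.

```python
from collections import Counter
import math


def nmer_probabilities(seq, n):
    """Relative frequencies of overlapping n-mers in seq (plug-in estimate)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_entropy(probs):
    """H_n in bits: -sum p log2 p over the observed n-mers."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def renyi_entropy(probs, q):
    """Assumed generalized entropy H_n(q) in bits (Renyi form).

    H(q) = log2(sum p^q) / (1 - q), with the Shannon entropy as the q -> 1 limit.
    """
    if q == 1:
        return shannon_entropy(probs)
    return math.log2(sum(p ** q for p in probs)) / (1 - q)
```

For a sequence without correlations between positions, H_n would grow as n times H_1, so a value of H_n below n times H_1 signals correlations between neighbouring nucleotides. Under the Rényi form, H_n(q) is constant in q for a uniform distribution and falls with increasing q once the n-mer frequencies become nonuniform. Note that on short sequences the plug-in estimate is biased downward for larger n.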