On the complexity measures of genetic sequences

Citation
V.D. Gusev et al., On the complexity measures of genetic sequences, Bioinformatics, 15(12), 1999, pp. 994-999
Number of citations
11
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
15
Issue
12
Year of publication
1999
Pages
994 - 999
Database
ISI
SICI code
1367-4803(199912)15:12<994:OTCMOG>2.0.ZU;2-T
Abstract
Motivation: It is well known that the regulatory regions of genomes are highly repetitive. They are rich in direct, symmetric and complemented repeats, and there is no doubt about the functional significance of these repeats. Among known measures of complexity, the Ziv-Lempel complexity measure reflects most adequately the repeats occurring in the text. But this measure does not take into account isomorphic repeats. By isomorphic repeats we mean fragments that are identical (or symmetric) modulo some permutation of the alphabet letters.
Results: In this paper two complexity measures of symbolic sequences are proposed that generalize the Ziv-Lempel complexity measure by taking into account any isomorphic repeats in the text (rather than just direct repeats as in Ziv-Lempel). The first of them, the complexity vector, is designed for small alphabets such as the alphabet of nucleotides. The second is based on a search for the longest isomorphic fragment in the history of sequence synthesis and can be used for alphabets of arbitrary cardinality. These measures have been used for recognition of structural regularities in DNA sequences. Some interesting structures related to the regulatory region of the human growth hormone are reported.
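Illustration: the two notions the abstract relies on can be sketched in a few lines of Python. This is not the authors' algorithm; it is a minimal sketch under stated assumptions. The function names (lz_complexity, is_isomorphic, is_isomorphic_repeat) are hypothetical, "identical modulo some permutation of the alphabet letters" is read as a one-to-one relabeling of letters, "symmetric" is read as the reversed fragment, and the Ziv-Lempel measure is taken to be the usual 1976 parsing in which each component is extended for as long as it can still be copied from the preceding text.

```python
def lz_complexity(s: str) -> int:
    """Number of components in a Lempel-Ziv (1976 style) parsing of s.

    Assumption: each component is the shortest prefix of the unparsed
    remainder that does not occur in the text preceding its last symbol.
    """
    n = len(s)
    i = 0      # start of the current component
    count = 0  # number of components found so far
    while i < n:
        length = 1
        # Extend the component while it can still be copied from history.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def is_isomorphic(a: str, b: str) -> bool:
    """True if a and b are identical up to a one-to-one relabeling of letters."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        # Each letter of a must map to exactly one letter of b, and vice versa.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def is_isomorphic_repeat(a: str, b: str) -> bool:
    """Direct or symmetric (reversed) match, modulo a letter relabeling."""
    return is_isomorphic(a, b) or is_isomorphic(a, b[::-1])


if __name__ == "__main__":
    print(lz_complexity("ACGTACGT"))             # direct repeat is copied in one step
    print(is_isomorphic("ACCA", "GTTG"))         # A->G, C->T
    print(is_isomorphic_repeat("AAC", "TGG"))    # matches only after reversal
```

In this reading, a direct repeat costs a single Ziv-Lempel component, whereas a fragment such as GTTG following ACCA is not copied at all by the plain measure even though it repeats the same pattern under a relabeling; that gap is what the generalized measures in the abstract are meant to close.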