Universal data compression algorithms fail to compress genetic sequences.
This is due to the specificity of this particular kind of "text."
We analyze in some detail the properties of the sequences that cause
the failure of classical algorithms. We then present a lossless algorithm,
biocompress-2, to compress the information contained in DNA and
RNA sequences, based on the detection of regularities, such as the
presence of palindromes. The algorithm combines substitutional and
statistical methods, and to the best of our knowledge, leads to the highest
compression of DNA. The results, although not satisfactory, give insight
into the necessary correlation between compression and comprehension
of genetic sequences.
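
To illustrate the kind of regularity mentioned above, the following is a minimal sketch of palindrome detection, not the biocompress-2 implementation. It assumes that a "palindrome" means a factor whose reverse complement has already occurred earlier in the sequence, and that the input uses only the DNA alphabet A, C, G, T. The names reverse_complement, longest_palindrome_match, and min_len are hypothetical and chosen for this example only.

```python
# Hypothetical helper names; a sketch under the assumptions stated above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_palindrome_match(seq: str, pos: int, min_len: int = 4):
    """Find the longest factor starting at `pos` whose reverse complement
    occurs in the already-seen prefix seq[:pos].

    Returns (start, length), where `start` is the index in the prefix at
    which the reverse complement begins, or None if no factor of at least
    `min_len` bases qualifies.
    """
    history = seq[:pos]
    best = None
    length = min_len
    while pos + length <= len(seq):
        target = reverse_complement(seq[pos:pos + length])
        start = history.find(target)
        if start < 0:
            # A longer factor cannot match once a shorter one has failed.
            break
        best = (start, length)
        length += 1
    return best
```

For example, in "AAACCGCGGTTT" the second half is the reverse complement of the first half, so longest_palindrome_match("AAACCGCGGTTT", 6) returns (0, 6). A substitutional coder could then, in principle, replace those six bases with a pointer to the earlier occurrence; how such pointers are encoded, and how they are combined with a statistical model, is beyond this sketch.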