A. Milosavljevic and J. Jurka, DISCOVERING SIMPLE DNA-SEQUENCES BY THE ALGORITHMIC SIGNIFICANCE METHOD, Computer applications in the biosciences, 9(4), 1993, pp. 407-411
A new method, 'algorithmic significance', is proposed as a tool for
discovery of patterns in DNA sequences. The main idea is that patterns
can be discovered by finding ways to encode the observed data concisely.
In this sense, the method can be viewed as a formal version of the
Occam's Razor principle. In this paper the method is applied to discover
significantly simple DNA sequences. We define DNA sequences to be simple
if they contain repeated occurrences of certain 'words' and thus can be
encoded in a small number of bits. This definition includes
minisatellites and microsatellites. A standard dynamic programming
algorithm for data compression is applied to compute the minimal
encoding lengths of sequences in linear time. An electronic mail server
for identification of simple sequences based on the proposed method has
been installed at the Internet address pythia@anl.gov.
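
The idea of scoring a sequence by its shortest encoding can be illustrated with a small sketch. The abstract does not specify the encoding scheme, so everything below is an assumption made for illustration: each letter costs 1 flag bit plus 2 bits, each repeated 'word' is written as a pointer (1 flag bit, a start position, and a length) to an earlier occurrence, and the null model charges 2 bits per letter. The names min_encoding_bits and significance_bound are hypothetical. The sketch runs in quadratic time and does not attempt the linear-time algorithm mentioned above. The 2^(-d) bound used at the end is the usual coding-theoretic argument for a sequence that is encoded d bits shorter than under the null model; it is assumed here, not quoted from the abstract.

```python
import math


def min_encoding_bits(seq: str) -> float:
    """Minimal encoding length of seq under an assumed literal/pointer code."""
    n = len(seq)
    if n == 0:
        return 0.0
    # Assumed costs (not taken from the paper):
    literal_bits = 1 + 2  # flag bit + one of four letters
    pointer_bits = 1 + 2 * math.ceil(math.log2(n + 1))  # flag + start + length

    # longest[i] = length of the longest word starting at i that also starts
    # at some earlier position j < i (the earlier copy may overlap position i).
    longest = [0] * n
    prev = [0] * (n + 1)  # match lengths for the row i + 1
    for i in range(n - 1, -1, -1):
        cur = [0] * (n + 1)
        for j in range(i):
            if seq[i] == seq[j]:
                cur[j] = prev[j + 1] + 1
        longest[i] = max(cur[:i], default=0)
        prev = cur

    # Dynamic programming: cost[i] = fewest bits needed to encode seq[:i].
    cost = [math.inf] * (n + 1)
    cost[0] = 0.0
    for i in range(n):
        # Option 1: encode seq[i] as a literal letter.
        cost[i + 1] = min(cost[i + 1], cost[i] + literal_bits)
        # Option 2: encode a repeated word of any length up to longest[i].
        for length in range(1, longest[i] + 1):
            cost[i + length] = min(cost[i + length], cost[i] + pointer_bits)
    return cost[n]


def significance_bound(seq: str) -> float:
    """Upper bound 2^(-d) on the chance of saving d bits under the null model."""
    d = 2 * len(seq) - min_encoding_bits(seq)  # bits saved vs. 2 bits/letter
    return min(1.0, 2.0 ** (-d))


if __name__ == "__main__":
    for s in ("ACGT", "CA" * 20):
        print(s, min_encoding_bits(s), significance_bound(s))
```

Under these assumed costs, a microsatellite-like input such as "CA" repeated 20 times needs two literals and a single overlapping pointer, far fewer than the 80 bits of the null encoding, whereas a short sequence with no repeats gains nothing and is not flagged as simple.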