DISCOVERING SIMPLE DNA-SEQUENCES BY THE ALGORITHMIC SIGNIFICANCE METHOD

Citation
A. Milosavljevic and J. Jurka, DISCOVERING SIMPLE DNA-SEQUENCES BY THE ALGORITHMIC SIGNIFICANCE METHOD, Computer Applications in the Biosciences, 9(4), 1993, pp. 407-411
Number of citations
22
Subject Categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Applications & Cybernetics; Biology Miscellaneous
Journal ISSN
0266-7061
Volume
9
Issue
4
Year of publication
1993
Pages
407 - 411
Database
ISI
SICI code
0266-7061(1993)9:4<407:DSDBTA>2.0.ZU;2-G
Abstract
A new method, 'algorithmic significance', is proposed as a tool for discovery of patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of the Occam's Razor principle. In this paper the method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain 'words' and thus can be encoded in a small number of bits. This definition includes minisatellites and microsatellites. A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time. An electronic mail server for identification of simple sequences based on the proposed method has been installed at the Internet address pythia@anl.gov.
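
Illustrative sketch (not from the paper): the abstract's idea of encoding a sequence by reusing repeated 'words' can be approximated with a small dynamic program. Everything specific here is an assumption made for illustration: the function names `min_encoding_bits` and `significance_bound`, the cost model (2 bits per base under a uniform null model, a 1-bit flag per token, and pointers costing log2(n) + log2(max_word) bits), and the use of 2^-d as an upper bound on the chance of saving d bits under the null model. The sketch also runs in roughly quadratic time; it does not reproduce the linear-time algorithm the abstract refers to.

```python
import math


def min_encoding_bits(seq, literal_bits=2.0, min_word=2, max_word=20):
    """Shortest encoding of seq (in bits) under an assumed toy cost model.

    Each token is either a literal base or a pointer to an earlier,
    non-overlapping occurrence of the same word. All costs are assumptions.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    # Assumed token costs: 1 flag bit, then either a base or (position, length).
    literal_cost = 1.0 + literal_bits
    pointer_cost = 1.0 + math.log2(n) + math.log2(max_word)
    # best[i] = fewest bits needed to encode the prefix seq[:i]
    best = [0.0] + [math.inf] * n
    for i in range(1, n + 1):
        # Option 1: emit the base at position i-1 as a literal.
        best[i] = best[i - 1] + literal_cost
        # Option 2: emit the last k bases as a pointer to an earlier copy.
        for k in range(min_word, min(max_word, i) + 1):
            word = seq[i - k:i]
            # The earlier copy must lie entirely inside seq[:i-k].
            if seq.find(word, 0, i - k) != -1:
                best[i] = min(best[i], best[i - k] + pointer_cost)
    return best[n]


def significance_bound(seq, null_bits_per_base=2.0):
    """Assumed bound 2**(-d), where d is the number of bits saved
    relative to the null-model encoding; capped at 1.0."""
    d = null_bits_per_base * len(seq) - min_encoding_bits(seq)
    return min(1.0, 2.0 ** (-d))


if __name__ == "__main__":
    for s in ("ACGT", "AC" * 20):
        print(s, min_encoding_bits(s), significance_bound(s))
```

Under these assumed costs, a microsatellite-like repeat such as "AC" repeated 20 times encodes in far fewer than the 80 bits the null model would need, while a short sequence with no repeated words gains nothing.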