A. Milosavljevic, DISCOVERING DEPENDENCIES VIA ALGORITHMIC MUTUAL INFORMATION - A CASE-STUDY IN DNA-SEQUENCE COMPARISONS, Machine learning, 21(1-2), 1995, pp. 35-50
Algorithmic mutual information is a central concept in algorithmic information theory and may be measured as the difference between the independent and joint minimal encoding lengths of objects; it is also a central concept in Chaitin's fascinating mathematical definition of life. We explore the applicability of algorithmic mutual information as a tool for discovering dependencies in biology. To determine the significance of discovered dependencies, we extend the newly proposed algorithmic significance method. The main theorem of the extended method states that d bits of algorithmic mutual information imply dependency at the significance level 2^(-d+O(1)). We apply a heuristic version of the method to one of the main problems in DNA and protein sequence comparison: deciding whether an observed similarity between sequences should be explained by their relatedness or by the mere presence of some shared internal structure, e.g., shared internal repetitive patterns. We take advantage of the fact that mutual information factors out sequence similarity that is due to shared internal structure and thus enables the discovery of truly related sequences. In addition to providing a general framework for sequence comparison, we also propose an efficient way to compare sequences based on their subword composition that does not require any a priori assumptions about k-tuple length.
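The central quantities in the abstract can be illustrated with a standard compression-based proxy: approximating minimal encoding length by the output size of an off-the-shelf compressor, mutual information becomes the savings of joint over independent encoding, C(x) + C(y) - C(xy), and d bits of it correspond (up to the theorem's O(1) constant) to a significance level of about 2^-d. This is a minimal sketch using zlib as a stand-in for minimal encoding length, not the paper's actual estimator; all function names are illustrative.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Bits in the zlib-compressed representation of data (a computable
    stand-in for the minimal encoding length used in the paper)."""
    return 8 * len(zlib.compress(data, 9))


def mutual_information_bits(x: bytes, y: bytes) -> float:
    """Approximate algorithmic mutual information as the difference between
    independent and joint encoding lengths: C(x) + C(y) - C(xy)."""
    return compressed_len(x) + compressed_len(y) - compressed_len(x + y)


def significance_level(d_bits: float) -> float:
    """Per the extended method's main theorem (ignoring the O(1) term),
    d bits of mutual information imply dependency at level ~2^-d."""
    return 2.0 ** (-d_bits)
```

Because zlib is only an upper bound on encoding length, the estimate is conservative: two copies of the same incompressible string share nearly all their bits, while two independent random strings share essentially none beyond compressor overhead.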
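The idea of comparing sequences by subword composition without fixing a k-tuple length in advance can be sketched with a greedy parse: encode one sequence as a sequence of longest substrings drawn from the other, letting match lengths adapt to the data rather than to a preset k. This is a hypothetical illustration of the principle, not the paper's algorithm; `greedy_parse_cost` and its phrase-count measure are assumptions for the sketch.

```python
def greedy_parse_cost(target: str, reference: str) -> int:
    """Greedily parse `target` into the longest prefixes that occur as
    substrings of `reference` (falling back to a single character when no
    match exists) and return the number of phrases used. Fewer phrases
    indicate more shared subword structure, at whatever lengths it occurs."""
    i, phrases = 0, 0
    n = len(target)
    while i < n:
        # Grow the current phrase while it still occurs in the reference.
        j = i + 1
        while j <= n and target[i:j] in reference:
            j += 1
        # target[i:j-1] is the longest match; advance at least one character.
        i = max(j - 1, i + 1)
        phrases += 1
    return phrases
```

A sequence parsed against itself costs one phrase, while a sequence parsed against an unrelated reference costs close to one phrase per character, so the phrase count serves as a simple adaptive-length similarity score.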