M. Li et al., An information-based sequence distance and its application to whole mitochondrial genome phylogeny, BIOINFORMAT, 17(2), 2001, pp. 149-154
Motivation: Traditional sequence distances require an alignment and
therefore are not directly applicable to the problem of whole genome
phylogeny where events such as rearrangements make full length alignments
impossible. We present a sequence distance that works on unaligned
sequences using the information theoretical concept of Kolmogorov
complexity and a program to estimate this distance.
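
Kolmogorov complexity is not computable, so any practical estimate of such a distance has to substitute the output size of a real compressor. The sketch below illustrates that idea only. It is not the authors' program, and the abstract does not state their formula. It assumes the normalized compression distance, a related compression-based measure, with Python's general-purpose zlib as a stand-in compressor. The names compressed_size, ncd and random_dna are hypothetical helpers introduced for this illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size of the zlib-compressed data: a crude, computable stand-in for
    the (uncomputable) Kolmogorov complexity of the sequence."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences.

    Assumption: this is a related compression-based distance, not
    necessarily the exact distance defined in the paper.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, rng: random.Random) -> bytes:
    """Toy random DNA sequence (illustrative data only)."""
    return bytes(rng.choice(b"ACGT") for _ in range(length))


rng = random.Random(0)
a = random_dna(2000, rng)
b = a[1000:] + a[:1000]    # same content as a, with the two halves swapped
c = random_dna(2000, rng)  # unrelated sequence

print("rearranged copy:", round(ncd(a, b), 3))
print("unrelated:      ", round(ncd(a, c), 3))
```

In this toy setup the rearranged copy should score much closer to the original than the unrelated sequence does, even though no full-length alignment of a and b exists. A general-purpose compressor is a weak model of DNA, so a compressor designed for biological sequences would be the more realistic choice for real genomes.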
Results: We establish the mathematical foundations of our distance and
illustrate its use by constructing a phylogeny of the Eutherian orders
using complete unaligned mitochondrial genomes. This phylogeny is
consistent with the commonly accepted one for the Eutherians. A second,
larger mammalian dataset is also analyzed, yielding a phylogeny generally
consistent with the commonly accepted one for the mammals.