Sequence complexity for biological sequence analysis

Authors

Allison, L; Stern, L; Edgoose, T; Dix, TI

Citation

L. Allison et al., Sequence complexity for biological sequence analysis, COMPUT CHEM, 24(1), 2000, pp. 43-55

Citations number

Subject categories

Chemistry

Journal title

COMPUTERS & CHEMISTRY

ISSN journal

0097-8485

Volume

24

Issue

1

Year of publication

2000

Pages

43 - 55

Database

ISI

SICI code

0097-8485(200001)24:1<43:SCFBSA>2.0.ZU;2-#

Abstract

A new statistical model for DNA considers a sequence to be a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of repeats do not need to match each other exactly. Both forward- and reverse-complementary repeats are allowed. The model has a small number of parameters which are fitted to the data. In general there are many explanations for a given sequence, and how to compute the total probability of the data given the model is shown. Computer algorithms are described for these tasks. The model can be used to compute the information content of a sequence, either in total or base by base. This amounts to looking at sequences from a data-compression point of view and it is argued that this is a good way to tackle intelligent sequence analysis in general. (C) 2000 Elsevier Science Ltd. All rights reserved.
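
To make the idea of base-by-base information content concrete, the following is a minimal sketch in Python. It is not the model described in the abstract, which mixes low-structure regions with approximate forward and reverse-complementary repeats. Instead it assumes a much simpler stand-in: an adaptive order-k Markov model with add-one smoothing. The function name `per_base_information` and the parameters `order`, `alphabet` and `pseudocount` are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict


def per_base_information(seq, order=2, alphabet="ACGT", pseudocount=1.0):
    """Return the cost in bits of each base of seq.

    Assumption: an adaptive order-k Markov model stands in for the
    repeat-based model of the paper. Each base is predicted from the
    preceding `order` bases using counts gathered so far, so a decoder
    could reproduce the same probabilities.
    Every character of seq must appear in `alphabet`.
    """
    # counts[context][base] = times `base` has followed `context` so far
    counts = defaultdict(lambda: dict.fromkeys(alphabet, 0.0))
    costs = []
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]
        ctx_counts = counts[context]
        total = sum(ctx_counts.values()) + pseudocount * len(alphabet)
        p = (ctx_counts[base] + pseudocount) / total
        costs.append(-math.log2(p))   # information content of this base
        ctx_counts[base] += 1.0       # update the model after coding
    return costs


if __name__ == "__main__":
    bits = per_base_information("ACGTACGTACGTACGT")
    print("total bits:", sum(bits))
    print("bits per base:", sum(bits) / len(bits))
```

Summing the returned list gives the total information content, while the individual entries give the base-by-base view. Under this sketch a sequence that costs well under 2 bits per base is compressible, which is the data-compression perspective the abstract refers to.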