Information content of protein sequences

Authors

Weiss, O; Jimenez-Montano, MA; Herzel, H

Citation

O. Weiss et al., Information content of protein sequences, J THEOR BIO, 206(3), 2000, pp. 379-386

Citations number

Subject categories

Multidisciplinary

Journal title

JOURNAL OF THEORETICAL BIOLOGY

ISSN journal

0022-5193

Volume

206

Issue

3

Year of publication

2000

Pages

379 - 386

Database

ISI

SICI code

0022-5193(20001007)206:3<379:ICOPS>2.0.ZU;2-S

Abstract

The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides. (C) 2000 Academic Press.
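
The two kinds of estimator named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the plug-in block-entropy estimator, the use of `zlib` as the compressor, the shuffled surrogate, and all function names are assumptions chosen for illustration, and the toy sequence is randomly generated rather than real protein data.

```python
import math
import random
import zlib
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def block_entropy(seq, n=1):
    """Plug-in Shannon entropy (in bits) of overlapping blocks of length n."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def compressed_bits_per_symbol(seq):
    """Crude upper bound on complexity: zlib output size in bits per residue."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9)) / len(seq)


def shuffled_surrogate(seq, rng):
    """Surrogate with the same composition but all correlations destroyed."""
    symbols = list(seq)
    rng.shuffle(symbols)
    return "".join(symbols)


# Placeholder data (assumption): a random string, not a real protein set.
rng = random.Random(0)
toy = "".join(rng.choice(AMINO_ACIDS) for _ in range(5000))
surrogate = shuffled_surrogate(toy, rng)

# Conditional entropy h_1 = H_2 - H_1; for long blocks this estimate is
# biased downward on finite samples, the effect the abstract warns about.
h1_data = block_entropy(toy, 2) - block_entropy(toy, 1)
h1_surrogate = block_entropy(surrogate, 2) - block_entropy(surrogate, 1)
print(h1_data, h1_surrogate)
print(compressed_bits_per_symbol(toy), compressed_bits_per_symbol(surrogate))
```

Comparing each estimate on the data with the same estimate on its surrogate, rather than with the theoretical maximum of log2(20) bits, is one way to keep finite-sample bias from being mistaken for redundancy, since the bias affects both sequences similarly.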