NONGLOBULAR DOMAINS IN PROTEIN SEQUENCES - AUTOMATED SEGMENTATION USING COMPLEXITY-MEASURES

Authors
Wootton, J.C.
Citation
J.C. Wootton, NONGLOBULAR DOMAINS IN PROTEIN SEQUENCES - AUTOMATED SEGMENTATION USING COMPLEXITY-MEASURES, Computers & Chemistry, 18(3), 1994, pp. 269-285
Number of citations
46
Subject categories
"Computer Application, Chemistry & Engineering", "Chemistry", "Computer Science Interdisciplinary Applications"
Journal title
Computers & Chemistry
Journal ISSN
00978485
Volume
18
Issue
3
Year of publication
1994
Pages
269 - 285
Database
ISI
SICI code
0097-8485(1994)18:3<269:NDIPS->2.0.ZU;2-2
Abstract
Computational methods based on mathematically-defined measures of compositional complexity have been developed to distinguish globular and non-globular regions of protein sequences. Compact globular structures in protein molecules are shown to be determined by amino acid sequences of high informational complexity. Sequences of known crystal structure in the Brookhaven Protein Data Bank differ only slightly from randomly shuffled sequences in the distribution of statistical properties such as local compositional complexity. In contrast, in the much larger body of deduced sequences in the SWISS-PROT database, approximately one quarter of the residues occur in segments of non-randomly low complexity and approximately half of the entries contain at least one such segment. Sequences of proteins with known, physicochemically-defined non-globular regions have been analyzed, including collagens, different classes of coiled-coil proteins, elastins, histones, non-histone proteins, mucins, proteoglycan core proteins and proteins containing long single solvent-exposed alpha-helices. The SEG algorithm provides an effective general method for partitioning the globular and non-globular regions of these sequences fully automatically. This method is also facilitating the discovery of new classes of long, non-globular sequence segments, as illustrated by the example of the human CAN gene product involved in tumor induction.
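
The idea of "local compositional complexity" can be illustrated with a minimal sketch. This is not the SEG algorithm itself, only a simplified sliding-window Shannon entropy measure. The function names, the window length of 12 residues, and the cutoff of 2.2 bits are assumptions chosen for the example and are not taken from the abstract.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_complexity(sequence: str, window: int = 12) -> list[float]:
    """Entropy of every full-length window along the sequence.

    The window length is an illustrative assumption.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


def low_complexity_mask(sequence: str, window: int = 12, cutoff: float = 2.2) -> list[bool]:
    """Mark residues covered by at least one window whose entropy is <= cutoff.

    The cutoff is a hypothetical threshold for this sketch; a real
    segmentation method would also refine the segment boundaries.
    """
    mask = [False] * len(sequence)
    for i, h in enumerate(window_complexity(sequence, window)):
        if h <= cutoff:
            for j in range(i, i + window):
                mask[j] = True
    return mask
```

Applied to a sequence such as a poly-glutamine run followed by a compositionally diverse stretch, `low_complexity_mask` flags the repetitive residues and leaves the far end of the diverse stretch unflagged. Sequences shorter than the window produce no windows and therefore an all-`False` mask.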