AUTOMATED PROTEIN-SEQUENCE DATABASE CLASSIFICATION - II - DELINEATION OF DOMAIN BOUNDARIES FROM SEQUENCE SIMILARITIES

Authors
J. Gracy, P. Argos
Citation
J. Gracy and P. Argos, AUTOMATED PROTEIN-SEQUENCE DATABASE CLASSIFICATION - II - DELINEATION OF DOMAIN BOUNDARIES FROM SEQUENCE SIMILARITIES, BIOINFORMATICS, 14(2), 1998, pp. 174-187
Number of citations
17
Subject categories
Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods
Journal title
BIOINFORMATICS
Journal ISSN
1367-4803
Volume
14
Issue
2
Year of publication
1998
Pages
174 - 187
Database
ISI
SICI code
1367-4803(1998)14:2<174:APDC-I>2.0.ZU;2-I
Abstract
Motivation: Decomposing each protein into modular domains is a basic prerequisite for accurately classifying structural units in biological molecules. Boundaries between domains are indicated by two similar amino acid sequence segments located within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini. Results: We have developed an automated method that combines such positional constraints, derived from various detected pairwise sequence similarities, to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and shared <60% sequence identity amongst pairs. The resultant clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of the domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, domain boundaries and multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.
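
Illustrative sketch (not taken from the paper): the positional constraint described in the abstract can be pictured for a single pairwise similarity as follows. When a matched segment lies at notably different distances from the N- or C-termini of two homologous proteins, the protein with the longer overhang probably carries additional material on that side, which suggests a domain boundary at the corresponding edge of the match. The names `LocalMatch` and `suggested_boundaries`, and the `min_shift` threshold of 30 residues, are hypothetical choices made for this example; the published method combines many such constraints and may define them differently.

```python
from typing import List, NamedTuple, Tuple


class LocalMatch(NamedTuple):
    """One pairwise local similarity; positions are 1-based and inclusive."""
    len_a: int    # full length of protein A
    start_a: int  # first matched residue in A
    end_a: int    # last matched residue in A
    len_b: int    # full length of protein B
    start_b: int  # first matched residue in B
    end_b: int    # last matched residue in B


def suggested_boundaries(m: LocalMatch, min_shift: int = 30) -> List[Tuple[str, int]]:
    """Return (protein, position) pairs where a boundary is suggested.

    min_shift is an assumed threshold for a "notably different" distance.
    """
    out: List[Tuple[str, int]] = []
    # Residues preceding the match in each protein (N-terminal overhang).
    n_a, n_b = m.start_a - 1, m.start_b - 1
    # Residues following the match in each protein (C-terminal overhang).
    c_a, c_b = m.len_a - m.end_a, m.len_b - m.end_b
    if abs(n_a - n_b) >= min_shift:
        # The protein with the longer N-terminal overhang gets a boundary
        # at the start of its matched segment.
        out.append(("A", m.start_a) if n_a > n_b else ("B", m.start_b))
    if abs(c_a - c_b) >= min_shift:
        # Same reasoning on the C-terminal side, at the end of the match.
        out.append(("A", m.end_a) if c_a > c_b else ("B", m.end_b))
    return out


# A 300-residue protein whose C-terminal half matches nearly all of a
# 150-residue homologue: a boundary is suggested in A at position 151.
match = LocalMatch(len_a=300, start_a=151, end_a=290, len_b=150, start_b=6, end_b=145)
print(suggested_boundaries(match))
```

In this toy case only the N-terminal overhangs differ by more than the threshold, so a single boundary is proposed; a match at equivalent positions in both proteins yields no constraint.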