AUTOMATED PROTEIN-SEQUENCE DATABASE CLASSIFICATION - II - DELINEATION OF DOMAIN BOUNDARIES FROM SEQUENCE SIMILARITIES

Authors

GRACY J ARGOS P

Citation

J. Gracy and P. Argos, AUTOMATED PROTEIN-SEQUENCE DATABASE CLASSIFICATION - II - DELINEATION OF DOMAIN BOUNDARIES FROM SEQUENCE SIMILARITIES, BIOINFORMATICS, 14(2), 1998, pp. 174-187

Subject categories

Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods

Journal title

BIOINFORMATICS

ISSN journal

1367-4803

Volume

14

Issue

2

Year of publication

1998

Pages

174 - 187

Database

ISI

SICI code

1367-4803(1998)14:2<174:APDC-I>2.0.ZU;2-I

Abstract

Motivation: Decomposing each protein into modular domains is a basic prerequisite to classify accurately structural units in biological molecules. Boundaries between domains are indicated by two similar amino acid sequence segments located within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini. Results: We have developed an automated method that combines such positional constraints derived from various detected pairwise sequence similarities to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and shared <60% sequence identity amongst pairs. The resultant clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of the domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, domain boundaries and multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.