J. Gracy and P. Argos, AUTOMATED PROTEIN-SEQUENCE DATABASE CLASSIFICATION - II - DELINEATION OF DOMAIN BOUNDARIES FROM SEQUENCE SIMILARITIES, BIOINFORMATICS, 14(2), 1998, pp. 174-187
Number of citations
17
Subject Categories
"Computer Science Interdisciplinary Applications","Biology Miscellaneous","Biochemical Research Methods"
Motivation: Decomposing each protein into modular domains is a basic prerequisite to classify accurately structural units in biological molecules. Boundaries between domains are indicated by two similar amino acid sequence segments located within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini.
Results: We have developed an automated method that combines such positional constraints derived from various detected pairwise sequence similarities to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and shared <60% sequence identity amongst pairs. The resultant clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of the domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, domain boundaries and multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.
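
The positional idea stated under Motivation can be illustrated with a small Python sketch. It is not the authors' procedure, which the abstract does not specify in detail; it only shows how a single pairwise similarity might yield boundary hints when the matched segments lie at notably different distances from the N- or C-termini. The `Match` record, the `candidate_boundaries` function and the `min_shift` threshold of 30 residues are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One pairwise local similarity; positions are 1-based and inclusive."""
    prot_a: str
    start_a: int
    end_a: int
    len_a: int
    prot_b: str
    start_b: int
    end_b: int
    len_b: int


def candidate_boundaries(match, min_shift=30):
    """Return (protein, position, side) hints suggested by one match.

    min_shift is an assumed threshold for "notably different" distances;
    the abstract gives no value.
    """
    hints = []

    # Number of residues preceding the matched segment in each protein.
    n_a, n_b = match.start_a - 1, match.start_b - 1
    if abs(n_a - n_b) >= min_shift:
        # The protein with the longer N-terminal flank carries extra material
        # before the segment, so the segment start is a candidate boundary.
        if n_a > n_b:
            hints.append((match.prot_a, match.start_a, "start"))
        else:
            hints.append((match.prot_b, match.start_b, "start"))

    # Number of residues following the matched segment in each protein.
    c_a = match.len_a - match.end_a
    c_b = match.len_b - match.end_b
    if abs(c_a - c_b) >= min_shift:
        # Same reasoning on the C-terminal side, applied to the segment end.
        if c_a > c_b:
            hints.append((match.prot_a, match.end_a, "end"))
        else:
            hints.append((match.prot_b, match.end_b, "end"))

    return hints


# Homologous proteins: the segment is C-terminal in P1 but spans all of P2.
homologs = Match("P1", 201, 300, 300, "P2", 1, 100, 100)
# Internal repeat: two similar segments within the same protein P3.
repeat = Match("P3", 10, 60, 200, "P3", 110, 160, 200)

print(candidate_boundaries(homologs))
print(candidate_boundaries(repeat))
```

In this sketch the homolog example flags position 201 of P1 as a possible domain start, and the repeat example flags the end of the first copy and the start of the second. Combining many such hints across all detected similarities, as the Results describe, would require an additional consensus step that is not shown here.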