PFAM - A COMPREHENSIVE DATABASE OF PROTEIN DOMAIN FAMILIES BASED ON SEED ALIGNMENTS

Citation
E.L.L. Sonnhammer et al., PFAM - A COMPREHENSIVE DATABASE OF PROTEIN DOMAIN FAMILIES BASED ON SEED ALIGNMENTS, Proteins, 28(3), 1997, pp. 405-420
Number of citations
54
Subject Categories
Biology
Journal title
Proteins
Journal ISSN
0887-3585
Volume
28
Issue
3
Year of publication
1997
Pages
405 - 420
Database
ISI
SICI code
0887-3585(1997)28:3<405:P-ACDO>2.0.ZU;2-S
Abstract
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high-quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new Kazal, fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. (C) 1997 Wiley-Liss, Inc.