J.C. Ison et al., Key residues approach to the definition of protein families and analysis of sparse family signatures, PROTEINS, 40(2), 2000, pp. 330-341
We extend the concept of the motif as a tool for characterizing protein
families and explore the feasibility of a sparse "motif" that is the length
of the protein sequence itself. The type of motif discussed is a sparse
family signature consisting of a set of N key residue positions
(A1, A2, ..., AN), each preceded by a gap (G), thus G1A1G2A2...GNAN. Both
the residue and the gap can be variable. A signature is matched to a protein
sequence and scored using a dynamic programming algorithm that permits
variability in gap distance and residue type. Generating a signature
involves identifying residues associated with points of contact in
interactions between secondary structure elements. A raw signature consists
of a set of positions with potential key structural roles sampled from a
sequence alignment constructed with reference to this contact data. Raw
signatures are refined by sampling different gap-residue pairs until the
specificity of a signature for the family cannot be further improved. We
summarize signatures for nine protein families of diverse fold and function
and present results of scans against the OWL protein sequence database. The
implications of such signatures are discussed. (C) 2000 Wiley-Liss, Inc.
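
The matching step described in the abstract (a signature G1A1G2A2...GNAN scored against a sequence by dynamic programming, with variable gaps and residue types) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring scheme, so the `Element` type, the per-residue scores, the hard gap range (`gap_min`, `gap_max`), and the `mismatch` penalty are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional

NEG_INF = float("-inf")


class Element(NamedTuple):
    """One gap-residue pair (Gk, Ak) of a signature (hypothetical layout)."""
    gap_min: int                 # fewest residues allowed in the gap before Ak
    gap_max: int                 # most residues allowed in the gap before Ak
    residues: Dict[str, float]   # assumed score for each accepted residue type


def score_signature(signature: List[Element], sequence: str,
                    mismatch: float = -1.0) -> Optional[float]:
    """Return the best score for placing every key residue on the sequence,
    or None if no placement satisfies the gap ranges."""
    if not signature:
        return 0.0
    n = len(sequence)
    prev: List[float] = []
    for k, el in enumerate(signature):
        # cur[i] = best score with key residue k placed at sequence index i
        cur = [NEG_INF] * n
        for i in range(n):
            if k == 0:
                # The first gap counts residues from the sequence start.
                before = 0.0 if el.gap_min <= i <= el.gap_max else NEG_INF
            else:
                before = NEG_INF
                for g in range(el.gap_min, el.gap_max + 1):
                    j = i - g - 1   # index of the previous key residue
                    if j < 0:
                        break       # larger gaps only move j further left
                    before = max(before, prev[j])
            if before > NEG_INF:
                cur[i] = before + el.residues.get(sequence[i], mismatch)
        prev = cur
    best = max(prev, default=NEG_INF)
    return None if best == NEG_INF else best


# Toy signature with three key positions (values are illustrative only).
toy_signature = [
    Element(0, 2, {"C": 2.0}),
    Element(1, 3, {"H": 2.0, "N": 1.0}),
    Element(0, 1, {"G": 1.5}),
]
toy_score = score_signature(toy_signature, "ACDEHGK")
```

In this sketch a gap outside its allowed range rules a placement out entirely; a scheme that instead penalizes gap deviations gradually would replace the range check with a gap score added inside the inner loop.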