We present a language for describing structural patterns of residues in
protein structures and a method for the discovery of such patterns that recur
in a set of protein structures. The patterns impose restrictions on the
spatial position of each residue, the order of the residues along the amino
acid chain, and
which amino acids are allowed in each position. Unlike other methods for
comparing sets of protein structures, our method is not based on pairwise
structure comparisons, which are often time-consuming and can produce
inconsistent results. Instead, the method simultaneously takes into account
information from all structures in the search for conserved structure
patterns, which are potential structure motifs. The method is based on
describing the spatial neighborhood of each residue in each structure as a string
and applying a sequence pattern discovery method to find patterns common to
subsets of these strings. Finally, it is checked whether the similarities
between the neighborhood strings correspond to spatially similar
substructures. We apply the method to analyze sets of very disparate proteins
from four different protein families: serine proteases, cuprodoxins, cysteine
proteinases, and ferredoxins. The motifs found by the method correspond well
to the site and motif information given in the annotation of these proteins
in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by
using the motif data to constrain the structural alignment of the proteins
obtained with the program SAP. This gave the best superposition/alignment
of the proteins given the motif assignment. © 1999 Wiley-Liss, Inc.
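
The step of describing each residue's spatial neighborhood as a string can be illustrated with a minimal sketch. The encoding used here is an assumption made for illustration and is not taken from the abstract: every residue within a fixed radius of the central residue is listed by its one-letter code in chain order, with the central residue in lower case. The function name `neighborhood_strings`, the `radius` parameter, and the toy coordinates are all hypothetical.

```python
import math

def neighborhood_strings(chain, radius=10.0):
    """Return one string per residue in `chain`.

    `chain` is a list of (one_letter_code, (x, y, z)) tuples in chain order.
    Each string lists, in chain order, the residues lying within `radius`
    of the central residue; the central residue is written in lower case.
    This encoding is an illustrative assumption, not the published one.
    """
    strings = []
    for i, (_, center) in enumerate(chain):
        letters = []
        for j, (aa, pos) in enumerate(chain):
            if math.dist(center, pos) <= radius:
                letters.append(aa.lower() if j == i else aa)
        strings.append("".join(letters))
    return strings

# Invented toy coordinates (angstroms); not taken from any real structure.
toy_chain = [
    ("H", (0.0, 0.0, 0.0)),
    ("G", (3.8, 0.0, 0.0)),
    ("A", (7.6, 0.0, 0.0)),
    ("D", (30.0, 0.0, 0.0)),
    ("S", (2.0, 3.0, 0.0)),
]

for s in neighborhood_strings(toy_chain, radius=5.0):
    print(s)
```

Strings of this kind, collected over all residues of all structures, are what a sequence pattern discovery method would then search for common patterns. Any string-level match would still need to be checked against the three-dimensional coordinates, as described above.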