We present a new method for identifying conserved patterns in a set of
unaligned, related protein sequences. It can discover patterns of a
quite general form, allowing both ambiguous positions and
variable-length wildcard regions. The user defines a class of patterns
(e.g., the degree of ambiguity allowed and the length and number of
gaps), and the method is then guaranteed to find the conserved patterns
in this class that score highest according to a defined significance
measure. Identified patterns may be refined using one of two new
algorithms. We also present a new (nonstatistical) significance measure
for flexible patterns. The method is shown to recover known motifs for
PROSITE families and is also applied to some recently described
families from the literature.
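
To make the pattern class concrete, the following minimal sketch shows what a pattern with an ambiguous position and a variable-length wildcard region looks like, and how one might count the sequences it matches. It is not the discovery method described above and does not compute the significance measure. It assumes a simplified PROSITE-style notation (`x(2,4)` for a wildcard region of two to four residues, `[DE]` for an ambiguous position), and the function names and toy sequences are hypothetical, chosen only for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Assumed notation (illustrative only): elements are separated by '-';
    'x' is a one-residue wildcard, 'x(n)' a fixed-length wildcard region,
    'x(m,n)' a variable-length wildcard region, and '[ABC]' an ambiguous
    position that accepts any of the listed residues.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if m:
            low, high = m.group(1), m.group(2)
            if low is None:
                parts.append(".")                          # x
            elif high is None:
                parts.append(".{%s}" % low)                # x(n)
            else:
                parts.append(".{%s,%s}" % (low, high))     # x(m,n)
        else:
            # A single residue such as C, or an ambiguous set such as [DE];
            # both are already valid regex fragments.
            parts.append(element)
    return "".join(parts)


def count_matching(pattern, sequences):
    """Return how many sequences contain at least one match of the pattern."""
    regex = re.compile(pattern_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))


# Made-up toy sequences, not real proteins.
sequences = ["AACGGDHKL", "MCPQRSEHT", "CWWDHAAAA", "TTTTCAH"]
pattern = "C-x(2,4)-[DE]-H"

print(pattern_to_regex(pattern))
print(count_matching(pattern, sequences), "of", len(sequences), "sequences match")
```

In this toy setting, restricting the pattern class corresponds to bounding quantities such as the size of each ambiguous set and the span of each `x(m,n)` region.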