The detection and alignment of locally conserved regions (motifs) in
multiple sequences can provide insight into protein structure,
function, and evolution. A new Gibbs sampling algorithm is described
that detects motif-encoding regions in sequences and optimally
partitions them into distinct motif models; this is illustrated using
a set of immunoglobulin fold proteins. When applied to sequences
sharing a single motif, the sampler can be used to classify motif
regions into related submodels, as is illustrated using
helix-turn-helix DNA-binding proteins. Other statistically based
procedures are described for searching a database for sequences
matching motifs found by the sampler. When applied to a set of 32 very
distantly related bacterial integral outer membrane proteins, the
sampler revealed that they share a subtle, repetitive motif. Although
BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to
detect significant pairwise similarity between any of the sequences,
the repeats present in these outer membrane proteins, taken as a
whole, are highly significant (based on a generally applicable
statistical test for motifs described here). Analysis of bacterial
porins with known trimeric beta-barrel structure and related proteins
reveals a similar repetitive motif corresponding to alternating
membrane-spanning beta-strands. These beta-strands occur on the
membrane interface (as opposed to the trimeric interface) of the
beta-barrel. The broad conservation and structural location of these
repeats suggest that they play important functional roles.
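
For readers unfamiliar with Gibbs sampling for motif detection, the
following is a minimal sketch of the general idea, not the algorithm
described here. It assumes the simplest case: exactly one motif site
of fixed width per sequence, with no partitioning into multiple motif
models or submodels and no significance test. The function name
gibbs_site_sampler, the pseudocount of 0.5, the iteration count, and
the toy sequences are all illustrative assumptions.

```python
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumed input alphabet)


def gibbs_site_sampler(seqs, width, iters=200, pseudo=0.5, seed=0):
    """Toy site sampler: returns one sampled motif start per sequence."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    # Background residue frequencies over all sequences, with pseudocounts.
    bg_counts = Counter("".join(seqs))
    total = sum(bg_counts[a] for a in ALPHABET) + pseudo * len(ALPHABET)
    bg = {a: (bg_counts[a] + pseudo) / total for a in ALPHABET}

    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Build per-column residue counts from the current sites
            # of every sequence except sequence i.
            cols = [Counter() for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        cols[k][other[starts[j] + k]] += 1
            denom = (len(seqs) - 1) + pseudo * len(ALPHABET)

            # Weight each candidate start in sequence i by the ratio of
            # its probability under the motif model to the background.
            weights = []
            for p in range(len(seq) - width + 1):
                ratio = 1.0
                for k in range(width):
                    a = seq[p + k]
                    ratio *= ((cols[k][a] + pseudo) / denom) / bg[a]
                weights.append(ratio)

            # Sample (rather than maximize) the new site for sequence i.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Hypothetical toy input: three short sequences sharing the segment "WGHK".
seqs = ["MKTAYWGHKLLE", "GGWGHKPLMNQRS", "TTSAWGHKVVDE"]
starts = gibbs_site_sampler(seqs, width=4)
print([s[p:p + 4] for s, p in zip(seqs, starts)])
```

Because each site is sampled rather than chosen greedily, the result
depends on the random seed, and a single short run may settle on a
shifted or spurious alignment. In practice one would run several seeds
and compare the resulting alignments.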