A wealth of protein and DNA sequence data is being generated by genome
projects and other sequencing efforts. A crucial barrier to decipheri
ng these sequences and understanding the relations among them is the d
ifficulty of detecting subtle local residue patterns common to multipl
e sequences. Such patterns frequently reflect similar molecular struct
ures and biological properties. A mathematical definition of this ''lo
cal multiple alignment'' problem suitable for full computer automation
has been used to develop a new and sensitive algorithm, based on the
statistical method of iterative sampling. This algorithm finds an opti
mized local alignment model for N sequences in N-linear time, requirin
g only seconds on current workstations, and allows the simultaneous de
tection and optimization of multiple patterns and pattern repeats. The
method is illustrated as applied to helix-turn-helix proteins, lipoca
lins, and prenyltransferases.