An automated algorithm is presented that delineates protein sequence fragments which display similarity. The method incorporates a selection of a number of local nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach to elucidate the consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible and more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins where the repeating fragments have been derived from information additional to the protein sequences. © 1993 Wiley-Liss, Inc.
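
The selection and grouping steps summarized above can be illustrated with a minimal sketch. It is not the published implementation. It assumes the local alignments have already been computed and are given as scored pairs of half-open intervals within one sequence. The names `LocalAlignment`, `select_top_nonoverlapping`, and `group_into_ensembles` are hypothetical. The overlap rule (two alignments clash only when both of their aligned regions overlap) and the use of connected components as a stand-in for the graph-theoretical step are assumptions made for illustration only.

```python
from typing import NamedTuple


class LocalAlignment(NamedTuple):
    """Hypothetical record: one local self-alignment of two fragments."""
    score: float
    a_start: int  # first fragment, half-open interval [a_start, a_end)
    a_end: int
    b_start: int  # second fragment, half-open interval [b_start, b_end)
    b_end: int


def _overlaps(s1, e1, s2, e2):
    """True when the half-open intervals [s1, e1) and [s2, e2) intersect."""
    return s1 < e2 and s2 < e1


def select_top_nonoverlapping(candidates, max_count):
    """Greedily keep the highest-scoring alignments that do not clash.

    Assumption: two alignments clash only if BOTH aligned regions overlap.
    """
    chosen = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(chosen) >= max_count:
            break
        clash = any(
            _overlaps(cand.a_start, cand.a_end, c.a_start, c.a_end)
            and _overlaps(cand.b_start, cand.b_end, c.b_start, c.b_end)
            for c in chosen
        )
        if not clash:
            chosen.append(cand)
    return chosen


def group_into_ensembles(alignments):
    """Group fragments into ensembles via connected components.

    Nodes are fragments; edges join the two fragments of an alignment and
    any two fragments that overlap in the sequence (a simplification).
    """
    intervals = []
    for al in alignments:
        intervals.append((al.a_start, al.a_end))
        intervals.append((al.b_start, al.b_end))
    parent = list(range(len(intervals)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for k in range(0, len(intervals), 2):  # the two sides of each alignment
        union(k, k + 1)
    for i in range(len(intervals)):  # fragments sharing sequence positions
        for j in range(i + 1, len(intervals)):
            if _overlaps(*intervals[i], *intervals[j]):
                union(i, j)

    groups = {}
    for i, iv in enumerate(intervals):
        groups.setdefault(find(i), set()).add(iv)
    return sorted(groups.values(), key=min)


# Made-up candidates: three copies of one repeat type, two of another.
candidates = [
    LocalAlignment(50, 0, 10, 20, 30),
    LocalAlignment(48, 0, 9, 21, 30),   # clashes with the alignment above
    LocalAlignment(45, 20, 30, 40, 50),
    LocalAlignment(40, 1, 10, 41, 50),
    LocalAlignment(30, 60, 70, 80, 90),
]
chosen = select_top_nonoverlapping(candidates, max_count=10)
ensembles = group_into_ensembles(chosen)
```

With these made-up inputs, the second candidate is dropped and the remaining fragments fall into two ensembles, one per repeat type. Determining consistent start and end points, building the multiple alignment, and constructing the profile are not covered by this sketch.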