We present in this paper a peptide matching approach to the multiple c
omparison of a set of protein sequences. This approach consists in loo
king for all the words that are common to q of these sequences, where
q is a parameter. The comparison between words is done by using as ref
erence an object called a model, In the case of proteins, a model is a
product of subsets of the alphabet Sigma of the amino acids. These su
bsets belong to a cover of Sigma, that is, their union covers all of S
igma. A word is said to be an instance of a model if it belongs to the
model, A further flexibility is introduced in the comparison by allow
ing for up to e errors in the comparison between a word and a model. T
hese errors may concern gaps or substitutions not allowed by the cover
. A word is said to be this time an occurrence of a model if the Leven
shtein distance between it and an instance of the model is inferior or
equal to e. This corresponds to what we call a Set-Levenshtein distan
ce between the occurrences and the model itself. Two words are said to
be similar if there is at least one model of which both are occurrenc
es. In the special case where e = 0, the occurrences of a model are si
mply its instances, If a model M has occurrences in at least q of the
sequences of the set, M is said to occur in the set. The algorithm pre
sented here is an efficient and exact way of looking for all the model
s, of a fixed length k or of the greatest possible length k(max), that
occur in a set of sequences. It is linear in the total length n of th
e sequences and proportional to (e + 2)(2e + 1)k(e + 1)p(e + 1)g(k) wh
ere k much less than n is a small value in all practical situations, p
is the number of sets in the cover and g is related to the latter's d
egree of nontransitivity. Models are closely related to what is called
a consensus in the biocomputing area, and covers are a good way of re
presenting complex relationships between the amino acids.