Position-weight matrices (or profiles) are simple mathematical objects
traditionally used to capture the information about local sequence
patterns (or motifs) characteristic of a given structure or function.
Although weight matrices can lead to fast database scanning algorithms,
their usage has been limited by the lack of a reliable method to assess
the statistical significance of the matching scores. In this article I
first review three different computational schemes for designing weight
matrices from a block alignment of any (small or large) number of
sequences. I then show that, for patterns spanning 10 positions or more,
the best scores expected from matching random sequences are distributed
according to the extreme value (Gumbel) distribution. The threshold of
statistical significance derived from this distribution perfectly
delineates the range of scores characterizing "true positive" sequences
(biologically significant matches). This result allows weight matrices
to be used to scan an entire protein database for patterns with high
sensitivity. MODEST (MOtif DEsign and Search Tools), a suite of programs
in Unix/C, implements these statistical improvements and is available
upon e-mail request (jmc(a)ncbi.nlm.nih.gov).
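The abstract does not give the matrix-design formulas or the procedure used to fit the extreme value distribution, so the Python sketch below only illustrates the general workflow under explicit assumptions. It is not MODEST and not necessarily any of the three schemes reviewed in the article. The sketch (1) builds a log-odds weight matrix from an ungapped block alignment, assuming a pseudocount of 1 and a uniform amino-acid background; (2) scores every window of a sequence and keeps the best score; (3) fits a Gumbel distribution to the best scores obtained against random sequences, here by the method of moments, which is an assumed choice; and (4) converts that fit into a significance threshold and p-values. All function names (`build_pwm`, `best_score`, `fit_gumbel`, `gumbel_threshold`, `gumbel_pvalue`) and the toy block are hypothetical.

```python
import math
import random
import statistics

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
EULER_GAMMA = 0.5772156649015329    # Euler-Mascheroni constant

def build_pwm(block, pseudocount=1.0, background=None):
    """Log-odds weight matrix from an ungapped block alignment (hypothetical helper).

    Assumes residues come from ALPHABET; pseudocount and uniform background are assumptions.
    Returns one dict per column mapping residue -> log(observed freq / background freq).
    """
    if background is None:
        background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    width = len(block[0])
    if any(len(s) != width for s in block):
        raise ValueError("all rows of the block must have the same length")
    pwm = []
    for j in range(width):
        column = [s[j] for s in block]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({a: math.log(((column.count(a) + pseudocount) / total) / background[a])
                    for a in ALPHABET})
    return pwm

def best_score(pwm, seq):
    """Highest matrix score over all windows of seq (-inf if seq is shorter than the matrix)."""
    width = len(pwm)
    best = -math.inf
    for i in range(len(seq) - width + 1):
        # Residues outside ALPHABET contribute 0 (an arbitrary choice for this sketch).
        best = max(best, sum(pwm[j].get(seq[i + j], 0.0) for j in range(width)))
    return best

def fit_gumbel(maxima):
    """Method-of-moments fit of a Gumbel (maximum) distribution; returns (mu, beta)."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.fmean(maxima) - EULER_GAMMA * beta
    return mu, beta

def gumbel_pvalue(score, mu, beta):
    """P(best random score >= score) under the fitted Gumbel distribution."""
    return 1.0 - math.exp(-math.exp(-(score - mu) / beta))

def gumbel_threshold(mu, beta, alpha=0.01):
    """Score whose Gumbel p-value equals alpha (inverse of gumbel_pvalue)."""
    return mu - beta * math.log(-math.log(1.0 - alpha))

if __name__ == "__main__":
    # Hypothetical 10-column toy block, not data from the article.
    block = ["GAPGVGKSTL", "GAAGVGKSAL", "GPPGSGKTTL", "GAPGTGKSTL"]
    pwm = build_pwm(block)
    rng = random.Random(0)
    # Random sequences drawn from the same uniform background used in the matrix.
    randoms = ["".join(rng.choice(ALPHABET) for _ in range(300)) for _ in range(500)]
    mu, beta = fit_gumbel([best_score(pwm, s) for s in randoms])
    cutoff = gumbel_threshold(mu, beta, alpha=0.01)
    # Plant one block row in a random sequence and test whether it clears the cutoff.
    query = randoms[0][:100] + "GAPGVGKSTL" + randoms[0][110:]
    s = best_score(pwm, query)
    print(f"mu={mu:.2f} beta={beta:.2f} cutoff={cutoff:.2f} "
          f"score={s:.2f} p={gumbel_pvalue(s, mu, beta):.2e}")
```

The random sequences here come from the same background distribution that was used to build the matrix. In a real database scan the background composition and the number and length of the random sequences would need to reflect the database being searched, and the abstract's observation that the Gumbel fit holds for patterns of about 10 positions or more is why the toy block is 10 columns wide.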