Motivation: When analysing novel protein sequences, it is now essential to
extend search strategies to include a range of 'secondary' databases.
Pattern databases have become vital tools for identifying distant
relationships in sequences, and hence for predicting protein function and
structure. The main drawback of such methods is the relatively small
representation of proteins in the trial samples at the time of their
construction. Therefore, a negative result of an amino acid sequence
comparison with such a databank forces a researcher to search for
similarities in the original protein banks. We developed a database of
patterns constructed for groups of related proteins with maximum
representation of amino acid sequences of SWISS-PROT in the groups.
Results: Software tools and a new method have been designed to construct
patterns of protein families. Using this method, a new version of the
databank of protein family patterns, PROF_PAT 1.3, has been produced. This
bank is based on SWISS-PROT (r1.38) and TrEMBL (r1.11), and contains
patterns of more than 13 000 groups of related proteins in a format similar
to that of PROSITE. The pattern motifs selected were those with the lowest
probability of being found in random sequences. A flexible, fast search
program accompanies the bank. The researcher can specify a similarity
matrix (PAM, BLOSUM or another type). Variable levels of similarity can be
set (permitting search strategies ranging from exact matches to increasing
levels of 'fuzziness').
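
As an illustration of what an exact-match search against a PROSITE-style
pattern involves, the sketch below converts a pattern into a regular
expression. It assumes the standard PROSITE syntax ('-' separators, 'x' for
any residue, [..] for allowed residues, {..} for forbidden residues, (n) or
(n,m) for repetition, '<' and '>' for terminal anchors); the exact PROF_PAT
syntax may differ. The function name and the example pattern and sequence
are hypothetical, and the matrix-based 'fuzzy' search is not covered.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Illustrative helper only; it is not part of the PROF_PAT software.
    """
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # Single letters and [ABC] classes are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Hypothetical pattern and sequence, used only to exercise the converter.
example_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
example_sequence = "MKCAACAAALAAAAAAAAHAAAHK"
example_hit = re.search(prosite_to_regex(example_pattern), example_sequence)
```

Here `example_hit` is a match object when the sequence contains the pattern
and `None` otherwise. Relaxing the search towards 'fuzziness' would require
scoring residues with a similarity matrix instead of exact matching.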