K. Quandt et al., MatInd and MatInspector - New Fast and Versatile Tools for Detection of Consensus Matches in Nucleotide-Sequence Data, Nucleic Acids Research, 23(23), 1995, pp. 4878-4884
The identification of potential regulatory motifs in new sequence data
is increasingly important for experimental design. Such motifs are
commonly located by matches to IUPAC strings derived from consensus
sequences. Although this method is simple and widely used, a major
drawback of IUPAC strings is that they necessarily discard much of the
information originally present in the set of sequences. Nucleotide
distribution matrices retain most of this information and are thus
better suited to evaluating new potential sites. However, sufficiently
large libraries of pre-compiled matrices are a prerequisite for the
practical application of any matrix-based approach, and such libraries
are only beginning to emerge. Here, we present a set of tools for
molecular biologists that allows the generation of new matrices and the
detection of potential sequence matches by automatic searches with a
library of pre-compiled matrices. We also supply a large library (>200)
of transcription factor binding site matrices, compiled from published
matrices as well as entries from the TRANSFAC database, with emphasis on
sequences with experimentally verified binding capacity. Our search
method weights each matrix position by its information content and
calculates a relative matrix similarity. We show several examples
suggesting that this matrix similarity is useful for estimating the
functional potential of matrix matches and thus provides a valuable
basis for designing appropriate experiments.
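The information loss attributed to IUPAC strings above can be made concrete with a small example. This is a minimal Python sketch, not the procedure used by MatInd: the aligned sites are invented, the helper names are hypothetical, and the rule used to derive the IUPAC string (code each position by every base observed there) is a simplifying assumption; practical consensus rules usually apply frequency thresholds.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Reverse lookup: IUPAC code -> set of allowed bases.
IUPAC_BASES = {code: bases for bases, code in IUPAC.items()}


def count_matrix(sites):
    """Nucleotide distribution matrix: one base-count Counter per aligned position."""
    length = len(sites[0])
    assert all(len(s) == length for s in sites), "sites must be aligned"
    return [Counter(s[i] for s in sites) for i in range(length)]


def iupac_consensus(matrix):
    """Simplified rule (assumption): code each position by every base observed there."""
    return "".join(IUPAC[frozenset(col)] for col in matrix)


def matches_iupac(consensus, seq):
    """True if every base of seq is allowed by the IUPAC code at that position."""
    return len(seq) == len(consensus) and all(
        base in IUPAC_BASES[code] for code, base in zip(consensus, seq)
    )


# Hypothetical aligned sites, for illustration only.
sites = ["TGACTCA", "TGACTCA", "TGACTCA", "TGAGTCA"]
m = count_matrix(sites)
cons = iupac_consensus(m)
print(cons)                                                          # TGASTCA
print(matches_iupac(cons, "TGACTCA"), matches_iupac(cons, "TGAGTCA"))  # True True
print(m[3]["C"], m[3]["G"])                                          # 3 1
```

Both the majority variant and the rare one match the IUPAC string equally well, whereas the count matrix still records that C outnumbers G three to one at the fourth position.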
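The position weighting and relative matrix similarity mentioned above can be sketched as follows. This is a hedged illustration under stated assumptions, not the published MatInspector implementation. The per-position weight is assumed to be an entropy-based information-content score over the four bases, scaled from 0 (all bases equally frequent) to 100 (fully conserved). The similarity is assumed to be normalized as (current - min) / (max - min) of the weighted scores, so a window matching the most frequent base at every position scores 1.0. The paper's exact formulas may differ, for example in whether gaps count as a fifth symbol or whether pseudocounts are used. All function names, the example sites and the threshold are hypothetical.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites):
    """Per-position base frequencies from aligned sites (no pseudocounts)."""
    n, length = len(sites), len(sites[0])
    return [{b: sum(s[i] == b for s in sites) / n for b in BASES} for i in range(length)]


def ci_weights(freqs):
    """Information-content weight per position, scaled to [0, 100] (assumed form).

    100 = a single base observed (fully conserved); 0 = all four bases equally frequent.
    """
    weights = []
    for col in freqs:
        plogp = sum(p * math.log(p) for p in col.values() if p > 0)  # ranges from -ln 4 to 0
        weights.append(100.0 / math.log(len(BASES)) * plogp + 100.0)
    return weights


def matrix_similarity(freqs, weights, seq):
    """Relative similarity in [0, 1]: (current - min) / (max - min) of weighted scores."""
    current = sum(w * col[b] for w, col, b in zip(weights, freqs, seq))
    best = sum(w * max(col.values()) for w, col in zip(weights, freqs))
    worst = sum(w * min(col.values()) for w, col in zip(weights, freqs))
    return (current - worst) / (best - worst) if best > worst else 1.0


def scan(freqs, seq, threshold=0.85):
    """Slide the matrix along seq; return (start, window, similarity) at or above threshold."""
    weights = ci_weights(freqs)
    k = len(freqs)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if set(window) <= set(BASES):  # skip windows containing N or other non-ACGT symbols
            sim = matrix_similarity(freqs, weights, window)
            if sim >= threshold:
                hits.append((start, window, round(sim, 3)))
    return hits


# Hypothetical aligned binding sites, for illustration only.
sites = ["TGACTCA", "TGACTCA", "TGACTCA", "TGAGTCA"]
freqs = frequency_matrix(sites)
hits = scan(freqs, "GGTGACTCATTTGAGTCAGG", threshold=0.9)
print(hits)  # [(2, 'TGACTCA', 1.0), (11, 'TGAGTCA', 0.954)]
```

Because of the weighting, a mismatch at one of the six fully conserved positions costs a full 100 points, whereas choosing the rarer base at the variable fourth position costs only about 30 of its roughly 59 weighted points, so the minority variant still scores highly. Without pseudocounts, a base never seen at a position contributes nothing to the score.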