A. Bolshoy et al., APPLICABILITY OF THE MULTIPLE ALIGNMENT ALGORITHM FOR DETECTION OF WEAK PATTERNS - PERIODICALLY DISTRIBUTED DNA PATTERN AS A STUDY CASE, Computer applications in the biosciences, 12(5), 1996, pp. 383-389
Motivation: A nucleosome DNA positioning pattern is known to be one of
the weakest (highly degenerate) patterns. The alignment procedure that
has been developed recently for the extraction of such a pattern is
based on a statistical matching of the sequences, and its success
depends on the pattern/background ratio in the individual sequences and
in the generated pattern. The heuristic nature of the method and the
distinctive properties of the pattern raise the question of the
efficiency and sensitivity of the procedure. This paper presents a
method of verification for this multiple sequence alignment algorithm.
Results: To verify the applicability of the multiple alignment approach,
we constructed a set of sequences carrying the hidden pattern. The
pattern was represented by weak ('signal') oscillations of occurrences
of AA and TT dinucleotides along otherwise random sequences. Only a few dinucleotides
of any given 145-base-long sequence would correspond to the signal,
appearing in about the same phase within the simulated periodic pattern.
The novelty of our simulation approach is that we simulated a database
as a whole, as opposed to simulating each sequence separately. The
correlation between the hidden pattern and a sequence from the database
is negligible on average, but our statistical multicycle alignment
procedure produced the pattern with attributes very close to the
simulated ones. The accuracy of the procedure was tested and calibrated. The
presence in a typical sequence of as few as three dinucleotides
corresponding to the signal is sufficient to generate (detect) the
pattern hidden in a collection of 204 sequences.
Availability: The programs
of the multiple sequence alignment algorithm and database simulation
are available from the authors free of charge. Requests should be
accompanied by a 3.5" diskette.
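
The kind of simulated collection described under Results can be illustrated with a minimal Python sketch. It is not the authors' simulation program and does not implement their multicycle alignment procedure. The sequence length (145), collection size (204) and number of signal dinucleotides per sequence (three) come from the abstract; the period of 10.4 bases, the cosine weighting of positions, the uniform random background and all function names are assumptions made for illustration only. The sequences here share a known phase, so the sketch only shows that a signal too sparse to see in one sequence becomes visible once AA/TT occurrences are counted position by position over the whole collection.

```python
import math
import random

SEQ_LEN = 145    # bases per sequence (from the abstract)
N_SEQS = 204     # sequences in the collection (from the abstract)
N_SIGNAL = 3     # signal dinucleotides planted per sequence (from the abstract)
PERIOD = 10.4    # ASSUMPTION: period in bases; the abstract does not state one


def phase_weights():
    """Hypothetical weighting: positive only near the peaks of the pattern."""
    return [max(0.0, math.cos(2 * math.pi * i / PERIOD))
            for i in range(SEQ_LEN - 1)]


def simulate_database(rng):
    """Random sequences with a few AA/TT dinucleotides planted in phase."""
    weights = phase_weights()
    database = []
    for _ in range(N_SEQS):
        # Background: uniform random bases (an assumption of this sketch).
        seq = [rng.choice("ACGT") for _ in range(SEQ_LEN)]
        # Draw start positions with replacement, so a sequence may end up
        # with fewer than N_SIGNAL distinct planted dinucleotides.
        starts = rng.choices(range(SEQ_LEN - 1), weights=weights, k=N_SIGNAL)
        for pos in starts:
            base = rng.choice("AT")          # plant either AA or TT
            seq[pos] = seq[pos + 1] = base
        database.append("".join(seq))
    return database


def aa_tt_profile(database):
    """Count, for each start position, the sequences with AA or TT there."""
    profile = [0] * (SEQ_LEN - 1)
    for seq in database:
        for i in range(SEQ_LEN - 1):
            if seq[i:i + 2] in ("AA", "TT"):
                profile[i] += 1
    return profile


def phase_contrast(profile):
    """Mean count at in-phase positions versus out-of-phase positions."""
    weights = phase_weights()
    on = [c for c, w in zip(profile, weights) if w > 0]
    off = [c for c, w in zip(profile, weights) if w == 0]
    return sum(on) / len(on), sum(off) / len(off)


rng = random.Random(0)   # fixed seed so the run is repeatable
database = simulate_database(rng)
on_mean, off_mean = phase_contrast(aa_tt_profile(database))
print(f"mean AA/TT count at in-phase positions:     {on_mean:.1f}")
print(f"mean AA/TT count at out-of-phase positions: {off_mean:.1f}")
```

In the setting the abstract describes, the relative positioning of the sequences has to be recovered by the alignment procedure itself; this sketch sidesteps that step by construction.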