APPLICABILITY OF THE MULTIPLE ALIGNMENT ALGORITHM FOR DETECTION OF WEAK PATTERNS - PERIODICALLY DISTRIBUTED DNA PATTERN AS A STUDY CASE

Citation
A. Bolshoy et al., APPLICABILITY OF THE MULTIPLE ALIGNMENT ALGORITHM FOR DETECTION OF WEAK PATTERNS - PERIODICALLY DISTRIBUTED DNA PATTERN AS A STUDY CASE, Computer applications in the biosciences, 12(5), 1996, pp. 383-389
Number of citations
24
Subject categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous
ISSN journal
02667061
Volume
12
Issue
5
Year of publication
1996
Pages
383 - 389
Database
ISI
SICI code
0266-7061(1996)12:5<383:AOTMAA>2.0.ZU;2-G
Abstract
Motivation: A nucleosome DNA positioning pattern is known to be one of the weakest (highly degenerate) patterns. The alignment procedure that has been developed recently for the extraction of such a pattern is based on a statistical matching of the sequences, and its success depends on the pattern/background ratio in the individual sequences and in the generated pattern. The heuristic nature of the method and the distinctive properties of the pattern raise the question of the efficiency and sensitivity of the procedure. This paper presents a method of verification for this multiple sequence alignment algorithm. Results: To verify the applicability of the multiple alignment approach, we constructed a set of sequences carrying the hidden pattern. The pattern was represented by weak ('signal') oscillations of occurrences of AA and TT dinucleotides along otherwise random sequences. Only a few dinucleotides of any given 145-base-long sequence would correspond to the signal, appearing in about the same phase within the simulated periodic pattern. The novelty of our simulation approach is that we simulated a database as a whole, as opposed to simulating each sequence separately. The correlation between the hidden pattern and a sequence from the database is negligible on average, but our statistical multicycle alignment procedure produced the pattern with attributes very close to the simulated ones. The accuracy of the procedure was tested and calibrated. The presence in a typical sequence of as few as three dinucleotides corresponding to the signal is sufficient to generate (detect) the pattern hidden in a collection of 204 sequences. Availability: The programs of the multiple sequence alignment algorithm and database simulation are available from the authors free of charge. Requests should be accompanied by a 3.5" diskette.
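Illustrative sketch
The following Python sketch is not the authors' program; it only illustrates the kind of test set the abstract describes. The sequence length (145), collection size (204), and number of signal dinucleotides per sequence (3) come from the abstract. Everything else is an assumption made for illustration: the period of 10 bases (the abstract does not state one), the uniform random background, the random per-sequence phase, and the function names. Unlike the paper, which simulates the database as a whole, this sketch plants the signal sequence by sequence for simplicity, and it uses the known phases in place of the alignment step, so it shows why the pooled pattern is detectable rather than how the alignment finds it.

```python
import random

BASES = "ACGT"
SEQ_LEN = 145   # sequence length, from the abstract
N_SEQS = 204    # number of sequences, from the abstract
N_SIGNAL = 3    # signal dinucleotides per sequence, from the abstract
PERIOD = 10     # ASSUMPTION: the abstract does not give the period


def simulate_sequence(rng, phase):
    """Random sequence with N_SIGNAL AA/TT dinucleotides planted at
    positions congruent to `phase` modulo PERIOD (hypothetical helper)."""
    seq = [rng.choice(BASES) for _ in range(SEQ_LEN)]
    slots = list(range(phase, SEQ_LEN - 1, PERIOD))
    for pos in rng.sample(slots, N_SIGNAL):
        seq[pos:pos + 2] = rng.choice(["AA", "TT"])
    return "".join(seq)


def simulate_database(seed=0):
    """Return (sequences, phases); each sequence has its own random phase."""
    rng = random.Random(seed)
    phases = [rng.randrange(PERIOD) for _ in range(N_SEQS)]
    seqs = [simulate_sequence(rng, p) for p in phases]
    return seqs, phases


def phase_profile(seqs, shifts):
    """Count AA/TT dinucleotides by position modulo PERIOD after
    shifting each sequence by its entry in `shifts`."""
    counts = [0] * PERIOD
    for seq, shift in zip(seqs, shifts):
        for pos in range(len(seq) - 1):
            if seq[pos:pos + 2] in ("AA", "TT"):
                counts[(pos - shift) % PERIOD] += 1
    return counts


seqs, phases = simulate_database(seed=0)
unaligned = phase_profile(seqs, [0] * N_SEQS)  # phases ignored
aligned = phase_profile(seqs, phases)          # true phases used
print("unaligned:", unaligned)
print("aligned:  ", aligned)
```

With the phases ignored, the pooled counts are roughly flat across the period; with the true phases applied, the planted dinucleotides accumulate in a single bin. Three planted dinucleotides are hard to distinguish from background within one sequence, which is consistent with the abstract's remark that the per-sequence correlation with the hidden pattern is negligible while the collection as a whole carries a detectable signal.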