ITA
ENG

SHUFFLING BIOLOGICAL SEQUENCES

Authors

KANDEL D MATIAS Y UNGER R WINKLER P

Citation

D. Kandel et al., SHUFFLING BIOLOGICAL SEQUENCES, Discrete applied mathematics, 71(1-3), 1996, pp. 171-185

Citations number

Categorie Soggetti

Mathematics,Mathematics

Journal title

Discrete applied mathematics → ACNP

Volume

Issue

1-3

Year of publication

1996

Pages

171 - 185

Database

ISI

SICI code

Abstract

This paper considers the following sequence shuffling problem: Given a biological sequence (either DNA or protein) a, generate a random inst ance among all the permutations of s that exhibit the same frequencies of k-lets (e.g. dinucleotides, doublets of amino acids, triplets, etc .). Since certain biases in the usage of k-lets are fundamental to bio logical sequences, effective generation of such sequences is essential for the evaluation of the results of many sequence analysis tools. Th is paper introduces two sequence shuffling algorithms: A simple swappi ng-based algorithm is shown to generate a near-random instance and app ears to work well, although its efficiency is unproven; a generation a lgorithm based on Euler tours is proven to produce a precisely uniform instance, and hence solve the sequence shuffling problem, in time not much more than linear in the sequence length.