ITA
ENG

Large deviations-based upper bounds on the expected relative length of longest common subsequences

Authors

Hauser, Raphael Martínez, Servet Matzinger, Heinrich

Citation

Hauser, Raphael et al., Large deviations-based upper bounds on the expected relative length of longest common subsequences, Advances in applied probability , 38(2), 2006, pp. 827-852

Journal title

Advances in applied probability → ACNP

ISSN journal

00018678

Volume

Issue

Year of publication

2006

Pages

827 - 852

Database

ACNP

SICI code

Abstract

Consider the random variable Ln defined as the length of a longest common subsequence of two random strings of length n and whose random characters are independent and identically distributed over a finite alphabet. Chvátal and Sankoff showed that the limit .=limn..E[Ln]/n is well defined. The exact value of this constant is not known, but various methods for the computation of upper and lower bounds have been discussed in the literature. Even so, high-precision bounds are hard to come by. In this paper we discuss how large deviation theory can be used to derive a consistent sequence of upper bounds, (qm)m.., on ., and how Monte Carlo simulation can be used in theory to compute estimates, q.m, of the qm such that, for given . > 0 and . . (0,1), we have P[. < q. < . + .] . .. In other words, with high probability the result is an upper bound that approximates . to high precision. We establish O((1 . .).1..(4+.)) as a theoretical upper bound on the complexity of computing q.m to the given level of accuracy and confidence. Finally, we discuss a practical heuristic based on our theoretical approach and discuss its empirical behavior.