Large deviations-based upper bounds on the expected relative length of longest common subsequences

Citation
Hauser, Raphael et al., Large deviations-based upper bounds on the expected relative length of longest common subsequences, Advances in applied probability , 38(2), 2006, pp. 827-852
ISSN journal
00018678
Volume
38
Issue
2
Year of publication
2006
Pages
827 - 852
Database
ACNP
SICI code
Abstract
Consider the random variable Ln defined as the length of a longest common subsequence of two random strings of length n and whose random characters are independent and identically distributed over a finite alphabet. Chvátal and Sankoff showed that the limit .=limn..E[Ln]/n is well defined. The exact value of this constant is not known, but various methods for the computation of upper and lower bounds have been discussed in the literature. Even so, high-precision bounds are hard to come by. In this paper we discuss how large deviation theory can be used to derive a consistent sequence of upper bounds, (qm)m.., on ., and how Monte Carlo simulation can be used in theory to compute estimates, q.m, of the qm such that, for given . > 0 and . . (0,1), we have P[. < q. < . + .] . .. In other words, with high probability the result is an upper bound that approximates . to high precision. We establish O((1 . .).1..(4+.)) as a theoretical upper bound on the complexity of computing q.m to the given level of accuracy and confidence. Finally, we discuss a practical heuristic based on our theoretical approach and discuss its empirical behavior.