Error bounds on multivariate Normal approximations for word count statistics

Authors

Huang, Haiyan

Citation

Huang, Haiyan, Error bounds on multivariate Normal approximations for word count statistics, Advances in applied probability, 34(3), 2002, pp. 559-586

Journal title

Advances in applied probability

ISSN journal

0001-8678

Volume

34

Issue

3

Year of publication

2002

Pages

559 - 586

Database

ACNP

SICI code

Abstract

Given a sequence S and a collection Ω of d words, it is of interest in many applications to characterize the multivariate distribution of the vector of counts U = (N(S, w_1), ..., N(S, w_d)), where N(S, w) is the number of times a word w ∈ Ω appears in the sequence S. We obtain an explicit bound on the error made when approximating the multivariate distribution of U by the normal distribution, when the underlying sequence is i.i.d. or first-order stationary Markov over a finite alphabet. When the limiting covariance matrix of U is nonsingular, the error bounds decay at rate O((log n)/√n) in the i.i.d. case and O((log n)^3/√n) in the Markov case. In order for U to have a nondegenerate covariance matrix, it is necessary and sufficient that the counted word set Ω is not full, that is, that Ω is not the collection of all possible words of some length k over the given finite alphabet. To supply the bounds on the error, we use a version of Stein's method.
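
As an illustration of the quantities in the abstract, the following minimal Python sketch computes a count vector U for a short sequence and shows why a full word set gives a degenerate covariance. It is not taken from the paper: the function names, the example sequence, and the choice to count overlapping occurrences are assumptions made here for illustration.

```python
from itertools import product


def count_word(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, allowing overlaps (an assumption)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def count_vector(seq: str, words: list[str]) -> list[int]:
    """Return the vector of counts, one entry per word in `words`."""
    return [count_word(seq, w) for w in words]


# Hypothetical example sequence over the alphabet {A, C, G, T}.
seq = "ACGTACGAAC"

# A word set that is not full: three of the sixteen words of length 2.
words = ["AC", "CG", "AA"]
U = count_vector(seq, words)

# The full word set: every word of length k = 2 over the alphabet.
k = 2
full_words = ["".join(p) for p in product("ACGT", repeat=k)]
total = sum(count_vector(seq, full_words))

print(U)
print(total, len(seq) - k + 1)
```

When the word set is full, every window of length k matches exactly one word, so the counts always sum to n - k + 1. The components of U are then linearly dependent, which is one way to see why a full word set cannot give a nondegenerate covariance matrix.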