Chebyshev polynomials, moment matching, and optimal estimation of the unseen

Citation
Yihong Wu et Pengkun Yang, Chebyshev polynomials, moment matching, and optimal estimation of the unseen, Annals of statistics , 47(2), 2019, pp. 857-883
Journal title
ISSN journal
00905364
Volume
47
Issue
2
Year of publication
2019
Pages
857 - 883
Database
ACNP
SICI code
Abstract
We consider the problem of estimating the support size of a discrete distribution whose minimum nonzero mass is at least 1k. Under the independent sampling model, we show that the sample complexity, that is, the minimal sample size to achieve an additive error of .k with probability at least 0.1 is within universal constant factors of klogklog21., which improves the state-of-the-art result of k.2logk in [In Advances in Neural Information Processing Systems (2013) 2157.2165]. Similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in O(n+log2k) time and attains the sample complexity within constant factors. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated in a variety of synthetic and real datasets.