Biased bootstrap methods for reducing the effects of contamination

Citation
P. Hall and B. Presnell, Biased bootstrap methods for reducing the effects of contamination, J ROY STA B, 61, 1999, pp. 661-680
Number of citations
23
Subject categories
Mathematics
Journal title
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN journal
1369-7412
Volume
61
Year of publication
1999
Part
3
Pages
661 - 680
Database
ISI
SICI code
1369-7412(1999)61:<661:BBMFRT>2.0.ZU;2-W
Abstract
Contamination of a sampled distribution, for example by a heavy-tailed distribution, can degrade the performance of a statistical estimator. We suggest a general approach to alleviating this problem, using a version of the weighted bootstrap. The idea is to 'tilt' away from the contaminated distribution by a given (but arbitrary) amount, in a direction that minimizes a measure of the new distribution's dispersion. This theoretical proposal has a simple empirical version, which results in each data value being assigned a weight according to an assessment of its influence on dispersion. Importantly, distance can be measured directly in terms of the likely level of contamination, without reference to an empirical measure of scale. This makes the procedure particularly attractive for use in multivariate problems. It has several forms, depending on the definitions taken for dispersion and for distance between distributions. Examples of dispersion measures include variance and generalizations based on high order moments. Practicable measures of the distance between distributions may be based on power divergence, which includes Hellinger and Kullback-Leibler distances. The resulting location estimator has a smooth, redescending influence curve and appears to avoid computational difficulties that are typically associated with redescending estimators. Its breakdown point can be located at any desired value ε ∈ (0, 1/2) simply by 'trimming' to a known distance (depending only on ε and the choice of distance measure) from the empirical distribution. The estimator has an affine equivariant multivariate form. Further, the general method is applicable to a range of statistical problems, including regression.
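
Illustrative sketch (not part of the original record)
The empirical version described in the abstract can be illustrated for one particular case. The sketch below assumes a univariate sample, variance as the dispersion measure, and Kullback-Leibler divergence from the uniform weights as the distance. Under those assumptions, the Lagrange conditions give weights proportional to exp(-lam * (x_i - mu)^2), where mu is the weighted mean itself, so the weights are found by fixed-point iteration. The function names, the tilt parameter `lam`, and the choice of the median as the starting value are assumptions made here for illustration and are not taken from the paper. The abstract fixes the distance and solves for the tilt; for simplicity this sketch fixes `lam` and reports the distance that results.

```python
import numpy as np


def tilted_weights(x, lam, n_iter=500, tol=1e-12):
    """Weights p_i proportional to exp(-lam * (x_i - mu)^2), with mu = sum(p_i * x_i).

    Hypothetical helper: a fixed-point iteration started at the median.
    Convergence is not guaranteed for every sample or every value of lam.
    """
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))  # assumed starting value
    for _ in range(n_iter):
        logw = -lam * (x - mu) ** 2
        logw -= logw.max()          # stabilise the exponentials
        p = np.exp(logw)
        p /= p.sum()                # weights sum to one
        new_mu = float(p @ x)       # weighted location estimate
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return p, mu


def kl_from_uniform(p):
    """Kullback-Leibler divergence sum(p_i * log(n * p_i)) from uniform weights."""
    p = np.asarray(p, dtype=float)
    pos = p > 0                     # terms with p_i = 0 contribute zero
    return float(np.sum(p[pos] * np.log(p.size * p[pos])))


if __name__ == "__main__":
    # Eight well-behaved points plus one gross outlier (made-up data).
    x = np.array([-1.2, -0.5, -0.1, 0.0, 0.3, 0.4, 0.9, 1.1, 25.0])
    p, mu = tilted_weights(x, lam=0.5)
    print("sample mean:", x.mean())
    print("tilted mean:", mu)
    print("weights:", np.round(p, 4))
    print("KL distance from uniform weights:", kl_from_uniform(p))
```

Because the weight of each point decays smoothly to zero with its squared distance from the current location estimate, the outlier is downweighted rather than deleted outright, which is consistent with the smooth, redescending influence curve mentioned in the abstract.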