ON THE CONDITIONAL DISTRIBUTIONS OF LOW-DIMENSIONAL PROJECTIONS FROM HIGH-DIMENSIONAL DATA

Authors
Hannes Leeb
Citation
Hannes Leeb, ON THE CONDITIONAL DISTRIBUTIONS OF LOW-DIMENSIONAL PROJECTIONS FROM HIGH-DIMENSIONAL DATA, Annals of Statistics, 41(2), 2013, pp. 464–483
Journal title
Annals of Statistics
ISSN journal
0090-5364
Volume
41
Issue
2
Year of publication
2013
Pages
464–483
Database
ACNP
Abstract
We study the conditional distribution of low-dimensional projections from high-dimensional data, where the conditioning is on other low-dimensional projections. To fix ideas, consider a random $d$-vector $Z$ that has a Lebesgue density and that is standardized so that $\mathbb{E}Z = 0$ and $\mathbb{E}ZZ' = I_d$. Moreover, consider two projections defined by unit vectors $\alpha$ and $\beta$, namely a response $y = \alpha'Z$ and an explanatory variable $x = \beta'Z$. It has long been known that the conditional mean of $y$ given $x$ is approximately linear in $x$, under some regularity conditions; cf. Hall and Li [Ann. Statist. 21 (1993) 867–889]. However, a corresponding result for the conditional variance has not been available so far. We here show that the conditional variance of $y$ given $x$ is approximately constant in $x$ (again, under some regularity conditions). These results hold uniformly in $\alpha$ and for most $\beta$'s, provided only that the dimension of $Z$ is large. In that sense, we see that most linear submodels of a high-dimensional overall model are approximately correct. Our findings provide new insights in a variety of modeling scenarios. We discuss several examples, including sliced inverse regression, sliced average variance estimation, generalized linear models under potential link violation, and sparse linear modeling.
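As a rough numerical illustration of the two statements in the abstract (a Monte Carlo sketch, not the paper's method or proof), note first that under the standardization $\mathbb{E}ZZ' = I_d$ the best linear predictor of $y$ from $x$ is $(\alpha'\beta)x$. If, in addition, the conditional variance is constant, then it must equal $1 - (\alpha'\beta)^2$, because $\mathrm{Var}(y) = \alpha'\alpha = 1$. The sketch below checks both approximations on equal-count bins of $x$. It assumes, purely for illustration, that $Z$ has i.i.d. centered Exp(1) coordinates (one arbitrary choice with a Lebesgue density and the required standardization), $d = 100$, a uniformly random $\beta$ (standing in for "most $\beta$'s"), and an $\alpha$ obtained by mixing $\beta$ with an independent random direction. The function names and all parameter values are hypothetical.

```python
import math
import random
import statistics


def unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def standardized_draw(d, rng):
    """One draw of Z (assumed): i.i.d. centered Exp(1) coordinates, so E Z = 0, E ZZ' = I_d."""
    return [rng.expovariate(1.0) - 1.0 for _ in range(d)]


def binned_moments(d=100, n=10000, bins=5, seed=0):
    """Estimate E[y|x] and Var[y|x] on equal-count bins of x = beta'Z.

    Returns rho = alpha'beta and, per bin, a tuple
    (mean of x, mean of y, linear prediction rho * mean of x,
     variance of the residual y - rho * x).
    """
    rng = random.Random(seed)
    beta = unit_vector(d, rng)              # uniformly random beta ("most beta's")
    gamma = unit_vector(d, rng)             # hypothetical auxiliary direction
    mixed = [b + g for b, g in zip(beta, gamma)]
    norm = math.sqrt(sum(t * t for t in mixed))
    alpha = [t / norm for t in mixed]       # hypothetical alpha correlated with beta
    rho = sum(a * b for a, b in zip(alpha, beta))

    pairs = []
    for _ in range(n):
        z = standardized_draw(d, rng)
        x = sum(b * t for b, t in zip(beta, z))
        y = sum(a * t for a, t in zip(alpha, z))
        pairs.append((x, y))
    pairs.sort()                            # order by x to form equal-count bins

    size = n // bins
    rows = []
    for k in range(bins):
        chunk = pairs[k * size:(k + 1) * size]
        xs = [x for x, _ in chunk]
        ys = [y for _, y in chunk]
        resid = [y - rho * x for x, y in chunk]
        x_bar = statistics.fmean(xs)
        rows.append((x_bar, statistics.fmean(ys), rho * x_bar,
                     statistics.variance(resid)))
    return rho, rows


if __name__ == "__main__":
    rho, rows = binned_moments()
    print(f"rho = {rho:.3f}, 1 - rho^2 = {1 - rho ** 2:.3f}")
    for x_bar, y_bar, pred, var_r in rows:
        print(f"x~{x_bar:+.2f}  mean y={y_bar:+.3f}  rho*x={pred:+.3f}  resid var={var_r:.3f}")
```

The residual variance within each bin is used instead of the raw variance of $y$, since subtracting the linear part $\rho x$ keeps the within-bin spread of $x$ from inflating the outer bins. If the approximations hold, the bin means of $y$ should track $\rho\bar{x}$ and the residual variances should all sit near $1 - \rho^2$.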