Prediction when fitting simple models to high-dimensional data

Citation
Lukas Steinberger and Hannes Leeb, Prediction when fitting simple models to high-dimensional data, Annals of Statistics, 47(3), 2019, pp. 1408-1442
Journal title
Annals of Statistics
ISSN journal
00905364
Volume
47
Issue
3
Year of publication
2019
Pages
1408 - 1442
Database
ACNP
SICI code
Abstract
We study linear subset regression in the context of a high-dimensional linear model. Consider y = ϑ + θ′z + ε with univariate response y and a d-vector of random regressors z, and a submodel where y is regressed on a set of p explanatory variables that are given by x = M′z, for some d × p matrix M. Here, "high-dimensional" means that the number d of available explanatory variables in the overall model is much larger than the number p of variables in the submodel. In this paper, we present Pinsker-type results for prediction of y given x. In particular, we show that the mean squared prediction error of the best linear predictor of y given x is close to the mean squared prediction error of the corresponding Bayes predictor E[y|x], provided only that p/log d is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from n independent observations of (y, x) is close to that of the Bayes predictor, provided only that both p/log d and p/n are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables z.
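The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes independent standard Gaussian regressors and a matrix M that simply selects the first p coordinates of z, a special case in which the Bayes predictor E[y|x] is linear and known in closed form, so the feasible least-squares predictor can be compared with it directly. The function names `solve` and `simulate`, the parameter values, and the choice of coefficients are all illustrative assumptions.

```python
import random


def solve(a, b):
    """Solve the square linear system a w = b by Gaussian elimination
    with partial pivoting (plain lists, no external libraries)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / m[i][i]
    return w


def simulate(d=40, p=2, n=400, n_test=4000, seed=0):
    """Return (test MSE of least-squares predictor, test MSE of Bayes predictor).

    Assumptions of this sketch: z has independent N(0, 1) components,
    the error is N(0, 1), and x = M'z keeps the first p coordinates of z.
    """
    rng = random.Random(seed)
    intercept = 1.0
    theta = [rng.gauss(0, 1) / d ** 0.5 for _ in range(d)]

    def draw():
        z = [rng.gauss(0, 1) for _ in range(d)]
        y = intercept + sum(t * v for t, v in zip(theta, z)) + rng.gauss(0, 1)
        # Prepend a constant 1 so the fitted submodel has an intercept.
        return [1.0] + z[:p], y

    # Feasible predictor: least squares of y on (1, x) via the normal equations.
    train = [draw() for _ in range(n)]
    k = p + 1
    gram = [[sum(x[i] * x[j] for x, _ in train) for j in range(k)] for i in range(k)]
    rhs = [sum(x[i] * y for x, y in train) for i in range(k)]
    w_ls = solve(gram, rhs)

    # Bayes predictor in this special case: the omitted regressors are
    # independent of x with mean zero, so E[y|x] = intercept + theta[:p]'x.
    w_bayes = [intercept] + theta[:p]

    test = [draw() for _ in range(n_test)]

    def mse(w):
        return sum((y - sum(wi * xi for wi, xi in zip(w, x))) ** 2
                   for x, y in test) / len(test)

    return mse(w_ls), mse(w_bayes)


if __name__ == "__main__":
    mse_ls, mse_bayes = simulate()
    print("least-squares predictor MSE:", mse_ls)
    print("Bayes predictor MSE:", mse_bayes)
```

With p/n small, the two estimated errors should be close in this toy case. The paper's results concern much larger classes of design distributions, where E[y|x] need not be linear, and this sketch does not cover them.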