Random Forest Prediction Intervals

Citation
Haozhe Zhang et al., Random Forest Prediction Intervals, The American Statistician, 74(4), 2020, pp. 392-406
Journal title
The American Statistician
Journal ISSN
0003-1305
Volume
74
Issue
4
Year of publication
2020
Pages
392 - 406
Database
ACNP
Abstract
Random forests are among the most popular machine learning techniques for prediction problems. When using random forests to predict a quantitative response, an important but often overlooked challenge is the determination of prediction intervals that will contain an unobserved response value with a specified probability. We propose new random forest prediction intervals that are based on the empirical distribution of out-of-bag prediction errors. These intervals can be obtained as a by-product of a single random forest. Under regularity conditions, we prove that the proposed intervals have asymptotically correct coverage rates. Simulation studies and analysis of 60 real datasets are used to compare the finite-sample properties of the proposed intervals with quantile regression forests and recently proposed split conformal intervals. The results indicate that intervals constructed with our proposed method tend to be narrower than those of competing methods while still maintaining marginal coverage rates approximately equal to nominal levels.
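The interval construction summarized in the abstract can be illustrated with a minimal sketch. It assumes that the out-of-bag (OOB) predictions for the training observations and the forest's point prediction for a new case are already available from some random forest implementation. The function names `empirical_quantile` and `oob_prediction_interval`, the default `alpha`, and the particular empirical-quantile convention are illustrative assumptions, not taken from the paper. The idea shown is to shift the point prediction by the lower and upper empirical quantiles of the OOB prediction errors.

```python
import math


def empirical_quantile(values, p):
    """Return the smallest order statistic whose empirical CDF is >= p.

    This quantile convention is an assumption made for the sketch.
    """
    ordered = sorted(values)
    # ceil(n * p) is the 1-based rank; clamp it into [1, n].
    rank = min(max(math.ceil(len(ordered) * p), 1), len(ordered))
    return ordered[rank - 1]


def oob_prediction_interval(y_train, oob_pred, point_pred, alpha=0.1):
    """Sketch of a (1 - alpha) prediction interval from OOB errors.

    y_train    : observed training responses
    oob_pred   : OOB predictions for the same training cases
                 (each made only by trees that did not see the case)
    point_pred : forest prediction for the new case
    """
    # OOB prediction errors: observed response minus OOB prediction.
    errors = [y - p for y, p in zip(y_train, oob_pred)]
    lower = point_pred + empirical_quantile(errors, alpha / 2)
    upper = point_pred + empirical_quantile(errors, 1 - alpha / 2)
    return lower, upper


# Toy inputs standing in for the output of a fitted forest (hypothetical values).
y_train = [float(v) for v in range(20)]
oob_pred = [10.0] * 20
interval = oob_prediction_interval(y_train, oob_pred, point_pred=50.0, alpha=0.5)
```

Because the errors come from out-of-bag predictions, no separate hold-out set or second model fit is needed, which matches the abstract's remark that the intervals are a by-product of a single forest.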