Causal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventions

Authors

Dominik Rothenhäusler Peter Bühlmann Nicolai Meinshausen

Citation

Dominik Rothenhäusler et al., Causal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventions, Annals of statistics, 47(3), 2019, pp. 1688-1722

Journal title

Annals of statistics

ISSN journal

00905364

Volume

47

Issue

3

Year of publication

2019

Pages

1688 - 1722

Database

ACNP

SICI code

Abstract

Causal inference is known to be very challenging when only observational data are available. Randomized experiments are often costly and impractical, and in instrumental variable regression the number of instruments has to exceed the number of causal predictors. It was recently shown in Peters, Bühlmann and Meinshausen (2016) (J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 947–1012) that causal inference for the full model is possible when data from distinct observational environments are available, exploiting that the conditional distribution of a response variable is invariant under the correct causal model. Two shortcomings of such an approach are the high computational effort for large-scale data and the assumed absence of hidden confounders. Here, we show that these two shortcomings can be addressed if one is willing to make a more restrictive assumption on the type of interventions that generate different environments. Thereby, we look at a different notion of invariance, namely inner-product invariance. By avoiding a computationally cumbersome reverse-engineering approach such as in Peters, Bühlmann and Meinshausen (2016), it allows for large-scale causal inference in linear structural equation models. We discuss identifiability conditions for the causal parameter and derive asymptotic confidence intervals in the low-dimensional setting. In the case of nonidentifiability, we show that the solution set of causal Dantzig has predictive guarantees under certain interventions. We derive finite-sample bounds in the high-dimensional setting and investigate its performance on simulated datasets.
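
The abstract names inner-product invariance without spelling it out. A minimal sketch follows, assuming one common reading: the inner product between the predictors and the causal residual, E[X (Y - X·beta)], takes the same value in every environment when interventions are additive shifts on the predictors. With two environments, this suggests solving a linear system built from differences of Gram matrices. The function names, the simulation design, and all parameter values are illustrative assumptions and are not taken from the record. Only an unregularized, low-dimensional case is shown, and the high-dimensional regularized variant mentioned in the abstract is not covered.

```python
import numpy as np


def inner_product_invariance_estimate(X1, Y1, X2, Y2):
    """Solve (G1 - G2) beta = (Z1 - Z2), where G_e = X_e^T X_e / n_e and
    Z_e = X_e^T Y_e / n_e. Assumed formalization, low-dimensional case only."""
    G = X1.T @ X1 / len(X1) - X2.T @ X2 / len(X2)
    Z = X1.T @ Y1 / len(X1) - X2.T @ Y2 / len(X2)
    return np.linalg.solve(G, Z)


def simulate_environment(n, shift_scale, beta, rng):
    """Hypothetical linear model with a hidden confounder H that affects both
    X and Y, plus an additive random shift on X (the 'intervention')."""
    p = len(beta)
    H = rng.normal(size=n)                          # hidden confounder
    shift = shift_scale * rng.normal(size=(n, p))   # additive intervention on X
    X = shift + H[:, None] + rng.normal(size=(n, p))
    Y = X @ beta + 2.0 * H + rng.normal(size=n)
    return X, Y


rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5])                        # assumed causal coefficients

X1, Y1 = simulate_environment(200_000, 0.0, beta, rng)  # observational
X2, Y2 = simulate_environment(200_000, 2.0, beta, rng)  # shifted

beta_hat = inner_product_invariance_estimate(X1, Y1, X2, Y2)

# Pooled least squares for comparison; it is biased by the hidden confounder.
beta_ols = np.linalg.lstsq(
    np.vstack([X1, X2]), np.concatenate([Y1, Y2]), rcond=None
)[0]

print("difference-of-Gram estimate:", beta_hat)
print("pooled least squares:       ", beta_ols)
```

In this toy setting the confounding term contributes equally to the inner products in both environments, so it cancels in the difference, while pooled least squares absorbs it as bias. The system is solvable only when the difference of Gram matrices is invertible, which here relies on the shift acting on every predictor.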