DRCFS: Doubly Robust Causal Feature Selection

Francesco Quinzan∗†¹, Ashkan Soleymani†², Patrik Jaillet², Cristian R. Rojas³, Stefan Bauer⁴,⁵

¹ Department of Computer Science, University of Oxford
² Laboratory for Information & Decision Systems (LIDS), Massachusetts Institute of Technology
³ KTH Royal Institute of Technology
⁴ TU Munich
⁵ Helmholtz Munich
† equal contribution
∗ This work was initiated and partially done while the author was employed at KTH Royal Institute of Technology.

arXiv:2306.07024v3 [cs.LG] 5 Jul 2023

Abstract

Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems.

1 Introduction

We study the fundamental problem of causal feature selection for non-linear models. That is, consider a set of features $X = \{X_1, \dots, X_m\}$, and an outcome $Y$ specified with an additive-noise model on some of the features [Hoyer et al., 2008, Peters et al., 2014, Schölkopf et al., 2012]:

Axiom (A) $Y = f(\mathrm{Pa}(Y)) + \varepsilon$, for a subset $\mathrm{Pa}(Y) \subseteq X$ and posterior additive noise $\varepsilon$.

Our goal is to identify the set of relevant features $\mathrm{Pa}(Y)$ from observations.

Feature selection is an important cornerstone of high-dimensional data analysis [Bolón-Canedo et al., 2015, Butcher and Smith, 2020, Li et al., 2017, Liu and Motoda, 2007], especially in data-rich settings. By including only relevant variables and removing nuisance factors, feature selection allows us to build models that are simple, interpretable, and more robust [Janzing et al., 2020, Yu et al., 2020]. Moreover, in