Data-Dependent Norm Adaptation for Sparse Recovery in Kernel Ensembles Learning

Marco Signoretto marco.signoretto@esat.kuleuven.be
ESAT-SCD/SISTA
Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven (BELGIUM)

Kristiaan Pelckmans kp@it.uu.se
Division of Systems and Control
Department of Information Technology, Uppsala University
Box 337, SE-751 05 Uppsala (SWEDEN)

Lieven De Lathauwer lieven.delathauwer@kuleuven-kortrijk.be
Group Science, Engineering and Technology
Katholieke Universiteit Leuven, Campus Kortrijk
E. Sabbelaan 53, 8500 Kortrijk (BELGIUM)

Johan A.K. Suykens johan.suykens@esat.kuleuven.be
ESAT-SCD/SISTA
Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven (BELGIUM)

Abstract

We study the problem of learning sparse nonlinear models given observations drawn from an unknown distribution. This problem can be recast in the Multiple Kernel Learning (MKL) framework. It has also been studied in the context of functional ANOVA models, under the name of COmponent Selection and Smoothing Operator (COSSO). Our primary focus is on detecting the structure behind the data. We give an oracle-type inequality that relates sparse recovery to a measure of in-sample dependence. This result motivates a new type of data-dependent regularization that adapts to the observed dependence structure. The idea is then translated into an algorithm based on COSSO. Experimental results are provided to show the effectiveness of the proposed approach.

1. Introduction

In real-life problems there is often a need to unravel the structure behind the data. Functional ANOVA models have emerged as a class of structured models that capture nonlinear relations while still providing insight into the model and dealing appropriately with the curse of dimensionality. The general principle is to approximate the underlying functional relation by an additive expansion, where components are functions on subsets of variables.
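As an illustration of this principle, the additive expansion is commonly written in the following form. This is a sketch in the standard functional ANOVA notation, not a formula taken from this text: the symbols f, b, f_j, f_{jk} and the number of variables d are assumed here for illustration only.

```latex
% Functional ANOVA expansion of f : R^d -> R (notation assumed for illustration)
% b       : constant term
% f_j     : main effect, a function of the single variable x_j
% f_{jk}  : two-way interaction, a function of the pair (x_j, x_k)
\[
  f(x) \;=\; b \;+\; \sum_{j=1}^{d} f_j(x_j)
        \;+\; \sum_{1 \le j < k \le d} f_{jk}(x_j, x_k) \;+\; \cdots
\]
```

Truncating the expansion after the main effects keeps only the univariate components f_j, and sparsity in this context means that only a few of the components are nonzero.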
This setting has been studied especially in nonparametric regression by smoothing splines (Wahba, 1990; Gu, 2002). A special case is represented by additive models as defined in Buja et al. (1989) and Hastie and Tibshirani (1990), which provide an extension of the