MetaBags: Bagged Meta-Decision Trees for Regression
Jihed Khiari
NEC Laboratories Europe
jihed.khiari@neclab.eu
Luis Moreira-Matias
NEC Laboratories Europe
luis.moreira.matias@gmail.com
Ammar Shaker
NEC Laboratories Europe
ammar.shaker@neclab.eu
Bernard Ženko
Jožef Stefan Institute
bernard.zenko@ijs.si
Sašo Džeroski
Jožef Stefan Institute
saso.dzeroski@ijs.si
ABSTRACT
Ensembles are popular methods for solving practical supervised
learning problems. They reduce the risk of having underperforming
models in production-grade software. Although critical, methods
for learning heterogeneous regression ensembles have not been
proposed at large scale, whereas in classical ML literature, stacking,
cascading and voting are mostly restricted to classification problems.
Regression poses distinct learning challenges that may result in
poor performance, even when using well established homogeneous
ensemble schemas such as bagging or boosting.
In this paper, we introduce MetaBags, a novel, practically useful
stacking framework for regression. MetaBags is a meta-learning
algorithm that learns a set of meta-decision trees designed to select
one base model (i.e., expert) for each query, and focuses on inductive
bias reduction. A set of meta-decision trees is learned using
different types of meta-features, specially created for this purpose.
Each meta-decision tree is learned on a different bootstrap sample
of the data, and, given a new example, selects a suitable base model that
computes a prediction. Finally, these predictions are aggregated
into a single prediction. This procedure is designed to learn a model
with a fair bias-variance trade-off, and its improvement over base
model performance is correlated with the prediction diversity of the
different experts on specific subregions of the input space. The proposed
method and meta-features are designed in such a way that they
enable good predictive performance even in subregions of the space
which are not adequately represented in the available training data.
An exhaustive empirical evaluation of the method was performed,
assessing both the generalization error and the scalability of the approach
on synthetic, open and real-world application datasets. The obtained
results show that our method significantly outperforms existing
state-of-the-art approaches.
KEYWORDS
Stacking, Regression, Meta-Learning, Landmarking.
ACM Reference Format:
Jihed Khiari, Luis Moreira-Matias, Ammar Shaker, Bernard Ženko, and Sašo
Džeroski. 2018. MetaBags: Bagged Meta-Decision Trees for Regression. In
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
Submitted to ECML/PKDD 2018
© 2018 Copyright held by the owner/author(s).
ACM ISBN 123-4567-24-567/08/06. . . $15.00
https://doi.org/10.475/123_4
Proceedings of (Submitted to ECML/PKDD 2018). ACM, New York, NY, USA,
Article 4, 11 pages. https://doi.org/10.475/123_4
1 INTRODUCTION
Ensemble refers to a collection of several models (i.e., experts) that
are combined to address a given task (e.g., obtain a lower generalization
error for supervised learning problems) [24]. Ensemble
learning can be divided into three different stages [24]: (i) base model
generation, where $z$ multiple possible hypotheses $\hat{f}_i(x)$, $i \in \{1, \dots, z\}$, to
model a given phenomenon $f(x) = p(y|x)$ are generated; (ii) model
pruning, where $c \leq z$ of those are kept and the others discarded;
and (iii) model integration, where these hypotheses are combined
to form the final one, i.e., $\hat{F}(\hat{f}_1(x), \dots, \hat{f}_c(x))$. Naturally, the whole
process may require a large pool of computational resources for (i)
and/or large and representative training sets to avoid overfitting,
since $\hat{F}$ is also estimated/learned on the (partial or full) training set,
which has already been used to train the base models $\hat{f}_i(x)$ in (i).
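The three stages above can be sketched end to end on a toy problem. The following is an illustrative example (not from the paper, and the names `fit_poly` and `F_hat` are our own): the base hypotheses are polynomial fits of different degrees, pruning keeps the models with the lowest training error, and $\hat{F}$ is a plain average of the surviving hypotheses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) + noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

# (i) Base model generation: z hypotheses, here polynomial fits
# of varying degree.
def fit_poly(X, y, degree):
    coefs = np.polyfit(X[:, 0], y, degree)
    return lambda Xq: np.polyval(coefs, Xq[:, 0])

degrees = [1, 2, 3, 5, 9]
base_models = [fit_poly(X, y, d) for d in degrees]

# (ii) Model pruning: keep the c = 3 models with the lowest
# training mean squared error, discard the rest.
mse = [np.mean((m(X) - y) ** 2) for m in base_models]
keep = np.argsort(mse)[:3]
pruned = [base_models[i] for i in keep]

# (iii) Model integration: here F-hat is a plain (unweighted)
# average of the surviving hypotheses.
def F_hat(Xq):
    return np.mean([m(Xq) for m in pruned], axis=0)
```

Note that in this sketch $\hat{F}$ is fixed rather than learned; the overfitting concern raised above arises precisely when $\hat{F}$ is itself estimated on data already used in stage (i).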
Since the pioneering Netflix competition in 2007 [1] and the coincident
introduction of cloud-based solutions for data storage and/or
large-scale computing, ensembles have been increasingly
often used for industrial applications. A good illustration of this
trend is Kaggle, the popular competition website, where, during
the last five years, more than 50% of the winning solutions involved at least
one ensemble of multiple models [21].
Ensemble learning builds on the principles of committees, where
there is typically never a single expert that outperforms all the
others on each and every query. Instead, we may obtain a better
overall performance by combining the answers of multiple experts [28].
Despite the importance of the combining function $\hat{F}$ for the success
of the ensemble, most of the recent research on ensemble learning
is focused on either (i) model generation and/or (ii) pruning [24].
We can group the different approaches for model integration into three
clusters [30]: (a) voting (e.g., bagging [4]), (b) cascading [18] and (c)
stacking [32]. In voting, the output of the ensemble is a (weighted)
average of the outputs of the base models. Cascading iteratively combines
the outputs of the base experts by including them, one at a
time, as yet another feature in the training set. Stacking learns a
meta-model that combines the outputs of all the base models. All
these approaches have advantages and shortcomings. Voting relies
on the base models having some complementary expertise¹, which is
an assumption that is rarely true in practice (e.g., see Fig. 1-(b,c)).
¹ Some base models perform reasonably well in some subregions of the feature space,
while other base models perform well in other regions.
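The contrast between (a) voting and (c) stacking can be made concrete with a toy sketch (our own illustration, not from the paper): two deliberately complementary experts, each fit on only one half of the input space, are combined first by an unweighted average and then by a linear meta-model learned on their outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

# Two complementary experts: each cubic fit sees only one half of
# the input space, so neither performs well everywhere.
left = X[:, 0] < 0

def fit_poly(xs, ys, degree):
    coefs = np.polyfit(xs, ys, degree)
    return lambda xq: np.polyval(coefs, xq)

f1 = fit_poly(X[left, 0], y[left], 3)
f2 = fit_poly(X[~left, 0], y[~left], 3)

# Matrix of base model outputs, one column per expert.
P = np.column_stack([f(X[:, 0]) for f in [f1, f2]])

# (a) Voting: unweighted average of the expert outputs.
vote = P.mean(axis=1)

# (c) Stacking: a linear meta-model (least squares with intercept)
# learned on the expert outputs.
A = np.column_stack([P, np.ones(len(y))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
stack = A @ w

mse_vote = np.mean((vote - y) ** 2)
mse_stack = np.mean((stack - y) ** 2)
```

Since the unweighted average is one particular linear combination of the expert outputs, the least-squares meta-model can never do worse than voting on the training data; MetaBags goes further by selecting experts per query rather than weighting them globally.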
arXiv:1804.06207v1 [cs.LG] 17 Apr 2018