Artificial Intelligence Review 18: 77–95, 2002.
© 2002 Kluwer Academic Publishers. Printed in the Netherlands.
A Perspective View and Survey of Meta-Learning
RICARDO VILALTA and YOUSSEF DRISSI
IBM T.J. Watson Research Center, 30 Saw Mill River Rd, Hawthorne NY, 10532 USA
(E-mail: vilalta@us.ibm.com, youseffd@us.ibm.com)
Abstract. Different researchers hold different views of what the term meta-learning exactly
means. The first part of this paper provides our own perspective view in which the goal is
to build self-adaptive learners (i.e. learning algorithms that improve their bias dynamically
through experience by accumulating meta-knowledge). The second part provides a survey of
meta-learning as reported by the machine-learning literature. We find that, despite different
views and research lines, a question remains constant: how can we exploit knowledge about
learning (i.e. meta-knowledge) to improve the performance of learning algorithms? Clearly
the answer to this question is key to the advancement of the field and continues
to be the subject of intensive research.
Keywords: classification, inductive learning, meta-knowledge
1. Introduction
Meta-learning studies how learning systems can increase in efficiency
through experience; the goal is to understand how learning itself can become
flexible according to the domain or task under study. All learning systems
work by adapting to a specific environment, which reduces to imposing a
partial ordering or bias on the set of possible hypotheses explaining a concept
(Mitchell (1980)). Meta-learning differs from base-learning in the scope of
the level of adaptation: meta-learning studies how to choose the right bias
dynamically, as opposed to base-learning, where the bias is fixed a priori or
user-parameterized. In a typical inductive-learning scenario, applying a base-
learner (e.g. a decision tree, neural network, or support vector machine) over
some data produces a hypothesis that depends on the fixed bias embedded
in the learner. Learning takes place at the base-level because the quality of
the hypothesis normally improves with an increasing number of examples.
Nevertheless, successive applications of the learner over the same data always
produce the same hypothesis, independently of performance; no knowledge
is extracted across domains or tasks (Pratt and Thrun (1997)).
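This determinism can be illustrated with a minimal sketch. The toy learner below (a hypothetical majority-class learner, not from the paper) embodies a fixed bias; retraining it on the same data necessarily yields an identical hypothesis, no matter how poorly that hypothesis performs:

```python
# Sketch: a base-learner with a fixed bias is deterministic --
# applying it repeatedly to the same data yields the same hypothesis.

def majority_class_learner(examples):
    """Toy base-learner whose (fixed) bias is: predict the majority class."""
    labels = [label for _, label in examples]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority  # the induced hypothesis

data = [((0, 1), "pos"), ((1, 1), "pos"), ((1, 0), "neg")]

h1 = majority_class_learner(data)
h2 = majority_class_learner(data)

# Same data, same bias -> identical predictions, regardless of accuracy.
print(all(h1(x) == h2(x) for x, _ in data))  # True
```

A meta-learner, by contrast, would use cross-task performance feedback to alter the bias itself (e.g. swap the learner or its hypothesis space), which no amount of retraining within this fixed scheme can do.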
Meta-learning aims at discovering ways to dynamically search for the
best learning strategy as the number of tasks increases (Thrun (1998), Rendell
et al. (1987)). A computer program qualifies as a learning machine if its