Automatic Bayesian Learning Methods

Jo-Anne Ting 1, Aaron D'Souza 2, Evangelos Theodorou 1, Stefan Schaal 1,3
1 University of Southern California, Los Angeles, CA 90089
2 Google, Inc., Mountain View, CA 94043
3 ATR Computational Neuroscience Laboratories, Kyoto 619-0288, Japan
Email: joanneti@usc.edu, adsouza@google.com, etheodor@usc.edu, sschaal@usc.edu

Abstract

Our goal is to develop algorithms so that autonomous systems can learn to adapt to their changing environments without any interference from the user. In particular, we focus on data-rich and sensor-rich environments such as humanoid and biological systems. Bayesian inference, combined with approximation methods to reduce computational complexity, offers a principled way to eliminate open parameters, resulting in a "black-box" approach. We adopt a Bayesian approach to create learning algorithms that are easy to use across a variety of applications and require little or no manual parameter tuning by the user (as in gradient descent or structure selection). That is to say, these algorithms should be as automatic as possible. We are interested in scenarios where the input data has hundreds to thousands of dimensions and where real-time, incremental learning may be needed, as, for example, in robotics.

1 Introduction

We examine how to develop a toolbox of methods towards our goal of developing automatic Bayesian learning methods. Bayesian inference, combined with approximation methods to reduce computational complexity, offers a principled way to eliminate open parameters, resulting in a "black-box" approach. We are interested in scenarios where the input data has thousands of dimensions and where real-time, incremental learning may be needed (e.g., in robotics). We start by examining the problem of regression, since classification can be performed by interpreting regression outputs in a binary way.
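The last point, interpreting regression outputs in a binary way to perform classification, can be sketched in a few lines. This is a minimal illustration with synthetic data, not an example from the paper: class labels are coded as +1/-1, an ordinary least-squares fit predicts a real value, and the sign of the prediction gives the class.

```python
# Minimal sketch: classification via thresholded (sign of) regression output.
# Synthetic, linearly separable data; +1/-1 label coding is an assumption.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)  # +/-1 class labels

Xb = np.hstack([X, np.ones((200, 1))])                # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)            # ordinary least squares

pred = np.sign(Xb @ w)                                # binary interpretation
accuracy = np.mean(pred == y)
```

For spherical Gaussian inputs, the least-squares weight vector points (up to sampling noise) along the true separating direction, so the thresholded predictions classify nearly all points correctly.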
First, we address the problem of linear regression in high-dimensional spaces, where input data may contain redundant and irrelevant dimensions and be contaminated with noise. Traditional linear regression produces biased estimates when input noise is present and suffers numerically when the data contains irrelevant and/or redundant inputs. We propose a Bayesian treatment of linear regression for optimal prediction and explore a Bayesian version of factor analysis regression for accurate parameter identification. Both methods are computationally efficient and require no parameter tuning.

Since observed data from sensors often contain outliers as well as noise, we investigate a Bayesian treatment of weighted linear regression that results in automatic outlier detection and is suitable for real-time implementation. We extend this idea to the Kalman filter in order to learn a Kalman filter that is robust to outliers in the observations. Again, both algorithms require no parameter tuning and are easy to use or to incorporate into other methods.

Current and future work considers the nonlinear high-dimensional regression problem, focusing on locally weighted learning methods. In particular, we are interested in developing a Bayesian approach to locally weighted regression that learns the bandwidth (the spatial distance metric) of each local model. Preliminary results indicate that a full Bayesian treatment of this problem can achieve impressively robust function approximation performance without the need for tuning meta-parameters. We are also interested in extending this locally linear Bayesian model to an online setting, in the spirit of dynamic Bayesian networks, to offer a parameter-free alternative to incremental learning. The development of these algorithms contributes to a toolbox of black-box methods that are easy to use, require little to no manual parameter tuning, and aid the development of a framework for automatic Bayesian learning systems.
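The bias that input noise induces in traditional linear regression can be verified with a quick numerical check. This is an illustrative sketch of the attenuation effect only, not the authors' Bayesian algorithm: the true slope is 2.0, and adding unit-variance noise to the inputs pulls the least-squares estimate toward zero by the factor var(x) / (var(x) + var(noise)).

```python
# Illustrative sketch: ordinary least squares is biased toward zero when
# the *inputs* (not just the outputs) are noisy. All data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x_true = rng.normal(size=n)               # clean latent input, var = 1
x_obs = x_true + rng.normal(size=n)       # observed input + unit-variance noise
y = 2.0 * x_true + 0.1 * rng.normal(size=n)

slope_clean = (x_true @ y) / (x_true @ x_true)  # close to the true 2.0
slope_noisy = (x_obs @ y) / (x_obs @ x_obs)     # attenuated toward 1.0
```

Here the attenuation factor is 1 / (1 + 1) = 0.5, so the slope recovered from noisy inputs converges to roughly half the true value; a model that accounts for the input noise distribution can remove this bias.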
We demonstrate results on robotic platforms and neural data sets. In addition to showing significant improvements in speed and performance over standard methods, these algorithms are also capable of feature selection, noise clean-up, outlier detection, and automatic bandwidth estimation in high-dimensional domains.

2 High-Dimensional Linear Regression

In recent years, there has been growing interest in large-scale analyses of brain activity with respect to associated behavioral variables. In the area of brain-machine interfaces, neural firing has been used to control an artificial system such as a robot [1], to control a cursor on a computer screen via non-invasive brain signals [2], or to classify visual stimuli presented to a subject [4][5]. The brain signals are typically high-dimensional, on the order of hundreds of inputs, with large numbers of redundant and irrelevant signals. Linear modeling techniques like