Bayesian Inference for Sparse Generalized Linear Models

Matthias Seeger, Sebastian Gerwinn, and Matthias Bethge
Max Planck Institute for Biological Cybernetics, Spemannstr. 38, Tübingen, Germany

Abstract. We present a framework for efficient, accurate approximate Bayesian inference in generalized linear models (GLMs), based on the expectation propagation (EP) technique. The parameters can be endowed with a factorizing prior distribution, encoding properties such as sparsity or non-negativity. The central role of posterior log-concavity in Bayesian GLMs is emphasized and related to stability issues in EP. In particular, we use our technique to infer the parameters of a point process model for neuronal spiking data from multiple electrodes, demonstrating significantly superior predictive performance when a sparsity assumption is enforced via a Laplace prior distribution.

1 Introduction

The framework of generalized linear models (GLM) [5] is a cornerstone of modern Statistics, offering unified estimation and prediction methods for a large number of models frequently used in Machine Learning. In a Bayesian generalized linear model (B-GLM), assumptions about the model parameters (sparsity, non-negativity, etc.) are encoded in a prior distribution. For example, it is common to use an overparameterized model with many features together with a sparsity prior. Only those features relevant for describing the data end up having significant weight under the Bayesian posterior. Importantly, for the models of interest in this paper, inference does not require combinatorial computational effort, but can be carried out even with a large number of parameters. Exact Bayesian inference is not analytically tractable in most B-GLMs. In this paper, we show how to employ the expectation propagation (EP) technique for approximate inference in GLMs with factorizing prior distributions.
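As a minimal illustration of the sparsity mechanism (not of the EP method developed in this paper), note that a factorizing Laplace prior p(w_j) ∝ exp(−τ|w_j|) turns MAP estimation in a logistic GLM into L1-penalized likelihood maximization. The following Python sketch, with illustrative data and constants chosen for this example, fits such a MAP estimate by proximal gradient descent and shows the weights of irrelevant features being shrunk toward zero:

```python
import numpy as np

# Sketch only: MAP estimation in a Bayesian logistic GLM with a
# factorizing Laplace prior p(w_j) ∝ exp(-tau*|w_j|). The MAP objective
# is the logistic negative log-likelihood plus an L1 penalty, minimized
# here by proximal gradient descent (soft-thresholding step).
# Data, tau, and the step size are illustrative assumptions.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:2] = [2.0, -1.5]          # only the first two features matter
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

tau = 5.0                          # Laplace prior scale -> L1 strength
lr = 0.01                          # gradient step size
w = np.zeros(d)
for _ in range(2000):
    grad = X.T @ (1.0 / (1.0 + np.exp(-X @ w)) - y)      # NLL gradient
    w = w - lr * grad
    w = np.sign(w) * np.maximum(np.abs(w) - lr * tau, 0.0)  # prox of L1

print(np.round(w, 2))  # irrelevant weights are driven to (near) zero
```

Under the posterior proper (rather than this MAP point estimate), the same Laplace prior concentrates mass of the irrelevant weights near zero, which is the effect exploited in this paper.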
We focus on models with a log-concave (and therefore unimodal) posterior, for which a careful EP implementation is numerically robust and tends to converge rapidly to an accurate posterior approximation. The code used in our experiments will be made publicly available. We apply our technique to a point process model for neuronal spiking data from multiple electrodes. Here, each neuron is assumed to receive causal input from an external stimulus and from the spike history, both represented by features in a GLM. In the presence of high-dimensional stimuli (such as images), with many neurons recorded at a reasonable time resolution, we end up with a large number of features,