HELM: Highly Efficient Learning of Mixed Copula Networks

Yaniv Tenzer
Department of Statistics
The Hebrew University
yaniv.tenzer@gmail.com

Gal Elidan
Department of Statistics
The Hebrew University
galel@huji.ac.il

Abstract

Learning the structure of probabilistic graphical models for complex real-valued domains is a formidable computational challenge. This inevitably leads to significant modelling compromises such as discretization or the use of a simplistic Gaussian representation. In this work we address the challenge of efficiently learning truly expressive copula-based networks that facilitate a mix of varied copula families within the same model. Our approach is based on a simple but powerful bivariate building block that is used to perform local model selection highly efficiently, thus bypassing much of the computational burden involved in structure learning. We show how this building block can be used to learn general networks and demonstrate its effectiveness on varied and sizeable real-life domains. Importantly, favorable identification and generalization performance come with dramatic runtime improvements. Indeed, the benefits are such that they allow us to tackle domains that are prohibitive when using standard learning approaches.

1 INTRODUCTION

Probabilistic graphical models [Pearl, 1988] in general, and Bayesian networks (BNs) in particular, have become popular as a flexible and intuitive framework for modeling multivariate densities, a central goal of the data sciences. At the heart of the formalism is a combination of a qualitative graph structure that encodes the regularities (independencies) of the domain and quantitative local conditional densities of a variable given its parents in the graph. The result is a decomposable model that facilitates relatively efficient inference and estimation.
Unfortunately, learning the structure of such models remains a formidable challenge, particularly when dealing with real-valued domains that are non-Gaussian. The computational bottleneck lies in the need to assess the merit of many candidate structures, each requiring potentially costly maximum likelihood evaluation.

The situation is further compounded in realistic domains where we also want to allow for the combination of different local representations within the same model. Specifically, such a scenario requires that we perform non-trivial local model selection within an already challenging structure learning procedure. In practice, with as few as tens of variables, learning any real-valued graphical model beyond the simple linear Gaussian BN can be computationally impractical. At the same time, it is clear that, for many domains, the Gaussian representation is too restrictive. Our goal in this work is to overcome this barrier and to efficiently learn the structure of expressive networks that not only go beyond the Gaussian, but that also allow for a mix of varied local representations.

In the search for expressive representations, several recent works use copulas as a building block within the framework of graphical models [Kirshner, 2007, Elidan, 2010, Wilson and Ghahramani, 2010]. Briefly, copulas [Joe, 1997, Nelsen, 2007] flexibly capture low-dimensional distributions: easy-to-estimate univariate marginals are joined together using a copula function that focuses solely on the dependence pattern of the joint distribution. Appealingly, regardless of the dependency pattern, any univariate representation can be combined with any copula. In all of the above works, the resulting copula graphical model proved quite effective at capturing complex high-dimensional domains, far surpassing the Gaussian representation.
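To make the copula construction concrete, the following is a minimal sketch (not code from this paper) of how, by Sklar's theorem, a bivariate density f(x, y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y) is assembled from arbitrary univariate marginals and a copula density; a Gaussian copula with an assumed correlation parameter rho is used here purely for illustration:

```python
import numpy as np
from scipy import stats

def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula at (u, v) in (0, 1)^2."""
    # Map the uniform coordinates back to the standard normal scale.
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    joint = stats.multivariate_normal(mean=[0.0, 0.0],
                                      cov=[[1.0, rho], [rho, 1.0]])
    # Bivariate normal density divided by the product of its marginals.
    return joint.pdf([x, y]) / (stats.norm.pdf(x) * stats.norm.pdf(y))

def joint_density(x, y, marg_x, marg_y, rho):
    """Sklar's theorem: f(x, y) = c(F_X(x), F_Y(y)) * f_X(x) * f_Y(y)."""
    u, v = marg_x.cdf(x), marg_y.cdf(y)
    return gaussian_copula_density(u, v, rho) * marg_x.pdf(x) * marg_y.pdf(y)

# Any marginals can be paired with any copula, e.g. exponential and gamma:
f = joint_density(1.0, 2.0, stats.expon(), stats.gamma(a=2.0), rho=0.6)
```

Note that when rho = 0 the copula density is identically one and the joint density factorizes into the product of the marginals, which is a handy sanity check on the construction.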
Recently, Elidan [2012] proposed a structure learning method that is tailored to the so-called copula network representation, and that is essentially as efficient as learning a simple linear Gaussian BN. However, an important drawback of the approach is that it constrains all local copulas in the model to be of the same type. Tenzer and Elidan [2013] offer a slight improvement, but their method is inherently limited to a few (two or three) specific local representations and to tree-structured networks. Clearly, to take advantage of the plethora of dependency patterns captured by different copula families, we would like to have greater flexibility.