Learning the Right Model: Efficient Max-Margin Learning in Laplacian CRFs

Dhruv Batra, TTI-Chicago, dbatra@ttic.edu
Ashutosh Saxena, Cornell University, asaxena@cs.cornell.edu

Abstract

An important modeling decision made while designing Conditional Random Fields (CRFs) is the choice of the potential functions over the cliques of variables. Laplacian potentials are useful because they are robust and match image statistics better than Gaussians. Moreover, energies with Laplacian terms remain convex, which simplifies inference. This makes Laplacian potentials an ideal modeling choice for some applications.

In this paper, we study max-margin parameter learning in CRFs with Laplacian potentials (LCRFs). We first show that the structured hinge-loss [35] is non-convex for LCRFs, and thus the techniques used in previous work are not applicable. We then present the first approximate max-margin learning algorithm for LCRFs. Finally, we make our learning algorithm scalable in the number of training images by using dual-decomposition techniques. Our experiments on single-image depth estimation show that, even with simple features, our approach achieves results comparable to the state of the art.

1. Introduction

Undirected graphical models such as Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) have been successfully applied to a number of vision problems, such as image denoising, optical flow, and single-image depth estimation. While designing an MRF/CRF for an application, especially one with continuous random variables, an important modeling decision is the choice of the family of potential functions over the cliques of variables.

In the context of natural images, this question has been studied as the search for suitable natural image priors [36, 38]. Some of the earliest works [12] used quadratic disagreement pairwise potentials, corresponding to Gaussian priors on images.
Since then, however, a large body of work [21, 34, 36, 38] has found that histograms of filter responses for natural images tend to be highly "non-Gaussian," in that they have sharp peaks at zero and heavy tails. Consequently, recent works have focused on non-convex priors [2, 22, 23, 32, 36]. A similar situation holds for range images, i.e., images captured by laser range-scanners as opposed to traditional cameras. Huang et al. [13] presented the first analysis of range images and found that log-gradient histograms of range images of natural scenes were also heavy-tailed and peaked at zero. More recently, Saxena et al. [24, 25] made similar observations in the context of monocular depth estimation, and found that relative depths are better modeled by a Laplacian distribution than by a Gaussian.

Figure 1: Log10 of the normalized histogram of relative depths (between adjacent pixels) from 400 laser scans collected by Saxena et al. [24, 25]. Notice that the relative depths are better modeled by a Laplacian distribution than a Gaussian. A hyper-Laplacian would be a better fit but results in non-convex problems. (Axes: Difference in Depth vs. Log Probability; curves: Real Data, Laplacian Fit, Gaussian Fit.)

Model. The model we consider is a CRF with Laplacian potentials, which we refer to as a Laplacian CRF (LCRF) for ease of notation. Although non-convex models such as Fields of Experts (FoE) [22, 36] or hyper-Laplacian priors [14] may be a better fit to natural image statistics than LCRFs, there are a number of good reasons for using LCRFs. Laplacian potentials represent a sweet spot in the trade-off between the conflicting goals of modeling and optimization. Gaussian potentials lead to easy optimization problems (for both inference and learning), but are a poor match to image statistics. Non-convex models (e.g., FoE) match image statistics well but result in difficult (non-convex) optimization problems.
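The observation behind Figure 1 can be illustrated with a small sketch. Since the laser-scan data are not reproduced here, synthetic heavy-tailed samples stand in for the adjacent-pixel depth differences; both candidate models are fit by maximum likelihood and compared by average log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for relative depths between adjacent pixels:
# heavy-tailed differences, sharply peaked at zero.
diffs = rng.laplace(loc=0.0, scale=0.3, size=100_000)

# Maximum-likelihood fits of both candidate models.
mu, sigma = diffs.mean(), diffs.std()      # Gaussian MLE: mean, std
med = np.median(diffs)                     # Laplacian MLE: median,
b = np.abs(diffs - med).mean()             # mean absolute deviation

# Average log-likelihood of the data under each fitted density.
ll_gauss = np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (diffs - mu) ** 2 / (2 * sigma**2))
ll_laplace = np.mean(-np.log(2 * b) - np.abs(diffs - med) / b)

print(f"avg log-lik: Gaussian {ll_gauss:.3f}, Laplacian {ll_laplace:.3f}")
```

On such heavy-tailed data the Laplacian fit attains the higher average log-likelihood, mirroring the gap between the two fitted curves in Figure 1.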
Laplacian potentials are robust and match image statistics better than Gaussians, yet energies with Laplacian terms remain convex, which simplifies inference. Moreover, in recent work, Schmidt et al. [29] found that Laplacian models actually outperformed hyper-Laplacian models on the task of image restoration, when
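The convexity contrast can be checked numerically (a minimal sketch, not from the paper): the Laplacian pairwise term |x_i − x_j| satisfies the midpoint-convexity inequality on every pair of test points, while a hyper-Laplacian term |x_i − x_j|^0.7 violates it.

```python
import numpy as np

def midpoint_convex(f, xs):
    """Return True if f(0.5*(x+y)) <= 0.5*(f(x)+f(y)) for all pairs in xs."""
    x, y = np.meshgrid(xs, xs)
    return bool(np.all(f(0.5 * (x + y)) <= 0.5 * (f(x) + f(y)) + 1e-12))

# Grid of pairwise differences d = x_i - x_j.
xs = np.linspace(-3, 3, 201)

laplacian = lambda d: np.abs(d)         # Laplacian term: convex
hyper = lambda d: np.abs(d) ** 0.7      # hyper-Laplacian term: non-convex

print(midpoint_convex(laplacian, xs))   # True
print(midpoint_convex(hyper, xs))       # False (e.g., d=0 and d=2 violate it)
```

This is exactly why energies built from Laplacian terms stay convex and can be minimized globally, whereas hyper-Laplacian energies in general cannot.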