Generalized Flows for Optimal Inference in Higher Order MRF-MAP

Chetan Arora, Subhashis Banerjee, Prem Kalra, S.N. Maheshwari

Abstract—Use of higher order clique potentials in MRF-MAP problems has been limited primarily because of the inefficiencies of the existing algorithmic schemes. We propose a new combinatorial algorithm for computing optimal solutions to 2-label MRF-MAP problems with higher order clique potentials. The algorithm runs in time O(2^k n^3) in the worst case, where k is the size of a clique and n is the number of pixels. A special gadget is introduced to model flows in a higher order clique, and a technique for building a flow graph is specified. Based on the primal-dual structure of the optimization problem, the notions of the capacity of an edge and of a cut are generalized to define a flow problem. We show that in this flow graph, when the clique potentials are submodular, the max flow is equal to the min cut, which is also the optimal solution to the problem. We show experimentally that our algorithm provides significantly better solutions in practice and is hundreds of times faster than solution schemes such as Dual Decomposition [1], TRWS [2] and Reduction [3], [4], [5]. The framework represents a significant advance in handling higher order problems, making optimal inference practical for medium-sized cliques.

Index Terms—Markov Random Field (MRF), Maximum a Posteriori (MAP), Higher Order Cliques, Optimal Inference

1 INTRODUCTION

Many problems in computer vision, statistical mechanics, natural language processing, protein chain placement, etc. can be formulated as the computation of minimum energy configurations. Historically, the first formulation of energy minimization in the context of labeling problems in computer vision is due to Geman and Geman [6].
Assuming the labeling to be a Markov Random Field (MRF), finding a labeling configuration with Maximum a Posteriori (MAP) probability can be formulated as:

E(l_P) = \min_{l_P} \left( \sum_{p \in P} D_p(l_p) + \sum_{c \in C} W_c(l_c) \right), (1)

where l_p denotes the label at pixel p. A clique, defined as a set of pixels whose labels are contextually dependent on each other, is denoted by c; l_c denotes a labeling configuration on clique c, and C denotes the set of all cliques. The first term, D_p(l_p), also called the data energy or the data term, measures the cost of assigning label l_p to p, i.e., how well the labeling agrees with the observed data. The second term, W_c(l_c), called the prior energy, measures the cost of the labeling configuration l_c of a clique c depending on how consistent the labeling is with the prior knowledge. The penalty function W_c(·) is also called the clique potential function. The formulation in Eq. 1 is often referred to as MRF-MAP.

Over the last two decades computer vision researchers have focused both on MRF-MAP based modeling and on algorithms for optimizing the resultant energy functions. Vision problems that have been formulated in the MRF-MAP framework range from image restoration [6], segmentation of images [7], super resolution [8] and stereo matching [9] to object detection [10]. Research in algorithmic techniques has been influenced largely by the observation that while the general MRF-MAP optimization problem is NP-Hard, for 2-label 2-clique submodular potentials the optimization problem has a strongly polynomial time optimal algorithm [11]. This has initiated a new research area in which the focus has been to extend the class of energy functions for which either there are efficient optimal solutions or there are suboptimal solutions with well defined approximation guarantees.
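The two-term energy of Eq. 1 can be made concrete with a small sketch. The code below is purely illustrative (not the paper's algorithm): the 1-D "image", the toy cost functions data_cost and clique_cost, and the clique layout are all hypothetical choices made for the example. It evaluates E(l) for a 2-label problem with higher order cliques of size k = 3, and finds the minimum by brute force, which is feasible only at toy scale.

```python
from itertools import product

pixels = [0, 1, 2, 3]                  # a tiny 1-D "image" of 4 pixels
cliques = [(0, 1, 2), (1, 2, 3)]       # higher order cliques of size k = 3

def data_cost(p, label):
    # D_p(l_p): cost of assigning `label` to pixel p (arbitrary toy values)
    return [0.2, 0.8][label] if p % 2 == 0 else [0.7, 0.1][label]

def clique_cost(labels):
    # W_c(l_c): a toy prior charging 1.0 unless the clique labeling is uniform
    return 0.0 if len(set(labels)) == 1 else 1.0

def energy(labeling):
    # Eq. 1: E(l) = sum_p D_p(l_p) + sum_c W_c(l_c)
    e = sum(data_cost(p, labeling[p]) for p in pixels)
    e += sum(clique_cost(tuple(labeling[p] for p in c)) for c in cliques)
    return e

# Exhaustive minimization over all 2^n labelings; the point of the paper's
# algorithm is to reach this optimum in O(2^k n^3) instead.
best = min(product([0, 1], repeat=len(pixels)), key=energy)
```

With these toy costs the uniform labelings avoid all clique penalties, so the brute-force optimum is a uniform labeling; any mixed labeling pays at least one unit of prior energy on top of its data cost.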
Submodular potentials are of particular interest because, while real life problems involve non-submodular potentials, combinatorial techniques for handling non-submodularity have so far involved making some form of submodular approximation. Our focus in this paper is on MRF-MAP labeling problems with 2 labels and cliques of size more than 2. There have been essentially two lines of approach to such problems.

Message Passing or Decomposition Approaches: These techniques combine ideas from gradient based optimization [12], [13], belief propagation [14], or the primal-dual methodology of dual decomposition [15], [12], [16]. While convergence can in some cases be guaranteed for algorithms based on these ideas, it holds only in the limit (if the algorithm is run for an arbitrarily long time until convergence) and is not necessarily to the optimal solution.

Reduction Based Approaches: These algorithms reduce the original problem to a sequence of 2-