Using Sequential Context for Image Analysis

António R. C. Paiva¹, Elizabeth Jurrus¹,² and Tolga Tasdizen¹,³
¹Scientific Computing and Imaging Institute, ²School of Computing, and ³Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, 84112
email: {arpaiva,liz,tolga}@sci.utah.edu

Abstract—This paper proposes the sequential context inference (SCI) algorithm for Markov random field (MRF) image analysis. The algorithm is designed primarily for fast inference on an MRF model, but its application also requires a specific modeling architecture. The architecture is composed of a sequence of stages, each modeling the conditional probability of the labels given a neighborhood of the input image and the output of the previous stage. Because the model at each stage is learned sequentially with regard to the true output labels, the stages learn different models that can cope with errors in the previous stage.

Keywords—sequential context inference; Markov random fields; conditional random fields; neural networks.

I. INTRODUCTION

Markov random field (MRF) models have been widely used in statistical image analysis [1]. MRF models characterize the joint statistics of the observed image (or its derived features) and the latent parameters of the vision processes. By considering the joint effect of spatial neighbors on the distribution, contextual information and other properties of early vision can be modeled in a convenient and consistent way, while the Markov assumption simplifies the dependencies of the model. Inference on the learned MRF models, however, is not straightforward. Inference implies finding the maximum a posteriori (MAP) estimate of the model. In general, exact computation through exhaustive search is infeasible due to the combinatorial nature of the search.
Algorithms that take advantage of the factorization of the distribution, like belief propagation (BP) [2], cannot be used in this case because they are limited to tree dependencies in the factor graph. The graphs associated with MRF models for images, however, have loops due to the interdependence between neighboring pixels. The typical solution is to resort to approximate computational methods such as loopy belief propagation [2], mean field annealing [1], relaxation labeling [3], iterated conditional modes (ICM) [4], and fast graph cut methods [5], [6]. These algorithms have several drawbacks, but their two major limitations are their high computational complexity and the need to iterate until convergence, which prevents their use in time-critical applications. Moreover, some algorithms are not even guaranteed to converge [7].

In this paper, the sequential context inference (SCI) algorithm for modeling and inference in MRF image analysis is proposed. The key difference is the use of a sequential architecture of models. Each stage models the posterior distribution of the latent labels, conditioned on a neighborhood of the input image and the estimated latent neighbors, utilizing the result of the previous stage in place of the values of the latent neighbors. Because the model at each stage is learned sequentially with regard to the true output labels, the models learn to cope with errors in the previous stage.

Two conceptually related approaches are stacked graphical learning (SGL) [8] and Tu's auto-context [9]. Both of these methods build a dependency network of models that are used for inference, similar to the sequential architecture utilized by the SCI algorithm. SGL does inference using Gibbs sampling, since it was designed primarily to reduce the computation in Markov chains as the number of labels increases [8].
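The sequential training scheme just described can be sketched in a few lines. This is only a minimal illustration: the nearest-class-mean stage model, the dense array shapes, and all function names below are assumptions made for the sketch, not the stage models actually used by SCI; the point is only that every stage is fit against the true labels while consuming the previous stage's estimates.

```python
import numpy as np

def fit_stage(F, y):
    # Illustrative stage model: nearest class mean. SCI's actual stages
    # model a posterior distribution; this stand-in only shows the flow.
    return {c: F[y == c].mean(axis=0) for c in np.unique(y)}

def predict_stage(model, F):
    classes = sorted(model)
    dists = np.stack([np.linalg.norm(F - model[c], axis=1) for c in classes],
                     axis=1)
    return np.asarray(classes)[dists.argmin(axis=1)].astype(float)

def train_sci(X, Y, n_stages=3):
    """Train a sequence of stages: stage t sees the pixel features plus
    the previous stage's label estimates, and every stage is fit against
    the TRUE labels Y, so later stages learn to correct earlier errors."""
    H, W, d = X.shape
    prev = np.zeros((H, W, 1))          # first stage has no context yet
    stages = []
    for _ in range(n_stages):
        F = np.concatenate([X, prev], axis=-1).reshape(-1, d + 1)
        model = fit_stage(F, Y.ravel())
        stages.append(model)
        prev = predict_stage(model, F).reshape(H, W, 1)
    return stages, prev[..., 0]
```

At test time the same chain would be run with the learned stages, feeding each stage's prediction to the next; no iteration to convergence is required, which is the source of SCI's speed advantage over ICM-style methods.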
However, in image analysis, a more pressing problem is the intractability of inference in the presence of loops in the context estimation, which leads to a different formulation for modeling and inference in the SCI algorithm. Auto-context is a boosting strategy that sequentially learns classifiers from local filters applied to the observed image and the result of the previous classifier. Because each classifier utilizes the result of the previous classifier, it effectively aggregates information from an increasingly larger region in the input image, thus building context. However, the feature filters must be given a priori, or selected from a large filter bank. Consequently, this approach is computationally expensive and may not achieve the best possible solution. In contrast, the models for the SCI algorithm learn the probability distribution, rather than a classifier, directly from image samples, implicitly finding the relevant features.

II. SEQUENTIAL CONTEXT INFERENCE

Consider an input image $X = \{x_i : i \in \Omega\}$, where $x_i$ denotes the feature vector for the $i$th pixel and $\Omega$ is the image lattice. The neighborhood of the $i$th pixel of $X$ is $\bar{x}_i \equiv \{x_j : j \in \mathcal{N}^x_i\}$. For modeling, a supervised learning strategy will be utilized. Hence, to be able to learn the models, we will consider that a corresponding label image, denoted $Y = \{y_i : i \in \Omega,\, y_i \in \mathcal{L}\}$, where $\mathcal{L} = \{l_0, \ldots, l_{C-1}\}$ is the set of $C$ possible labels, is available during training. Similarly, the neighborhood of the $i$th pixel in $Y$ is denoted $\bar{y}_i \equiv \{y_j : j \in \mathcal{N}^y_i\}$, with $\mathcal{N}^y_i$ a neighborhood system such that $i \notin \mathcal{N}^y_i$. Clearly, the neighborhood systems $\mathcal{N}^x$ and $\mathcal{N}^y$, for $X$ and $Y$ respectively, may be different. In particular, note that the center pixel must not be included in $\mathcal{N}^y_i$.
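The two neighborhood systems can be illustrated with a small sketch. The square window, the wrap-around boundary handling, and the function name are assumptions made for this example, not prescribed by the paper; the essential point it shows is that the label neighborhood excludes the center pixel, while the image neighborhood need not.

```python
import numpy as np

def neighborhood(img, i, j, r=1, exclude_center=False):
    """Gather the values in a (2r+1) x (2r+1) window around pixel (i, j).
    With exclude_center=True this mimics the label neighborhood system,
    which must not contain the center pixel itself. Wrap-around boundary
    handling is an illustrative assumption."""
    vals = []
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            if exclude_center and di == 0 and dj == 0:
                continue  # skip the center pixel for the label window
            vals.append(img[(i + di) % img.shape[0],
                            (j + dj) % img.shape[1]])
    return np.array(vals)
```

For a 3x3 window (r = 1), the image-side window yields 9 values while the label-side window yields 8, since the label of the center pixel is exactly the quantity being predicted.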