Using Sequential Context for Image Analysis

António R. C. Paiva¹, Elizabeth Jurrus¹,² and Tolga Tasdizen¹,³
¹Scientific Computing and Imaging Institute, ²School of Computing, and ³Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, 84112
email: {arpaiva,liz,tolga}@sci.utah.edu

Abstract—This paper proposes the sequential context inference (SCI) algorithm for Markov random field (MRF) image analysis. The algorithm is designed primarily for fast inference on an MRF model, but its application also requires a specific modeling architecture. The architecture is composed of a sequence of stages, each modeling the conditional probability of the labels given a neighborhood of the input image and the output of the previous stage. Because the model at each stage is learned sequentially with regard to the true output labels, the stages learn different models that can cope with errors in the previous stage.

Keywords—sequential context inference; Markov random fields; conditional random fields; neural networks.

I. INTRODUCTION

Markov random field (MRF) models have been widely used in statistical image analysis [1]. MRF models characterize the joint statistics of the observed image (or its derived features) and the latent parameters of the vision processes. By considering the joint effect of spatial neighbors on the distribution, contextual information and other properties of early vision can be modeled in a convenient and consistent way, while the Markov assumption simplifies the dependencies of the model. Inference on the learned MRF models, however, is not straightforward. Inference implies finding the maximum a posteriori (MAP) estimate of the model. In general, exact computation through exhaustive search is infeasible due to the combinatorial nature of the search.
Algorithms that take advantage of the factorization of the distribution, like belief propagation (BP) [2], cannot be used in this case because they are limited to tree dependencies in the factor graph. The graphs associated with MRF models for images, however, have loops due to the interdependence between neighboring pixels. The typical solution is to resort to approximate computational methods such as loopy belief propagation [2], mean field annealing [1], relaxation labeling [3], iterated conditional modes (ICM) [4], and fast graph cut methods [5], [6]. These algorithms have several drawbacks, but their two major limitations are their high computational complexity and the need to iterate until convergence, which prevents their use in time-critical applications. Moreover, some algorithms are not even guaranteed to converge [7].

In this paper, the sequential context inference (SCI) algorithm for modeling and inference in MRF image analysis is proposed. The key difference is the use of a sequential architecture of models. Each stage models the posterior distribution of the latent labels, conditioned on a neighborhood of the input image and the estimated latent neighbors, utilizing the result of the previous stage in place of the values of the latent neighbors. Because the model at each stage is learned sequentially with regard to the true output labels, the models learn to cope with errors in the previous stage.

Two conceptually related approaches are stacked graphical learning (SGL) [8] and Tu's auto-context [9]. Both of these methods build a dependency network of models that are used for inference, similar to the sequential architecture utilized by the SCI algorithm. SGL does inference using Gibbs sampling, since it was designed primarily to reduce the computation in Markov chains as the number of labels increases [8].
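The sequential training scheme just described can be sketched in a few lines. This is only a minimal illustration: the nearest-class-mean stage model, the dense array shapes, and all function names below are assumptions made for the sketch, not the stage models actually used by SCI; the point is only that every stage is fit against the true labels while consuming the previous stage's estimates.

```python
import numpy as np

def fit_stage(F, y):
    # Illustrative stage model: nearest class mean. SCI's actual stages
    # model a posterior distribution; this stand-in only shows the flow.
    return {c: F[y == c].mean(axis=0) for c in np.unique(y)}

def predict_stage(model, F):
    classes = sorted(model)
    dists = np.stack([np.linalg.norm(F - model[c], axis=1) for c in classes],
                     axis=1)
    return np.asarray(classes)[dists.argmin(axis=1)].astype(float)

def train_sci(X, Y, n_stages=3):
    """Train a sequence of stages: stage t sees the pixel features plus
    the previous stage's label estimates, and every stage is fit against
    the TRUE labels Y, so later stages learn to correct earlier errors."""
    H, W, d = X.shape
    prev = np.zeros((H, W, 1))          # first stage has no context yet
    stages = []
    for _ in range(n_stages):
        F = np.concatenate([X, prev], axis=-1).reshape(-1, d + 1)
        model = fit_stage(F, Y.ravel())
        stages.append(model)
        prev = predict_stage(model, F).reshape(H, W, 1)
    return stages, prev[..., 0]
```

At test time the same chain would be run with the learned stages, feeding each stage's prediction to the next; no iteration to convergence is required, which is the source of SCI's speed advantage over ICM-style methods.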
However, in image analysis, a more pressing problem is the intractability of inference in the presence of loops in the context estimation, which leads to a different formulation for modeling and inference in the SCI algorithm. Auto-context is a boosting strategy that sequentially learns classifiers from local filters applied to the observed image and the result of the previous classifier. Because each classifier utilizes the result of the previous classifier, it effectively aggregates information from an increasingly larger region in the input image, thus building context. However, the feature filters must be given a priori, or selected from a large filter bank. Consequently, this approach is computationally expensive and may not achieve the best possible solution. In contrast, the models for the SCI algorithm learn the probability distribution, rather than a classifier, directly from image samples, implicitly finding the relevant features.

II. SEQUENTIAL CONTEXT INFERENCE

Consider an input image $X = \{x_i : i \in \Omega\}$, where $x_i$ denotes the feature vector for the $i$th pixel and $\Omega$ is the image lattice. The neighborhood of the $i$th pixel of $X$ is $\bar{x}_i \equiv \{x_j : j \in \mathcal{N}^x_i\}$. For modeling, a supervised learning strategy will be utilized. Hence, to be able to learn the models, we will consider that a corresponding label image, denoted $Y = \{y_i : i \in \Omega,\, y_i \in \mathcal{L}\}$, where $\mathcal{L} = \{l_0, \ldots, l_{C-1}\}$ is the set of $C$ possible labels, is available during training. Similarly, the neighborhood of the $i$th pixel in $Y$ is denoted $\bar{y}_i \equiv \{y_j : j \in \mathcal{N}^y_i\}$, with $\mathcal{N}^y_i$ a neighborhood system such that $i \notin \mathcal{N}^y_i$. Clearly, the neighborhood systems $\mathcal{N}^x$ and $\mathcal{N}^y$, for $X$ and $Y$ respectively, may be different. In particular, note that the center pixel must not be included in $\mathcal{N}^y_i$.
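The two neighborhood systems can be illustrated with a small sketch. The square window, the wrap-around boundary handling, and the function name are assumptions made for this example, not prescribed by the paper; the essential point it shows is that the label neighborhood excludes the center pixel, while the image neighborhood need not.

```python
import numpy as np

def neighborhood(img, i, j, r=1, exclude_center=False):
    """Gather the values in a (2r+1) x (2r+1) window around pixel (i, j).
    With exclude_center=True this mimics the label neighborhood system,
    which must not contain the center pixel itself. Wrap-around boundary
    handling is an illustrative assumption."""
    vals = []
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            if exclude_center and di == 0 and dj == 0:
                continue  # skip the center pixel for the label window
            vals.append(img[(i + di) % img.shape[0],
                            (j + dj) % img.shape[1]])
    return np.array(vals)
```

For a 3x3 window (r = 1), the image-side window yields 9 values while the label-side window yields 8, since the label of the center pixel is exactly the quantity being predicted.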