NON-CONVEX SPARSE OPTIMIZATION THROUGH DETERMINISTIC ANNEALING AND APPLICATIONS

Luis Mancera*
Dept. of Computer Science and A.I., Universidad de Granada, Granada, Spain
mancera@decsai.ugr.es

Javier Portilla
Instituto de Óptica, CSIC, Madrid, Spain
portilla@io.cfmac.csic.es

ABSTRACT

We propose a new formulation of the sparse approximation problem for the case of tight frames which allows us to minimize the cost function by gradient descent. We obtain a generalized version of the iterative hard thresholding (IHT) algorithm, which provides locally optimal solutions. In addition, to avoid unfavorable minima we use an annealing technique consisting of gradually de-smoothing a previously smoothed version of the cost function. This results in a threshold that decreases through the iterations, as some authors have already proposed as a heuristic. We have adapted and applied our method to restore images having localized information losses, such as missing pixels. We present high-performance in-painting results.

Index Terms— Sparse approximation, ℓ0-norm minimization, in-painting.

1. INTRODUCTION

Given an observed vector and a set of vectors defining a redundant dictionary, the sparse approximation problem can be stated as minimizing the vector's approximation error using a linear combination of a given number of vectors from the dictionary. This problem is relevant for both living beings and artificial systems that analyze and process stimuli: it very likely plays a role in verbal communication, vision, and, in general, in tasks involving mixed source identification and efficient coding/synthesis. Most degradation sources decrease the sparseness of the wavelet coefficients, and thus we can compensate for part of the degradation by finding sparse approximations to the observations (e.g., [1, 2]). The exact solution to this problem requires a combinatorial search. Several authors have explored more tractable variants.
Greedy methods approximate the image by incrementally selecting the vectors that best describe the part not yet represented (e.g., [3, 4]). Also, if we minimize the sum of the absolute values of the coefficients (the ℓ1-norm) instead of the number of active vectors (the ℓ0-norm), the optimization problem becomes convex (e.g., [5, 6]). Finally, iterative algorithms have been proposed, both for the convex relaxation (the iterative soft thresholding method, IST, e.g. [7, 8]) and using hard thresholding (the iterative hard thresholding method, IHT, e.g. [9, 10]). These algorithms can be improved by heuristics such as using decreasing thresholds.

Here, we show that classical optimization methods can yield competitive results without resorting to convex or greedy approximations. We derive the IHT method through gradient descent on a continuous cost function equivalent to that of the sparse approximation problem. We then use a deterministic annealing-like technique, through a homotopy [11], to avoid unfavorable local minima. We end up with a method already used as a heuristic (e.g., [9, 8]) but, to our knowledge, we are the first to propose a theoretically justified derivation of it. We have already shown [12] the outstanding energy compaction performance of this method. Here we apply it to restoration, obtaining high-performance in-painting results.

* Both authors funded by grant TEC2006/13845/TCM from the Ministerio de Ciencia y Tecnología, Spain.

2. THE SPARSE APPROXIMATION PROBLEM

Let Φ be an N × M matrix with M > N and rank(Φ) = N. Then, for an image x ∈ ℝᴺ, the system Φa = x has infinitely many solutions a ∈ ℝᴹ. We look for compressible solutions, that is, vectors that can be represented as the sum of one vector having a small proportion of non-zero coefficients (a sparse vector) plus a small correction vector.
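As a concrete illustration of this setting (a hypothetical sketch, not the dictionary used in our experiments), a redundant N × 2N matrix Φ whose transpose is a Parseval tight frame can be built as a scaled union of two orthonormal bases, and the resulting underdetermined system Φa = x admits infinitely many solutions:

```python
import numpy as np

# Hypothetical example: stack two orthonormal bases (the identity and a
# random orthonormal basis Q) and scale by 1/sqrt(2), so Phi @ Phi.T = I.
N = 8
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthonormal basis
Phi = np.hstack([np.eye(N), Q]) / np.sqrt(2)       # N x 2N, M = 2N > N

x = rng.standard_normal(N)

# Parseval (tight-frame) property: Phi Phi^T = I, hence ||Phi^T x|| = ||x||.
assert np.allclose(Phi @ Phi.T, np.eye(N))
assert np.isclose(np.linalg.norm(Phi.T @ x), np.linalg.norm(x))

# Phi a = x is underdetermined: a0 = Phi^T x is one solution, and adding
# any null-space vector of Phi gives another.
a0 = Phi.T @ x
null_vec = np.hstack([Q @ x, -x])   # Phi @ null_vec = (Q x - Q x)/sqrt(2) = 0
assert np.allclose(Phi @ a0, x)
assert np.allclose(Phi @ (a0 + null_vec), x)
```

Among these infinitely many solutions, the formulation below singles out the sparse ones.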
The following optimization problem is a popular way to obtain such solutions:

   â₀(λ) = arg min_a { ‖a‖₀ + λ‖Φa − x‖₂² },                    (1)

where ‖a‖₀ denotes the ℓ0-norm of a (its number of non-zero elements) and λ ∈ ℝ⁺ balances the accuracy against the sparseness of the solution.

3. A GRADIENT-DESCENT APPROACH

3.1. Alternative continuous formulation

Here we assume that Φᵀ is a Parseval frame, so ΦΦᵀ = I and thus ‖Φᵀx‖₂ = ‖x‖₂ for all x ∈ ℝᴺ. Under this condition, we prove next that Eq. (1) can be re-written as:

   (â, b̂) = arg min_{a,b} { ‖a‖₀ + λ‖b − a‖₂² }  s.t.  Φb = x.   (2)

We show that â = â₀(λ). First, we express Eq. (2) as:

   â = arg min_a { ‖a‖₀ + λ min_b { ‖b − a‖₂²  s.t.  Φb = x } }.

For a given a, the inner minimization is the orthogonal projection of a onto the affine subspace {b : Φb = x}, which, since ΦΦᵀ = I, yields b̃(a) = a − Φᵀ(Φa − x). Substituting into Eq. (2), and using the fact that Φᵀ is a Parseval frame (so ‖b̃(a) − a‖₂ = ‖Φᵀ(Φa − x)‖₂ = ‖Φa − x‖₂), we obtain:

   â = arg min_a { ‖a‖₀ + λ‖Φa − x‖₂² } = â₀(λ).

Now, we re-write Eq. (2) as:

   b̂ = arg min_b { min_a { ‖a‖₀ + λ‖b − a‖₂² }  s.t.  Φb = x }.   (3)
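The inner minimization of Eq. (3) over a decouples elementwise: keeping coefficient bᵢ costs 1 (its ℓ0 contribution) while zeroing it costs λbᵢ², so bᵢ survives iff |bᵢ| > 1/√λ, i.e., a hard threshold. Alternating this step with the projection b̃(a) = a − Φᵀ(Φa − x) gives an IHT-style iteration. The following is a minimal sketch of that alternation, not the full annealed algorithm proposed in this paper:

```python
import numpy as np

def hard_threshold(b, lam):
    # Elementwise solution of min_a { ||a||_0 + lam * ||b - a||_2^2 }:
    # keeping b_i costs 1, zeroing it costs lam * b_i^2, so b_i is kept
    # iff |b_i| > 1/sqrt(lam).
    a = b.copy()
    a[np.abs(b) <= 1.0 / np.sqrt(lam)] = 0.0
    return a

def iht(Phi, x, lam, n_iter=50):
    # Alternate the two minimizations of Eq. (2)/(3), assuming Phi^T is a
    # Parseval frame (Phi @ Phi.T = I).
    a = Phi.T @ x
    for _ in range(n_iter):
        b = a - Phi.T @ (Phi @ a - x)   # projection onto {b : Phi b = x}
        a = hard_threshold(b, lam)       # inner minimization over a
    return a
```

With a fixed λ this iteration can stall in a poor local minimum; the annealing strategy described above amounts to starting from a large threshold (small effective λ) and decreasing it across iterations.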