IMPROVED SPARSE APPROXIMATION OVER QUASI-INCOHERENT DICTIONARIES

A. C. Gilbert *, S. Muthukrishnan †, M. J. Strauss ‡, J. A. Tropp §

ABSTRACT

This paper discusses a new greedy algorithm for solving the sparse approximation problem over quasi-incoherent dictionaries. These dictionaries consist of waveforms that are uncorrelated “on average,” and they provide a natural generalization of incoherent dictionaries. The algorithm provides strong guarantees on the quality of the approximations it produces, unlike most other methods for sparse approximation. Moreover, very efficient implementations are possible via approximate nearest-neighbor data structures.

1. INTRODUCTION

Sparse approximation is the problem of finding a concise representation of a given signal as a linear combination of a few elementary signals chosen from a rich collection. It has shown empirical promise in image processing tasks such as feature extraction, because the approximation cannot succeed unless it discovers structure latent in the image. For example, Starck, Donoho and Candès have used sparse approximation to extract features from noisy astronomical photographs and volumetric data [1]. Nevertheless, it has been difficult to establish that proposed algorithms actually solve the sparse approximation problem. This paper takes another step in that direction by describing a greedy algorithm that computes solutions with provable quality guarantees.

A dictionary $D$ for the signal space $\mathbb{R}^d$ is a collection of vectors that spans the entire space. The vectors are called atoms, and we write them as $\varphi_\lambda$. The index $\lambda$ may parameterize the time/scale or time/frequency localization of each atom, or it may be a label without any additional meaning. The number of atoms is often much larger than the signal dimension. The sparse approximation problem with respect to $D$ is to compute a good representation of each input signal as a short linear combination of atoms.
* AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, agilbert@research.att.com.
† AT&T Labs-Research & Rutgers University, 180 Park Avenue, Florham Park, NJ 07932, muthu@research.att.com. Supported in part by NSF CCR 00-87022 and NSF ITR 0220280.
‡ AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, mstrauss@research.att.com.
§ ICES, University of Texas at Austin, Austin, TX 78712, jtropp@ices.utexas.edu.

Specifically, for an arbitrary signal $x$, we search for an $m$-term superposition $a_{\mathrm{opt}} = \sum_{\lambda \in \Lambda_{\mathrm{opt}}} b_\lambda \varphi_\lambda$ which minimizes $\| x - a_{\mathrm{opt}} \|_2$. We must determine both the optimal atoms, whose $m$ indices are listed in $\Lambda_{\mathrm{opt}}$, and the optimal coefficients $b_\lambda$.

If $D$ is an orthonormal basis, it is computationally easy to find $a_{\mathrm{opt}}$. For the indices $\Lambda_{\mathrm{opt}}$, simply take the $m$ atoms with the largest inner products $|\langle x, \varphi_\lambda \rangle|$ and form $a_{\mathrm{opt}} = \sum_{\lambda \in \Lambda_{\mathrm{opt}}} \langle x, \varphi_\lambda \rangle \varphi_\lambda$. Unfortunately, it can be difficult or impossible to choose an appropriate orthonormal basis for a given situation. For example, if the signals contain both harmonic and impulsive components, a single orthonormal basis will not represent them both efficiently.

We have much more freedom with a redundant dictionary, since it may include a rich collection of waveforms which can provide concise representations of many different structures. The price that we pay for this additional flexibility is an increased cost to determine these concise representations. For general redundant dictionaries, it is computationally infeasible to search all possible $m$-term representations. In fact, if $D$ is an arbitrary dictionary, finding the best $m$-term representation of an arbitrary signal is NP-hard [2]. There are algorithms with provable approximation guarantees for specific dictionaries, e.g. Villemoes’ algorithm for Haar wavelet packets [3]. There are also some well-known heuristics, such as Matching Pursuit (MP) [4], Orthogonal Matching Pursuit (OMP) [5] and m-fold Matching Pursuit [6].
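To make the orthonormal case described above concrete, the following is a minimal sketch of the "keep the $m$ largest inner products" rule. It is an illustration only, not part of the algorithm this paper proposes. The function name `best_m_term`, the helper `inner`, the choice of the normalized Haar basis of $\mathbb{R}^4$, and the test signal are all assumptions made for the example.

```python
import math

def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def best_m_term(x, basis, m):
    """Best m-term approximation of x over an ORTHONORMAL basis.

    `basis` is a list of orthonormal vectors (assumed, not checked).
    Returns the sorted indices of the chosen atoms and the approximation.
    """
    # Coefficients are plain inner products because the basis is orthonormal.
    coeffs = [inner(x, phi) for phi in basis]
    # Keep the m atoms with the largest |<x, phi>|.
    chosen = sorted(range(len(basis)),
                    key=lambda i: abs(coeffs[i]), reverse=True)[:m]
    approx = [0.0] * len(x)
    for i in chosen:
        for k in range(len(x)):
            approx[k] += coeffs[i] * basis[i][k]
    return sorted(chosen), approx

# Hypothetical example: the normalized Haar basis of R^4.
h = 0.5
s = 1 / math.sqrt(2)
haar = [
    [h, h, h, h],
    [h, h, -h, -h],
    [s, -s, 0.0, 0.0],
    [0.0, 0.0, s, -s],
]
x = [4.0, 2.0, 1.0, 1.0]

indices, a_opt = best_m_term(x, haar, m=2)
error = math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a_opt)))
```

In this example the two retained atoms are the first two Haar vectors, and the residual norm equals the magnitude of the single nonzero discarded coefficient, as Parseval's identity predicts for an orthonormal basis. No such shortcut is available for a redundant dictionary, which is why the heuristics listed above are needed.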
Several other methods rely on the Basis Pursuit paradigm, which advocates minimizing the $\ell_1$ norm of the coefficients in the representation instead of directly minimizing the number of nonzero coefficients [7].

Some theoretical progress has already been made for dictionaries with low coherence. The coherence parameter $\mu$ equals the maximal absolute inner product between two distinct atoms. For example, the union of spikes and sines is a dictionary with $\mu = \sqrt{2/d}$. The authors of [6] have presented an efficient two-stage algorithm for the approximate representation of any signal over a sufficiently incoherent