Analysis of Approximate Message Passing Algorithm Arian Maleki Department of Electrical Engineering Stanford University arianm@stanford.edu David L. Donoho Department of Statistics Stanford University donoho@stanford.edu Andrea Montanari Department of Electrical Engineering and Department of Statistics Stanford University montanar@stanford.edu Abstract— Finding fast recovery algorithms is a problem of significant interest in compressed sensing. In many applications, extremely large problem sizes are envisioned, with at least tens of thousands of equations and hundreds of thousands of unknowns. Interior point methods for solving the best-studied approach, ℓ1 minimization, are very slow and hence ineffective at such problem sizes. Faster methods such as LARS or homotopy [1] have been proposed for solving ℓ1 minimization, but there is still a need for algorithms with lower computational complexity. Hence there is a fast-growing literature on greedy methods and iterative thresholding for finding the sparsest solution. It was shown by two of the authors that most of these algorithms perform worse than ℓ1 minimization in the sparsity-measurement tradeoff [2]. In a recent paper the authors introduced a new algorithm, called AMP, which is as fast as previously proposed iterative soft thresholding algorithms and performs exactly as well as ℓ1 minimization [3]. This algorithm is inspired by message passing algorithms on bipartite graphs. The statistical properties of AMP allow the authors to propose a theoretical framework for analyzing the asymptotic performance of the algorithm, yielding very sharp predictions of different observables of the algorithm. In this paper we address several questions regarding the thresholding policy. We consider a very general thresholding policy λ(σ) for the algorithm and use the maxmin framework to tune it optimally.
It is shown that when formulated in this general form, the maxmin thresholding policy is not unique and many different thresholding policies may lead to the same phase transition. We then propose two very simple thresholding policies that can easily be implemented in practice and prove that they are both maxmin. This analysis also sheds light on several other aspects of the algorithm, such as the least favorable distribution and the similarity of all maxmin optimal thresholding policies. We also show how one can derive the AMP algorithm from the full message passing algorithm.

I. INTRODUCTION

Compressed sensing (CS) [4], [5] is a fairly new field of research that has found many applications in signal processing, machine learning and biology. The main problem of interest in compressed sensing is to find the sparsest solution of an underdetermined system of linear equations y = A x_o. Unfortunately, this problem is NP-hard and in general there is no polynomial-time algorithm that solves it. Chen et al. [6] proposed the following convex optimization, also called basis pursuit, for recovering the sparsest solution:

min ||x||_1  subject to  Ax = y.   (1)

The computational complexity of this minimization problem has motivated researchers to look for cheaper algorithms, and many algorithms based on different heuristics have been proposed in the last five years for finding the sparsest solution. The class of methods that has drawn the most attention recently is the class of iterative thresholding algorithms. Starting with x^0 = 0 and z^0 = 0, the iteration is

x^{t+1} = η(x^t + A^T z^t; λ_t),   z^t = y − A x^t.   (2)

In this equation x^t is the estimate of the signal at time t and z^t is the residual at time t; if the problem is noiseless and the algorithm performs well, the residual converges to 0.

Fig. 1. Pictorial representation of how iterative thresholding algorithms achieve the sparsest solution.
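As an illustration, iteration (2) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: λ is held fixed across iterations for simplicity (the paper allows λ_t to vary with t), and soft thresholding, one common choice of η, is used as the nonlinearity.

```python
import numpy as np

def soft_threshold(u, lam):
    # componentwise soft thresholding: sgn(u) * (|u| - lam)_+
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def iterative_thresholding(A, y, lam=0.01, iters=500):
    # iteration (2): x^{t+1} = eta(x^t + A^T z^t; lam), z^t = y - A x^t
    x = np.zeros(A.shape[1])  # x^0 = 0
    for _ in range(iters):
        z = y - A @ x                          # residual at time t
        x = soft_threshold(x + A.T @ z, lam)   # thresholded gradient step
    return x
```

For the gradient step above to be stable, A should be scaled so that its spectral norm is at most one, in line with the scaling assumption on A made in the text.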
η is called the thresholding function; it is a scalar nonlinearity applied to the elements of the vector componentwise, and λ_t is the parameter the thresholding function uses at time t. The main task of the thresholding function is to impose sparsity at each iteration. Here we assume that the elements of the matrix A are scaled properly so that the iterations converge. If the nonlinearity or thresholding function η is removed from the algorithm, it is well known that the iteration converges to the minimum ℓ2-norm solution of y = Ax. By applying η at each iteration, however, we direct the algorithm toward the sparsest solution. This phenomenon is shown in Figure 1. As is clear from the figure, the thresholding function moves the iterates toward the sparsest solution; without it, the algorithm moves in the direction perpendicular to the plane and hits the plane at a dense solution. Many different variants of iterative thresholding algorithms have been proposed in the literature from many different perspectives, such as [7], [8], [9], [10], [11], [12], [13], [14], [15] and [16]. For a more complete list of references please refer to [2]. Although many nonlinearities have been proposed in the literature, in this paper we focus on soft thresholding, which is given by

η(x; λ) = sgn(x)(|x| − λ)_+,   (3)

where (z)_+ = z I(z > 0) and I is the indicator function. This algorithm is called iterative soft thresholding or IST. Although these algorithms are much faster than ℓ1 minimization, it has been shown through extensive simulations that they perform worse than ℓ1 in the sparsity-measurement tradeoff [2]. Recently the authors have proposed a slightly modified version of this algorithm which is capable of performing as well as ℓ1. This new algorithm