IMAGE DENOISING VIA ADJUSTMENT OF WAVELET COEFFICIENT MAGNITUDE CORRELATION

Javier Portilla and Eero P. Simoncelli
Center for Neural Science, and Courant Institute of Mathematical Sciences
New York University, NY 10003
{javier,eero}@cns.nyu.edu

Published in: Proceedings of the 7th International Conference on Image Processing, Vancouver, BC, Canada. 10-13 September 2000. © IEEE Computer Society.

We describe a novel method of removing additive white noise of known variance from photographic images. The method is based on a characterization of statistical properties of natural images represented in a complex wavelet decomposition. Specifically, we decompose the noisy image into wavelet subbands, estimate the autocorrelation of both the noise-free raw coefficients and their magnitudes within each subband, impose these statistics by projecting onto the space of images having the desired autocorrelations, and reconstruct an image from the modified wavelet coefficients. This process is applied repeatedly, and can be accelerated to produce optimal results in only a few iterations. Denoising results compare favorably to three reference methods, both perceptually and in terms of mean squared error.

The set of natural images fills a very small fraction of the space of all possible images. Modeling the properties of this set is of enormous importance for many image processing tasks, such as compression or denoising. Typically, such models are statistical, and make use of simplifying assumptions such as stationarity and spatial localization (e.g., Markov random fields).

In this paper, we consider an image to be a sample of a random field that is parameterized by a small set of statistics.
In a very high-dimensional space such as that of all digitized images, the samples of such a random field lie close to the hypersurface of images sharing the same sample statistics, and we can approximate the probability density as a uniform distribution over this hypersurface [13]. Thus, assuming the random field parameters are known, the statistical description is replaced by a deterministic one. In this context, an image corrupted by noise is an N-dimensional vector (N the number of pixels) that has been displaced from its original position to a point outside of its associated hypersurface. The problem of estimating the original (noise-free) image involves first estimating the parameters of the hypersurface, and then finding an image on the hypersurface that is close to the observed (noisy) image. Specifically, if one assumes the corrupting noise is Gaussian and white, then the maximum a posteriori (MAP) estimate corresponds to choosing the image on the hypersurface that is closest (in a Euclidean sense) to the observed image. Our method realizes an approximation to this estimate.

JP is supported by a fellowship from the Programa Nacional de FPI (Spanish Government). EPS is supported by NSF CAREER grant MIP-9796040.

1. IMAGE REPRESENTATION

Our model is based on a set of measurements on the coefficients of a multi-scale multi-orientation image representation known as a steerable pyramid [10]. This representation performs a local spectral decomposition of the image using oriented bandpass, self-similar kernels, roughly one octave in bandwidth. It exhibits a number of desirable mathematical properties (it is a tight frame, with translation- and rotation-invariant subbands), and has been used successfully in a number of image processing problems, including noise removal [9].
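The tight-frame property mentioned above can be illustrated with a much simpler decomposition than the steerable pyramid. The following is a minimal sketch, assuming non-oriented, non-subsampled, real-valued radial bands built in the Fourier domain; it is not the representation used in this paper, and the function names (`lowpass_power`, `tight_frame_masks`, `decompose`, `reconstruct`) and the cutoff values are hypothetical choices made for illustration. Each band is roughly one octave wide, and the squared masks sum to one at every frequency, so applying the same filters a second time and summing recovers the image, and the subband energies add up to the image energy.

```python
import numpy as np

def lowpass_power(r, cutoff):
    """Squared lowpass response: 1 below cutoff/2, 0 above cutoff,
    with a raised-cosine transition in log2 frequency in between."""
    t = np.clip(np.log2(2.0 * r / cutoff), 0.0, 1.0)
    return np.cos(0.5 * np.pi * t) ** 2

def tight_frame_masks(shape, n_bands=3, top_cutoff=0.5):
    """Return n_bands + 1 real Fourier masks whose squares sum to one.
    (Illustrative assumption: radial bands only, no orientation.)"""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    r = np.maximum(np.hypot(fx, fy), 1e-12)  # radial frequency, avoids log2(0)
    # Nested lowpass powers: all-pass, then cutoffs halving at each scale.
    powers = [np.ones(shape)]
    powers += [lowpass_power(r, top_cutoff / 2 ** k) for k in range(n_bands)]
    # Differences of nested lowpass powers telescope to one when summed.
    masks = [np.sqrt(np.clip(powers[k] - powers[k + 1], 0.0, None))
             for k in range(n_bands)]
    masks.append(np.sqrt(powers[-1]))  # lowpass residual
    return masks

def decompose(image, n_bands=3):
    """Filter the image into subbands (highest frequencies first)."""
    spectrum = np.fft.fft2(image)
    masks = tight_frame_masks(image.shape, n_bands)
    return [np.fft.ifft2(spectrum * m).real for m in masks]

def reconstruct(bands):
    """Invert by applying the same masks again (the adjoint) and summing."""
    masks = tight_frame_masks(bands[0].shape, len(bands) - 1)
    return sum(np.fft.ifft2(np.fft.fft2(b) * m).real
               for b, m in zip(bands, masks))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((64, 64))
    bands = decompose(image)
    print("max reconstruction error:",
          np.max(np.abs(reconstruct(bands) - image)))
    print("energy ratio:",
          sum(np.sum(b ** 2) for b in bands) / np.sum(image ** 2))
```

Because the filters are applied without subsampling, every subband has the same size as the image, which makes the sketch translation-invariant but heavily overcomplete.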
The use of this representation is also motivated by our knowledge of mammalian visual systems, in which cortical neurons perform a decomposition of the visual input using localized oriented receptive fields. Since human vision is the ultimate criterion of the quality of our processed images, it is desirable to use a set of visually relevant measurements. Recently, we have developed extensions of the steerable pyramid to utilize complex basis kernels, in which the real and imaginary parts are in quadrature phase [7]. Quadrature-pair subbands can be used to detect local features of the image, such as lines and edges, in a spatially shift-invariant way, and they have been widely used both for modeling complex cells in the visual cortex, and in local energy/phase models by the computer vision community. For the results of this paper, we