IMPORTANCE SAMPLING

Monday, November 17, 2008
Rice University STAT 631 / ELEC 639: Graphical Models
Instructor: Dr. Volkan Cevher
Scribes: Ryan E. Guerra (war@rice.edu), Tahira N. Saleem (ts4@rice.edu), Terrance D. Savitsky (tds1@rice.edu)

1 Motivation

In machine learning and statistics, we are often tasked with computing the expected value of a function $f(x)$ with respect to a probability distribution $p(x)$, where $x \in \mathbb{R}^n$. In many cases, the canonical technique of evaluating the integral

$$\int f(x)\,p(x)\,dx$$

is intractable due to the nature or complexity of $p(x)$.

On one hand, if a cumulative distribution function is non-decreasing and easily invertible, then we can draw samples from its distribution by inverse transform sampling [1], where we map i.i.d. samples from $U[0,1]$ through the inverse CDF $P^{-1}(y)$. If one wishes to draw samples from a multivariate Gaussian distribution, the well-known Box-Muller method [1] will suffice. On the other hand, many distributions are difficult or impossible to invert, and in some cases a closed-form representation may not exist or may be computationally intractable to obtain. This is a problem, since finding expected values of functions is often a step in larger engineering problems or algorithms.

Importance sampling is a discrete method for approximating $I[f] = \mathbb{E}_p[f(x)]$ by replacing $p(x)$ with a similar, but easily sampled, distribution $q(x)$ and then correcting for the error introduced by this substitution. It is generally cited as a Monte Carlo variance reduction technique: by reducing the variance of the estimator, it reduces the number of samples, and hence the computational cost, required to estimate $I[f]$ to a given accuracy. In this paper, we develop the theory of importance sampling and highlight important properties to consider when applying the technique. In the final section we present a MATLAB simulation with discussion.
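As a concrete illustration of inverse transform sampling, consider the exponential distribution, whose CDF $P(y) = 1 - e^{-\lambda y}$ is non-decreasing and invertible in closed form. The following is a minimal Python sketch (the notes themselves use MATLAB; the function name and seed here are illustrative choices, not from the original):

```python
import math
import random

def sample_exponential(rate, n, rng=random.Random(0)):
    """Draw n samples from Exp(rate) via inverse transform sampling.

    Maps i.i.d. samples u ~ U[0, 1] through the inverse CDF
    P^{-1}(u) = -ln(1 - u) / rate.
    """
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

samples = sample_exponential(rate=2.0, n=100_000)
# Sanity check: the sample mean should be close to the true mean 1/rate = 0.5.
print(sum(samples) / len(samples))
```

The same recipe works for any distribution whose CDF can be inverted cheaply; when it cannot, we need the alternatives developed below.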
2 Importance Sampling

Consider a set of samples $\{x^{(i)}\}$ generated from a given probability distribution $p(x)$. Then the expectation of $f(x)$ under $p$ in (1) can be approximated by the average of $f(x)$ evaluated at those samples, as in (2). For notational convenience, we will use $I[f]$ to represent this expectation throughout the discussion.

$$I[f] = \mathbb{E}_p[f(x)] = \int f(x)\,p(x)\,dx \qquad (1)$$

$$\simeq \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}) \qquad (2)$$

As in the previous section, we are operating under the assumption that while we can evaluate $p(x)$ at a given $x$, we cannot easily draw from the distribution the samples needed for our estimate. To deal
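The sample average in (2) is easy to sketch in code. Below is a minimal Python illustration of the estimator, assuming we *can* draw from $p$ directly (the helper name `mc_expectation` and the test case $p = \mathcal{N}(0,1)$, $f(x) = x^2$ are illustrative choices, not from the notes):

```python
import random

def mc_expectation(f, sample_p, n, rng=random.Random(1)):
    """Approximate I[f] = E_p[f(x)] by the sample average in (2).

    sample_p(rng) is assumed to draw one sample x^(i) from p(x).
    """
    return sum(f(sample_p(rng)) for _ in range(n)) / n

# Example: p = N(0, 1) and f(x) = x^2, so I[f] = E[x^2] = 1.
est = mc_expectation(f=lambda x: x * x,
                     sample_p=lambda rng: rng.gauss(0.0, 1.0),
                     n=200_000)
print(est)
```

When samples from $p$ are unavailable, importance sampling keeps this same averaging structure but draws from a surrogate $q$ and reweights, as developed next.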