Variational inference in a truncated Dirichlet process
David M. Blei and Michael I. Jordan
U.C. Berkeley
December 4, 2003

1 The truncated Dirichlet process

The N-component truncated Dirichlet process (DP_N) is defined in Ishwaran and James [2001] and converges almost surely to a true Dirichlet process (DP_∞). Like a full Dirichlet process, this distribution can be used as a nonparametric Bayesian prior in a mixture model. Ishwaran and James show that this approximation permits a blocking strategy in the corresponding Gibbs sampler which can be faster than the classical Gibbs samplers developed for the DP_∞ prior [Escobar and West, 1995]. In this paper, we develop a variational inference algorithm to approximate the posterior in a Bayesian mixture model with a DP_N prior.

An exponential family mixture model with a DP_N prior on the natural parameters of the mixture components is illustrated in Figure 2. The random variables are distributed as follows:

    p(V_n | α) = (Γ(1+α)/Γ(α)) (1 - V_n)^{α-1}                    for n ∈ [1, N-1]
    p(V_N = 1) = 1
    p(η_n | λ) = h(η_n) exp{λ_1^T η_n + λ_2 (-a(η_n)) - a(λ)}
    p(K_d^1 = 1 | V) = V_1
    p(K_d^n = 1 | V) = (1 - V_1)(1 - V_2) ··· (1 - V_{n-1}) V_n    for n ∈ [2, N]
    p(X_d | K_d, η) = ∏_{n=1}^{N} ( h(X_d) exp{η_n^T X_d - a(η_n)} )^{K_d^n}

Note that in the standard conjugate exponential set-up, λ has dimension dim(η) + 1 and -a(η) is the last component of the sufficient statistic of η.

2 Variational inference

Consider the log likelihood of a dataset X = {X_d}_{d=1}^{D}:

    log p(X) = log ∫ p(V) p(η) [ ∏_{d=1}^{D} Σ_{K_d} p(K_d | V) p(X_d | K_d, η) ] dV dη

This quantity can be bounded with Jensen's inequality as follows:

    log p(X) ≥ E[log p(V | α)] + E[log p(η | λ)]
               + Σ_{d=1}^{D} ( E[log p(K_d | V)] + E[log p(X_d | K_d, η)] ) + H(q),    (1)

where the expectations are taken with respect to a variational distribution q and H(q) is its entropy.
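The stick-breaking construction above is easy to check numerically: because V_N is fixed to 1, the N mixing proportions p(K_d^n = 1 | V) sum exactly to one, so the truncation yields a proper finite mixture. Below is a minimal sketch in NumPy (not from the paper; the function name and use of NumPy are my own assumptions) that draws V_n ~ Beta(1, α) and forms the weights π_n = V_n ∏_{i<n} (1 - V_i).

```python
import numpy as np

def stick_breaking_weights(alpha, N, rng):
    """Draw mixing proportions from the N-component truncated
    stick-breaking prior: V_n ~ Beta(1, alpha) for n < N, V_N = 1."""
    V = np.empty(N)
    V[:-1] = rng.beta(1.0, alpha, size=N - 1)
    V[-1] = 1.0  # p(V_N = 1) = 1, which closes off the stick
    # pi_n = V_n * prod_{i<n} (1 - V_i); the leading 1.0 handles n = 1
    remaining_stick = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return V * remaining_stick

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=2.0, N=20, rng=rng)
print(pi.sum())  # sums to 1 up to floating-point error, since V_N = 1
```

With larger α the Beta(1, α) draws concentrate near 0, so the stick is broken into many small pieces and mass spreads over more components; small α concentrates mass on the first few components.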