Variational EM Algorithms for Correlated Topic Models

Mohammad Emtiyaz Khan and Guillaume Bouchard

September 14, 2009

Abstract

In this note, we derive a variational EM algorithm for correlated topic models. This algorithm was proposed in Blei and Lafferty's original paper [BL06] and is based on a simple bound on the logarithm. Because of the form of this bound, the E-step updates are not available in closed form and must be solved with a coordinate ascent algorithm.

1 Correlated Topic Model

Consider $D$ documents, with $W_d$ words in the $d$-th document. These words belong to a fixed vocabulary of size $V$. Suppose there are $T$ topics. The correlated topic model is a generative model for documents, defined as follows:

$$p(\eta_d \mid \mu, \Sigma) = \mathcal{N}(\eta_d \mid \mu, \Sigma) \quad (1)$$
$$p(z_{n,d} \mid \eta_d) = \text{Mult}(f(\eta_d)) \quad (2)$$
$$p(w_{n,d} \mid z_{n,d}, \beta_{1:T}) = \text{Mult}(\beta_{z_{n,d}}) \quad (3)$$

where $f(\mathbf{a}) = e^{\mathbf{a}} / \sum_j e^{a_j}$. In other words, we first sample a probability vector over topics for each document using a logistic-normal distribution. Next, using this probability vector, we sample a topic for each word. Given its topic, each word is then generated from a fixed topic-specific probability distribution. We are interested in finding similarities between the topics and a clustering of words based on the topics.

We use the following notation: vectors are denoted by small bold letters (e.g. $\mathbf{a}$) and matrices by capital bold letters (e.g. $\mathbf{A}$). For scalars we use plain-faced small or capital letters. We use $t = 1, \ldots, T$ as an index over topics, $v = 1, \ldots, V$ as an index over words in the vocabulary, $d = 1, \ldots, D$ as an index over documents, and $n = 1, \ldots, W_d$ as an index over words in the $d$-th document.

The joint distribution is

$$\prod_{d=1}^{D} p(\mathbf{w}_d, \mathbf{z}_d, \eta_d \mid \mu, \Sigma, \mathbf{B}) = \prod_{d=1}^{D} \left[ \prod_{n=1}^{W_d} p(w_{n,d} \mid z_{n,d}, \mathbf{B}) \, p(z_{n,d} \mid \eta_d) \right] p(\eta_d \mid \mu, \Sigma) \quad (4)$$

where $\mathbf{B} = [\beta_1, \beta_2, \ldots, \beta_T]$. Our goal is to infer the posterior distribution over $\eta_{1:D}$ given the data. We also wish to estimate the parameters $\Theta = \{\mu, \Sigma, \mathbf{B}\}$. We take an empirical Bayes approach to estimate the parameters, i.e.,
we maximize the marginal likelihood with respect to the parameters. The marginal likelihood of the data given the parameters $\Theta$ is

$$p(\mathbf{w}_{1:D} \mid \Theta) = \prod_{d=1}^{D} \int p(\mathbf{w}_d \mid \eta_d, \mathbf{B}) \, p(\eta_d \mid \mu, \Sigma) \, d\eta_d \quad (5)$$

Unfortunately this integral is intractable, and hence we resort to variational methods for optimization. We first find a lower bound for which the integral is tractable, and then maximize this lower bound with respect to the parameters.
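The generative process in equations (1)–(3) can be sketched concretely. The following is a minimal numpy sketch, not the authors' code; the dimensions ($T = 3$ topics, $V = 5$ vocabulary words, a 10-word document) and the parameter values for $\mu$, $\Sigma$, and $\mathbf{B}$ are toy assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    # f(a) = exp(a) / sum_j exp(a_j), computed stably
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy sizes (assumed for illustration only)
T, V, n_words = 3, 5, 10  # topics, vocabulary size, words in the document

# Model parameters Theta = {mu, Sigma, B} (toy values)
mu = np.zeros(T)
Sigma = 0.5 * np.eye(T)
B = rng.dirichlet(np.ones(V), size=T)  # row t is beta_t, a distribution over the vocabulary

def generate_document(mu, Sigma, B, n_words):
    """Sample one document (w, z, eta) from the correlated topic model."""
    eta = rng.multivariate_normal(mu, Sigma)            # eq. (1): logistic-normal draw
    theta = softmax(eta)                                # topic proportions f(eta)
    z = rng.choice(B.shape[0], size=n_words, p=theta)   # eq. (2): one topic per word
    w = np.array([rng.choice(B.shape[1], p=B[t]) for t in z])  # eq. (3): one word per topic
    return w, z, eta

w, z, eta = generate_document(mu, Sigma, B, n_words)
```

Note that, unlike LDA's Dirichlet prior, the full covariance $\Sigma$ lets the sampled $\eta_d$ induce correlations between topic proportions, which is the point of the model.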
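To make the intractability of the integral in (5) concrete, one can approximate it by naive Monte Carlo: draw samples of $\eta_d$ from the prior and average the conditional likelihood $p(\mathbf{w}_d \mid \eta_d, \mathbf{B})$ with $z_{n,d}$ summed out. The sketch below assumes toy sizes and parameters; such an estimator is noisy and degrades as $T$ and the document length grow, which is part of the motivation for a deterministic variational lower bound instead.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def log_lik_doc(w, eta, B):
    # log p(w_d | eta_d, B) with the topics z_{n,d} marginalized out:
    # each word n contributes log sum_t f(eta)_t * beta_{t, w_n}
    theta = softmax(eta)
    return np.log(theta @ B[:, w]).sum()

def mc_log_marginal(w, mu, Sigma, B, n_samples=5000):
    """Naive Monte Carlo estimate of log p(w_d | Theta) in eq. (5)."""
    etas = rng.multivariate_normal(mu, Sigma, size=n_samples)
    logs = np.array([log_lik_doc(w, eta, B) for eta in etas])
    m = logs.max()  # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(logs - m)))

# Toy example (assumed sizes and parameters)
T, V = 3, 5
mu, Sigma = np.zeros(T), 0.5 * np.eye(T)
B = rng.dirichlet(np.ones(V), size=T)
w = np.array([0, 2, 2, 4, 1])  # a short document as a list of word indices
log_pw = mc_log_marginal(w, mu, Sigma, B)
```

The variational approach replaces this stochastic average with a tractable lower bound on the same quantity, which can then be maximized over $\Theta$.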