MapReduce Implementation of Variational Bayesian Probabilistic Matrix Factorization Algorithm

Naveen C. Tewari, Hari M. Koduvely, Sarbendu Guha, Arun Yadav and Gladbin David
Center for Knowledge Driven Intelligent Systems
Infosys Labs, Infosys Limited, Electronics City Hosur Road, Bangalore - 560 100, INDIA
Email: {Naveen Tewari, Harimanassery K, Sarbendu Guha, Arun Yadav, GladbinDavid C}@infosys.com

Abstract—We introduce in this paper a scalable implementation of the Variational Bayesian Matrix Factorization method for collaborative filtering using the MapReduce framework. Variational Bayesian methods have the advantage of providing good approximate analytical solutions for the posterior distribution. Because of the independence assumptions about the parameters in the posterior distribution, variational methods are also likely to parallelize efficiently. Although the Variational Bayesian Matrix Factorization method has been shown to produce more accurate results in collaborative filtering, its scaling properties have not been studied so far. We ran our MapReduce implementation on the CiteULike data set and show that our parallelization scheme achieves approximately linear scaling. We also compare its performance with the MapReduce implementation of a popular matrix factorization algorithm, ALSWR, from the open source machine learning library Mahout.

Index Terms—Variational Bayesian Matrix Factorization, Probabilistic Matrix Factorization, MapReduce, Distributed Computing, Collaborative Filtering, Recommendation Systems

I. INTRODUCTION

Matrix factorization is a very popular technique used for collaborative filtering to generate personalized recommendations in e-commerce. Its basic idea is to approximate the user-item transaction matrix X as a product of two low-rank matrices U and V:

X ≈ UV^T    (1)

If the matrix X has dimensions M × N, then U and V have dimensions M × K and N × K respectively.
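The low-rank approximation of Eq. (1) can be sketched as follows (a minimal illustration with hypothetical toy dimensions, not the paper's actual data sizes):

```python
import numpy as np

# Hypothetical toy sizes: M users, N items, K latent features (K << M, N).
M, N, K = 6, 5, 2

rng = np.random.default_rng(0)
U = rng.normal(size=(M, K))  # user-feature matrix, M x K
V = rng.normal(size=(N, K))  # item-feature matrix, N x K

# Low-rank reconstruction of the user-item matrix X (Eq. 1): X ~= U V^T
X_approx = U @ V.T
print(X_approx.shape)  # (6, 5)
```

Because X_approx is a product of an M × K and a K × N matrix, its rank is at most K, which is what makes the representation compact.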
Here K << M, N represents a set of hidden features characterizing the factors that drive consumer preferences. In e-commerce scenarios, such as major online retailers, M and N are typically of the order of millions. Moreover, the matrix X is highly sparse, with less than 10% of its entries filled by known values. This makes the above matrix factorization task highly non-trivial, both in computation time, due to scale, and in accuracy, due to overfitting. In the last decade, several machine learning methods have been developed to improve the accuracy of sparse matrix factorization. Well-known examples are Weighted Non-negative Matrix Factorization (WNMF) [1], Alternating Least Squares with Weighted Regularization (ALSWR) [2] and Variational Bayesian Matrix Factorization (VBMF) [3], to name a few. Hernández-Lobato and Ghahramani have shown that VBMF, compared to other matrix factorization techniques, is more robust to overfitting and generates more accurate predictions at the long tail in e-commerce [4]. VBMF can also be used as a building block in more complex probabilistic models [5]. Although variational Bayesian methods have been used in many contexts, the use of variational approximation to scale machine learning models to modern massive data sets has not been studied in detail. Nallapati et al. have studied the speed and scalability of parallelized variational EM for LDA [6]. Zhai et al. have implemented a variational-inference-based LDA model using MapReduce [7]. In [7], the authors argue that variational methods are well suited to MapReduce-style parallelization because, under the variational approximation, the posterior distribution factorizes and hence the sufficient statistics for different parameters can be computed independently. To our knowledge, no work has been reported on the MapReduce parallelization of VBMF.
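The sparsity discussed above means the fit is evaluated on observed entries only. A minimal sketch, using a hypothetical (user, item, value) triple representation of the kind a MapReduce job would stream:

```python
import numpy as np

# Hypothetical toy example: a sparse user-item matrix stored as
# (user, item, value) triples -- only the observed entries are kept.
ratings = [(0, 1, 4.0), (0, 3, 2.0), (2, 0, 5.0)]

M, N, K = 3, 4, 2
rng = np.random.default_rng(1)
U = rng.normal(size=(M, K))
V = rng.normal(size=(N, K))

# The squared reconstruction error is accumulated over observed entries
# only; fitting K latent features to so few observations is what makes
# sparse factorization prone to overfitting without regularization.
sse = sum((x - U[i] @ V[j]) ** 2 for i, j, x in ratings)
print(sse)
```

Storing only the observed triples is also what keeps the memory footprint proportional to the number of known ratings rather than to M × N.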
Implementing VBMF using a MapReduce parallelization scheme makes it applicable in commercial scenarios and also helps in understanding the scalability of variational machine learning methods in general. In this paper we describe the MapReduce implementation of the Variational Bayesian Matrix Factorization method, which is our original contribution. We apply our implementation to the CiteULike data set to study its scaling properties. We also compare its performance with the MapReduce implementation of ALSWR in the open source machine learning library Mahout^1. The rest of the paper is organized as follows. In Section II we briefly describe the VBMF method. Details of the MapReduce implementation are given in Section III. In Section IV we describe the Hadoop infrastructure used for this study and the approaches used for tuning performance. In Section V we present the results of applying our MapReduce implementation of VBMF to the CiteULike data set. Summary and conclusions from this work are presented in Section VI.

II. VARIATIONAL BAYESIAN PROBABILISTIC MATRIX FACTORIZATION

Variational approximation is a relatively new approach for estimating the posterior distribution in a Bayesian inference problem [8], [9]. The idea is to propose an approximate posterior distribution which has a factorized form and is parameterized by a set of variational parameters. One then minimizes the Kullback-Leibler divergence between the variational distribution and the target posterior with respect to the variational parameters, to arrive at a good approximation to the posterior distribution. Raiko et al. [3] and Lim et al. [10] were the first to use variational methods for matrix factorization problems. Here we describe their method briefly; for more details, readers may consult the paper of Raiko et al.
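The factorized form and the KL objective described above can be written out as follows. This is the standard mean-field formulation from the variational-inference literature; the exact variational family used by Raiko et al. may differ in its details:

```latex
% Mean-field variational posterior: one factor per user row and item row
q(U, V) \;=\; \prod_{i=1}^{M} q(\mathbf{u}_i) \; \prod_{j=1}^{N} q(\mathbf{v}_j)

% Minimizing the KL divergence to the true posterior is equivalent to
% maximizing the evidence lower bound (ELBO), since log p(X) is constant:
\mathrm{KL}\bigl(q \,\|\, p(U, V \mid X)\bigr)
  \;=\; \log p(X) \;-\;
  \underbrace{\mathbb{E}_q\bigl[\log p(X, U, V) - \log q(U, V)\bigr]}_{\text{ELBO}}
```

It is this per-row factorization of q(U, V) that lets the sufficient statistics for each u_i and v_j be computed independently, which is the property exploited by MapReduce-style parallelization.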
We assume the following likelihood function for X, conditional on (U, V), given by a product of normal distributions over its elements:

P(X | U, V) = ∏_{i=1}^{M} ∏_{j=1}^{N} N(x_ij | u_i · v_j^T, σ_x^2)    (2)

where u_i and v_j denote the i-th and j-th rows of matrices U and V respectively, and σ_x^2 is the noise parameter for the observations in X.

^1 Apache Mahout, Open Source Machine Learning Library, http://mahout.apache.org/
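The Gaussian likelihood of Eq. (2) can be sketched in code, restricted to observed entries as is usual for sparse X (a hypothetical helper, with toy data and a made-up noise value):

```python
import numpy as np

def log_likelihood(ratings, U, V, sigma2_x):
    """Log of Eq. (2) over the observed entries: independent Gaussian
    noise with variance sigma2_x around the mean u_i . v_j."""
    ll = 0.0
    for i, j, x in ratings:
        mean = U[i] @ V[j]
        ll += -0.5 * np.log(2 * np.pi * sigma2_x) \
              - (x - mean) ** 2 / (2 * sigma2_x)
    return ll

# Toy instance: random factors and two observed ratings.
M, N, K = 3, 4, 2
rng = np.random.default_rng(2)
U = rng.normal(size=(M, K))
V = rng.normal(size=(N, K))
ratings = [(0, 1, 4.0), (2, 3, 1.0)]
print(log_likelihood(ratings, U, V, sigma2_x=0.5))
```

Each observed x_ij contributes one independent Gaussian term, which is what allows the factorized variational treatment of the rows of U and V.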