Smart News Feeds for Social Networks Using Scalable Joint Latent Factor Models

Himabindu Lakkaraju, IBM Research - India, klakkara@in.ibm.com
Angshu Rai, IBM Research - India, angshu.rai@in.ibm.com
Srujana Merugu, IBM Research - India, srujanamerugu@in.ibm.com

ABSTRACT

Social networks such as Facebook and Twitter offer a huge opportunity to tap the collective wisdom (both published and yet to be published) of all participating users in order to address the information needs of individual users in a highly contextualized fashion using rich user-specific information. Realizing this opportunity, however, requires addressing two key limitations of current social networks: (a) difficulty in discovering relevant content beyond the immediate neighborhood, and (b) lack of support for information filtering based on semantics, content source and linkage. These limitations can be interpreted in terms of predicting user-post relevance with better recall and precision, respectively.

We propose a scalable framework for constructing smart news feeds based on predicting user-post relevance using multiple signals such as text content and attributes of users and posts, and various user-user, post-post and user-post relations (e.g., friend, comment, author relations). Our solution comprises two steps. The first step ensures scalability by selecting a small set of user-post dyads with potentially interesting interactions using inverted feature indexes. The second step models the interactions associated with the selected dyads via a joint latent factor model, which assumes that the user/post content and relationships can be effectively captured by a common latent representation of the users and posts. Experiments on a Facebook dataset using the proposed model lead to improved precision/recall on relevant posts, indicating potential for constructing superior quality news feeds.

1. INTRODUCTION

The ever-increasing participation of authoritative news sources on social networks, coupled with rich multimedia support and flexible communication protocols, has resulted in social networks such as Facebook and Twitter being well-positioned to become the dominant acquisition and dissemination systems for both generic and personal information. Most users, however, still use a combination of sources such as news sites, search engines, and Q&A forums, even though the relevant information resides somewhere on Facebook or Twitter and could be delivered more effectively by taking into account user demographics, network linkage and fine-grained historical activity. This is primarily due to two reasons that emerge in user studies [5]. First, it is currently non-trivial to discover all the relevant information or sources in a social network beyond the immediate social graph. Second, the current news feeds in social networks are based on the immediate social graph with little customizability. Expanding one's social graph to include all potentially relevant sources would thus inundate the user's feed with irrelevant content that has to be manually perused.

Addressing the above limitations effectively can help build highly useful applications such as (i) smart news feeds comprising a mix of highly relevant generic and personal information from all over the network, (ii) automatic generation of relevant responses to a query from existing network content, and (iii) question answering on the network based on intelligent routing of queries to expert users.
The key technical challenge is to design scalable techniques that can combine a large variety of sparse, high-dimensional signals, such as text content and attributes of posts and users, and dyadic user-user, post-post and user-post relations (e.g., network linkage, authorship, commenting activity), to predict other relationships of interest, e.g., the relevance of a post to a user or to another post.

Currently, there exists related work in the areas of personalized news recommendation [4] and social network-based search [6], where the relevance of a post to a user is modeled in terms of the structured user-post attributes and user-user (activity or linkage) correlation. Of these, the techniques based on discriminative models require substantial feature engineering effort in addition to handling missing observations, while those based on generative models are not very scalable and handle only a single dyadic relation.

In the current work, we consider the problem of constructing a smart news feed by modeling user-post relevance. The novelty of our approach lies in (i) ensuring scalability of the generative model for user-post interactions by conditioning it on a selection variable, which can be computed quickly using inverted feature indexes in a prior step, and (ii) combining the predictive power of multiple dyadic relations and text content using block and topic models coupled through a common latent representation for the users and posts. Section 2 provides a formal problem statement, while Section 3 describes the solution approach.

2. PROBLEM STATEMENT

Let $U$ denote the set of users and $P$ the set of posts. For each user $u \in U$, let $c^U_u$ and $x_u$ denote the text content and the demographic attributes (e.g., gender) in the user profile. Similarly, for each post $p \in P$, let $c^P_p$ and $y_p$ denote the text content and the structured attributes (e.g., hasLink).
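To make the per-user and per-post notation concrete, and to illustrate the kind of inverted feature index mentioned in the introduction, the following is a minimal sketch in Python. The class names, the whitespace tokenization, and the "shares at least one token" selection rule are all assumptions made for illustration; the paper does not specify these details, and this is not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class User:
    """One user u: profile text c^U_u and demographic attributes x_u."""
    user_id: str
    content: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Post:
    """One post p: text c^P_p and structured attributes y_p."""
    post_id: str
    content: str
    attributes: dict = field(default_factory=dict)


def tokens(text):
    """Lowercased whitespace tokens (a deliberately naive feature extractor)."""
    return set(text.lower().split())


def build_inverted_index(posts):
    """Map each token to the set of post ids whose text contains it."""
    index = defaultdict(set)
    for post in posts:
        for tok in tokens(post.content):
            index[tok].add(post.post_id)
    return index


def candidate_posts(user, index):
    """Return ids of posts sharing at least one token with the user's profile."""
    found = set()
    for tok in tokens(user.content):
        # .get avoids creating empty entries in the defaultdict
        found |= index.get(tok, set())
    return found


# Hypothetical toy data, not taken from the paper's Facebook dataset.
users = [User("u1", "cricket and machine learning", {"gender": "F"})]
posts = [
    Post("p1", "Machine learning tutorial", {"hasLink": True}),
    Post("p2", "Weekend recipes", {"hasLink": False}),
]
index = build_inverted_index(posts)
selected = candidate_posts(users[0], index)
```

In this toy setup only the dyad (u1, p1) survives selection, so a more expensive relevance model would need to score one user-post pair instead of two. A real system would presumably index attributes and relational features as well, not just text tokens.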
Further, for each dyad of users $(u, v) \in U \times U$, let $r^{UU}_{u,v}$ denote a vector encoding various relationships between the users $u$ and $v$, e.g., friend, follower, etc. Similarly, let $r^{PP}_{p,q}$ and $r^{UP}_{u,p}$ denote encodings of the relationships for the dyads $(p, q) \in P \times P$ and $(u, p) \in U \times P$. Given observations on user-specific and post-specific properties, and (possibly incomplete) user-user, post-post and user-post relationships, the goal is to predict user-post relevance (or, in general, some fine-grained user-post interaction) in order to obtain all the relevant posts for each user.

3. SOLUTION APPROACH

We address the above problem using a two-step approach. The first step is motivated by scalability concerns and involves selecting a small set of dyads with potentially interesting interactions using inverted feature-based indexes. The second step assumes the selection variables from the first step to be fully observed, and models the dyadic interactions and the post and user content using a joint latent factor model. To effectively capture the key aspects of post content, we use a labeled-LDA model similar to the one employed in [5]. Each of these components