Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks *

Microsoft Research Technical Report MSR-TR-2010-2, January 2010

Wei Chen
Microsoft Research Asia, Beijing, China
weic@microsoft.com

Chi Wang
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
chiwang1@illinois.edu

Yajun Wang
Microsoft Research Asia, Beijing, China
yajunw@microsoft.com

Abstract

Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements, are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spread. In this paper, we design a new heuristic algorithm that scales easily to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter that lets users control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs, where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread: it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics, by as much as a 100%–260% increase in influence spread.

Keywords: influence maximization, social networks, viral marketing

* This is the second revision of the paper, done in Feb. 2010.
The main change in this revision is to focus on the scalability of our new algorithm. We conduct new tests with real-world data of up to millions of nodes and edges to show the strong scalability of our algorithm. The presentation is changed in various places to reflect this focus and to improve the overall readability.

1 Introduction

Word-of-mouth or viral marketing differentiates itself from other marketing strategies because it is based on trust within individuals' close social circles of family, friends, and co-workers. Research shows that people trust the information obtained from their close social circle far more than the information obtained from general advertisement channels such as TV, newspapers, and online advertisements [15]. Thus many people believe that word-of-mouth marketing is the most effective marketing strategy (e.g., [14]).

The increasing popularity of many online social network sites, such as Facebook, Myspace, and Twitter, presents new opportunities for enabling large-scale and prevalent viral marketing online. Consider the following hypothetical scenario as a motivating example. A small company develops a cool online application and wants to market it through an online social network. It has a limited budget, so it can only select a small number of initial users in the network to use the application (by giving them gifts or payments). The company hopes that these initial users will love the application and start influencing their friends on the social network to use it, that their friends will in turn influence their friends' friends, and so on, so that through the word-of-mouth effect a large population in the social network adopts the application. The problem is whom to select as the initial users so that they eventually influence the largest number of people in the network.
The above problem, called influence maximization, was first formulated as a discrete optimization problem by Kempe, Kleinberg, and Tardos as follows [9]: A social network is modeled as a graph with nodes representing individuals and edges representing connections or relationships between two individ-