978-1-5386-9346-9/19/$31.00 ©2019 IEEE

Information Gain Model for Efficient Influential Node Identification in Social Networks

Kaushik Dutta 1, Mayank Sharma 2, Upasna Sharma 2, Sunil Kumar Khatri 2, Prashant Johri 3
Amity Institute of Information Technology, Amity University Uttar Pradesh, Noida, India
c.kaushikdutta@gmail.com 1, msharma22@amity.edu 2, usharma1@amity.edu 2, sunilkkhatri@gmail.com 2
Galgotias University, India 2
johri.prashant@gmail.com 3

Abstract: Influential node detection in social networks has become a vital approach to identifying the key players in a network. Many approaches have applied such social network analysis to viral marketing, law enforcement, and collaborative support systems for communities, using clustering algorithms or centrality measures. One of the most efficient ways to identify influential nodes in a network is to compute the centralities of the nodes based on their information gain, which also takes into account the information gains of their neighbouring nodes. In this paper, we propose a hybrid model of influential node search based on centralities such as degree centrality, betweenness centrality, and the information gain of the nodes, to provide a more precise measure of influence in any network. Once we obtain priority nodes from different centrality measures, including EVC, we apply the quantitative efficiency of communication to obtain better influential relationships between the nodes in the dataset.

Keywords: Influential Node Detection, Information Gain, Efficiency, Node Centralities, Betweenness Centrality.

I. INTRODUCTION

A social network is, by definition, a graph of interactions between groups of individuals, in which the individuals are the nodes and the relationships between them are the edges. Because of this network structure, information or data flows throughout the network.
Today, the widespread use of social media and social networks has resulted in an exponential rise in the volume of data worldwide. Social network analysis generates knowledge from such networks that benefits numerous areas, such as data mining, community support systems, business and market analysis, law enforcement, and collaborative learning. It also addresses various social needs and issues through approaches such as targeted marketing, user behaviour analysis, and predictive analysis. Influential node detection has become a central part of social network analysis because it provides information about the important nodes in the network, which affect the flow of information throughout the network. Some of the traditional approaches to influential node detection were greedy approaches that run on feature selection algorithms, or approaches built on node centrality components such as degree centrality or closeness centrality. These did not take into account the various basic weights in a social network, such as information flow, event ratios of similar and dissimilar ties, transitivity, propinquity, and multiplexity. In this paper, we look at social networks from the point of view of multiple centralities that could define the influence of nodes, and at how the information gain of different nodes could provide insight into how important those nodes are for the network. We first define the influence of nodes in a network based on their information gain, and then add further centrality measures to improve the accuracy of the influence measure. We then filter the results further using the proposed model, which discusses how the actions performed by the nodes, or the type of information flowing, affect influence in a social network.
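As a point of reference for the traditional measures mentioned above, degree centrality and closeness centrality can be sketched as follows. This is only a minimal illustration under stated assumptions: the five-node undirected graph and the helper names (build_adjacency, degree_centrality, closeness_centrality) are hypothetical and are not taken from the paper, and the sketch does not implement the proposed information gain model.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly tied to."""
    others = max(len(adj) - 1, 1)  # guard against a single-node graph
    return {node: len(nbrs) / others for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path distances (BFS)."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical toy network (an assumption for illustration, not the paper's data).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
adj = build_adjacency(edges)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)

for node in sorted(adj):
    print(node, round(degree[node], 3), round(closeness[node], 3))
```

Both measures depend only on the topology of the graph. Neither sees what kind of information flows along a tie or how often, which is the limitation of the traditional approaches noted above.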
Considering that social networks are strongly influenced by properties such as segmentation, clustering, and density of ties, it is imperative to predict information diffusion throughout the network accurately. This leads us to include only those centralities that define the probability of information gain [1]. Influence propagation can be depicted as follows. In Fig. 1, the letters represent users in a network and the arrows represent relations based on attention. In this graph, A → B means that A depends on B for information. Similarly, B, C and G depend on D for information. D depends on E, which in turn depends on H. So ultimately, the flow of information is