978-1-5386-9346-9/19/$31.00 ©2019 IEEE
Information Gain Model for Efficient Influential
Node Identification in Social Networks
Kaushik Dutta¹, Mayank Sharma², Upasna Sharma², Sunil Kumar Khatri², Prashant Johri³
Amity Institute of Information Technology, Amity University Uttar Pradesh Noida, India
c.kaushikdutta@gmail.com¹, msharma22@amity.edu², usharma1@amity.edu², sunilkkhatri@gmail.com²
Galgotias University, India²
johri.prashant@gmail.com³
Abstract—Influential node detection in social networks has
become a vital approach to identifying the key players in a
network. Many approaches have applied such social network
analyses to viral marketing, law enforcement, and
collaborative support systems for communities, using
clustering algorithms or centrality measures. One of the
most efficient ways to identify influential nodes in a
network is to compute the centralities of the nodes based on
their information gain, which also takes into account the
information gains of their neighbouring nodes. In this
paper, we propose a hybrid model of influential node search
based on centralities such as degree centrality, betweenness
centrality and the information gain of the nodes, to provide
a more precise measure of influence in any network. Once we
obtain priority nodes from the different centrality
measures, including eigenvector centrality (EVC), we apply a
quantitative efficiency of communication to obtain better
influential relationships between the nodes in the dataset.
Keywords: Influential Node Detection, Information Gain,
Efficiency, Node Centralities, Betweenness Centrality.
I. INTRODUCTION
A social network is, by definition, a graph of interactions
between groups of individuals, where the individuals are the
nodes and the relationships between them are the edges.
Information or data flows throughout the network along these
edges. Today, the widespread use of social media and social
networks has resulted in an exponential rise in the volume of
data worldwide. Social network analysis generates knowledge
from such networks to benefit numerous areas such as data
mining, community support systems, business and market
analysis, law enforcement and collaborative learning. It also
addresses various social needs and issues through approaches
such as targeted marketing, user behaviour analysis and
predictive analysis.
Influential node detection has become a central part of
social network analysis, as it provides information about the
important nodes in the network which affect the flow of
information throughout the network. Some of the traditional
approaches to influential node detection were greedy
approaches that ran on feature selection algorithms, or on
node centrality measures such as degree centrality or
closeness centrality. These did not take into account the
various basic weights in a social network, such as
information flow, event ratios of similar and dissimilar ties,
transitivity, propinquity and multiplexity.
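
As a minimal illustration of the two traditional measures named above, the following Python sketch computes degree centrality and closeness centrality on a small undirected graph. The graph `toy`, the function names and the normalisations used are assumptions made for illustration only; they are not taken from the dataset or the model discussed in this paper.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly tied to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path distances.

    Distances are hop counts found by breadth-first search. This is one
    common convention; others normalise differently on disconnected graphs.
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

# Hypothetical toy network (undirected adjacency sets), not from the paper.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

degree = degree_centrality(toy)
closeness = closeness_centrality(toy)
most_central = max(closeness, key=closeness.get)
```

Both measures depend only on the topology of the graph, so in this sketch every tie counts equally regardless of how much information flows across it, which is the limitation noted above.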
In this paper, we look at social networks from the viewpoint
of multiple centralities that could define the influence of
nodes, and at how the information gain of different nodes
could provide insight into how important the nodes are for
the network. We define the influence of nodes in a network
based on the information gain of the nodes, and add further
centrality measures to improve the accuracy of the influence
estimate. We then filter the results further using the
proposed model, which discusses how the actions performed by
the nodes or the type of information flowing would affect
influence in a social network.
Considering that social networks are highly influenced by
properties such as segmentation, clustering and density of
ties, it is imperative to accurately predict information
diffusion throughout the network. This leads us to include
only those centralities that define the probability of
information gain. [1] Influence propagation can be depicted as
follows,
In Fig. 1, the letters represent users in a network and the
arrows represent relations based on attention. In the graph,
A → B means that A is dependent on B for information.
Similarly, B, C and G are dependent on D for information.
D is dependent on E, which is in turn
dependent on H. So ultimately, the flow of information is