TRIAGE : Temporal Twitter attribute graph patterns
Ilias Dimitriadis
idimitriad@csd.auth.gr
Aristotle University of Thessaloniki
Thessaloniki, Greece
Marinos Poiitis
mpoiitis@csd.auth.gr
Aristotle University of Thessaloniki
Thessaloniki, Greece
Christos Faloutsos
christos@cs.cmu.edu
Carnegie Mellon University
Pittsburgh, Pennsylvania, USA
Athena Vakali
avakali@csd.auth.gr
Aristotle University of Thessaloniki
Thessaloniki, Greece
ABSTRACT
Given a node-attributed network of Twitter users, can we capture
their posting behavior over time and identify patterns that could
describe, model or predict their activity? Based on the
assumption that the posts of these users are topic-specific, can we
identify temporal connectivity patterns that emerge from the use
of specific attributes? More challengingly, are there any particular
attribute usage patterns which indicate an inherent anomaly either
for users or attributes? Our study attempts to provide solid answers
to all the above questions, extending previous work on other social
networks and attribute types. We propose TRIAGE, a pipeline of
methods which: (a) identify temporal behavioral patterns in individual
attribute distributions, (b) model the temporal evolution of
attribute induced graphs and (c) detect irregular attributes and users
based on the patterns identified earlier. More specifically, we model
the attribute distributions using the log-odds ratio, we provide
explanations with respect to the attribute induced subgraph patterns
and we observe the structural differences of attribute induced
subgraphs based on these patterns. Experimental results show that:
most of the individual attribute distributions remain stable over
time, mostly following power laws; the temporal evolution
of attribute induced graphs obeys certain laws, and deviations from
them are outliers; finally, we discover that we can indeed identify the
structure of each subgraph, based on the emerging patterns. Real dataset
experiments on 50K Twitter users' activities and attributes show
that TRIAGE effectively identifies Twitter
user and attribute behavioral patterns and can identify irregular
activities for users and anomalous graph structures for attribute
induced subgraphs.
CCS CONCEPTS
· Networks → Online social networks; · Computing method-
ologies → Anomaly detection; · Information systems → Data
mining.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from Permissions@acm.org.
WIMS 2020, June 30-July 3, 2020, Biarritz, France
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7542-9/20/06...$15.00
https://doi.org/10.1145/3405962.3405998
KEYWORDS
Graph mining, Social networks, Anomaly Detection, Network Mod-
elling, Twitter
ACM Reference Format:
Ilias Dimitriadis, Marinos Poiitis, Christos Faloutsos, and Athena Vakali.
2020. TRIAGE : Temporal Twitter attribute graph patterns. In The 10th
International Conference on Web Intelligence, Mining and Semantics (WIMS
2020), June 30-July 3, 2020, Biarritz, France. ACM, New York, NY, USA,
10 pages. https://doi.org/10.1145/3405962.3405998
1 INTRODUCTION
Given a large set of tweets and the users' activities around them,
how would you find a user’s "surprising" activity? Are there any
attributes which characterize varying behavioral norms? Looking
at the problem from another angle, can an attribute induced graph
analysis detect such anomalies or provide information about the
structure of the graph?
Twitter remains one of the most widely used social networks, with more
than 275 million active users¹.
Due to its high popularity, Twitter has been used to engage in politics,
raise awareness and influence public opinion in multiple ways
and on multiple topics. Its capacity to immediately impact users
has so far been exploited for both good [31] and bad [19] purposes.
One of the challenging problems in Twitter use is determining
whether posts are made by humans or by automated (bot)
software with a specific, potentially malicious, purpose. Such bot
accounts or spammers [7] tend to mimic real users' behavior,
with the intention to deceive others and to increase their popularity.
Once bots penetrate a user's network, they can affect trend
formation, spread fake news and even disseminate scams, phishing,
and malware [37]. The identification of such actions is crucial for
multiple domains (society, business brands, infotainment, politics,
academia, etc.).
Twitter users' activities involve multiple actions: (a) posting content
(tweets) along with metadata (hashtags, images, videos or URLs);
(b) reacting to other users’ tweets through certain declarations (retweets,
favorites, quotes); and (c) interacting with other users by mentioning
them, replying to their posts and forming friendships, reciprocal or
not (followers, friends). Thus, users’ activity is characterized by
multiple attributes (such as hashtags, URLs, mentions, etc.) which
represent the traits of their actions. Attributes have so far been used
individually as features in building classification models [38] or as
¹ https://www.statista.com/statistics/303681/twitter-users-worldwide/