TRIAGE: Temporal Twitter attribute graph patterns

Ilias Dimitriadis, idimitriad@csd.auth.gr, Aristotle University of Thessaloniki, Thessaloniki, Greece
Marinos Poiitis, mpoiitis@csd.auth.gr, Aristotle University of Thessaloniki, Thessaloniki, Greece
Christos Faloutsos, christos@cs.cmu.edu, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Athena Vakali, avakali@csd.auth.gr, Aristotle University of Thessaloniki, Thessaloniki, Greece

ABSTRACT
Given a node-attributed network of Twitter users, can we capture their posting behavior over time and identify patterns that could describe, model or predict their activity? Based on the assumption that the posts of these users are topic-specific, can we identify temporal connectivity patterns that emerge from the use of specific attributes? More challengingly, are there any particular attribute usage patterns which indicate an inherent anomaly either for users or attributes? Our study attempts to provide solid answers to all the above questions, extending previous work on other social networks and attribute types. We propose TRIAGE, a pipeline of methods that: (a) identify temporal behavioral patterns in individual attribute distributions, (b) model the temporal evolution of attribute-induced graphs, and (c) detect irregular attributes and users based on the patterns identified earlier. More specifically, we model the attribute distributions using the log-odds ratio, we provide explanations with respect to the attribute-induced subgraph patterns, and we observe the structural differences of attribute-induced subgraphs based on these patterns. Experimental results show that: most of the individual attribute distributions remain stable over time, mostly following power laws; the temporal evolution of attribute-induced graphs obeys certain laws, and deviations from them are outliers; finally, we discover that we can indeed identify the structure of each subgraph based on the emerging patterns.
Experiments on a real dataset of 50K Twitter users' activities and attributes show that TRIAGE effectively identifies Twitter user and attribute behavioral patterns, and can identify irregular user activities and anomalous graph structures in attribute-induced subgraphs.

CCS CONCEPTS
• Networks → Online social networks; • Computing methodologies → Anomaly detection; • Information systems → Data mining.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
WIMS 2020, June 30-July 3, 2020, Biarritz, France
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7542-9/20/06...$15.00
https://doi.org/10.1145/3405962.3405998

KEYWORDS
Graph mining, Social networks, Anomaly Detection, Network Modelling, Twitter

ACM Reference Format:
Ilias Dimitriadis, Marinos Poiitis, Christos Faloutsos, and Athena Vakali. 2020. TRIAGE: Temporal Twitter attribute graph patterns. In The 10th International Conference on Web Intelligence, Mining and Semantics (WIMS 2020), June 30-July 3, 2020, Biarritz, France. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3405962.3405998

1 INTRODUCTION
Given a large set of tweets and the users' activities around them, how would you find a user's "surprising" activity? Are there any attributes which characterize varying behavioral norms? Looking at the problem from another angle, can an attribute-induced graph analysis detect such anomalies or provide information about the structure of the graph?
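To make the notion of an attribute-induced graph concrete, the following minimal sketch assumes the common definition: for a given attribute (for example a hashtag), keep only the users who used that attribute and the edges among them. The function name, the edge-list representation and the toy data are hypothetical illustrations, not the TRIAGE implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def attribute_induced_subgraph(
    edges: Iterable[Edge],
    user_attributes: Dict[str, Set[str]],
    attribute: str,
) -> Tuple[Set[str], List[Edge]]:
    """Return the users that used `attribute` and the edges among them.

    Assumption: "attribute-induced" means the subgraph induced by the
    set of users who used the attribute at least once.
    """
    # Users whose attribute set contains the attribute of interest.
    nodes = {user for user, attrs in user_attributes.items() if attribute in attrs}
    # Keep an edge only if both endpoints belong to the selected users.
    kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, kept


# Hypothetical toy network: edges are user interactions,
# attributes are the hashtags each user has posted.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
user_attributes = {
    "alice": {"#election"},
    "bob": {"#election", "#sports"},
    "carol": {"#sports"},
    "dave": set(),
}

election_nodes, election_edges = attribute_induced_subgraph(
    edges, user_attributes, "#election"
)
sports_nodes, sports_edges = attribute_induced_subgraph(
    edges, user_attributes, "#sports"
)
```

Restricting `user_attributes` to the posts of one time window and repeating the call per window would give a sequence of such subgraphs whose evolution can be compared over time.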
Twitter remains one of the most widely used social networks, with more than 275 million active users¹. Due to its high popularity, Twitter has been used for political purposes, to raise awareness and to influence public opinion in multiple ways and on multiple topics. Its capacity to immediately impact users has so far been exploited both for good [31] and bad reasons [19]. One of the challenging problems in Twitter use is how to determine whether posts are made by humans or by automated (bot) software with a specific, potentially malicious, purpose. Such bot accounts or spammers [7] tend to mimic the behavior of real users, with the intention to deceive others and to increase their popularity. Once bots penetrate a user's network, they can impact trend formation, spread fake news and even disseminate scams, phishing, and malware [37]. The identification of such actions is crucial for multiple domains (society, business brands, infotainment, politics, academia, etc.).

¹ https://www.statista.com/statistics/303681/twitter-users-worldwide/

Twitter users' activities involve multiple actions: (a) posting content (tweets) along with metadata (hashtags, images, videos or URLs); (b) reacting to other users' tweets (retweets, favorites, quotes); and (c) interacting with other users by mentioning them, replying to their posts and forming friendships, reciprocal or not (followers, friends). Thus, users' activity is characterized by multiple attributes (such as hashtags, URLs, mentions, etc.) which represent the traits of their actions. Attributes have so far been used individually as features in building classification models [38] or as