Forensic Analysis of Heterogeneous Social Media Data

Aikaterini Nikolaidou, Michalis Lazaridis, Theodoros Semertzidis, Apostolos Axenopoulos and Petros Daras
Information Technologies Institute, CERTH, Thessaloniki, Greece

Keywords: Social Media Analytics Forensic Platform, Heterogeneous Social Media Data, Ontology, Labeled Property Graph.

Abstract: Aggregating and analyzing data from heterogeneous social media sources is a challenge not only for businesses and organizations but also for Law Enforcement Agencies. The latter's core objectives are to monitor criminal and terrorist-related activities and to identify the "key players" in various networks. In this paper, a framework for homogenizing and exploiting data from multiple sources is presented. Moreover, as part of the framework, an ontology that reflects today's social media perceptions is introduced. Data from multiple sources is transformed into a labeled property graph and stored in a graph database in a homogenized way based on the proposed ontology. The result is a cross-source analysis system where end-users can explore different scenarios and draw conclusions through a library of predefined query placeholders that focus on forensic investigation. The framework is evaluated on the Stormfront dataset, drawn from a radical-right web community. Finally, the benefits of applying the proposed framework to discover and visualize the relationships between the Stormfront profiles are presented.

1 INTRODUCTION

Social media sites constitute a rich pool of evolving content along with personal data, preferences, activities and relationships. Due to their affordability and accessibility, social media have become a means of communication and action for criminal and terrorist organizations. In earlier times, most crimes left breadcrumbs of evidence in the real world. Nowadays, through interactive social media platforms, offenders engage in illicit practices such as fraud, cyber stalking and cyber bullying
(Gambhir, 2018). Terrorists exploit social media to reach audiences for potential recruits, disseminate messages and organize strategic operations (Bertram, 2016). Furthermore, social media play a key role in political socialization in terms of influencing individual behavior and preparedness to participate in collective actions (Passy, 2000).

Law Enforcement Agencies (LEAs) wish to take advantage of these information sources for the sake of security. For monitoring and analyzing criminal-related activity in social media networks, one major question is how to identify the most influential profiles, a task also known as "key player" discovery (Zenou, 2016). LEAs are also interested in answering the so-called six W's: Who, What, When, Where, Why and How. These questions are fundamental and are traditionally raised during criminal investigations (Carrier et al., 2003).

Social media evidence provides information about a suspect's or a victim's profile that can be mined in close to real time. Their contacts, messages, geo-location data, photos and, more generally, their daily activities are available in chronological order. Monitoring and analyzing the abundance of information shared by social media users, together with social media metadata, should in theory help LEAs gain insights into mass communication and reach fruitful conclusions in an inquiry. However, this kind of exploration is by no means a straightforward task.

A recent study by Arshad et al. (Arshad et al., 2019) explains the challenges that law enforcement personnel face when handling social media data for forensic investigation. The first issue the authors observe is that in a single social media investigation, some data elements are considered out of context and not taken into account. Moreover, the components of data which they consider important are stored separately.
All fragmented and unstructured social media information, although it may seem to be of little importance, would be very useful if it was in a coherent representation and in chronological order. Be-

Nikolaidou, A., Lazaridis, M., Semertzidis, T., Axenopoulos, A. and Daras, P. Forensic Analysis of Heterogeneous Social Media Data. DOI: 10.5220/0008347803430350
In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019), pages 343-350
ISBN: 978-989-758-382-7
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved