Securing Big Data Environments from Attacks

Udaya Tupakula, Vijay Varadharajan
Advanced Cyber Security Research Centre
Faculty of Science and Engineering
Macquarie University, Sydney, Australia
{udaya.tupakula; vijay.varadharajan}@mq.edu.au

Abstract: In this paper we propose techniques for securing big data environments such as a public cloud in which tenants use their virtual machines for different services such as utility and healthcare. Our model makes use of state-based monitoring of the data sources for service-specific detection of attacks, and offline traffic analysis of multiple data sources to detect attacks such as botnets.

Keywords: Big Data Security, Security Attacks

I. INTRODUCTION

Emerging technologies such as smart grids, the Internet of Things (IoT) and clouds generate huge amounts of data. Several business models have been developed and innovative applications have been proposed for using this data to improve quality of life and provide better services to customers. For example, business models have been developed for capturing the location and behaviour of users from their mobile devices and using this information for targeted advertising and smart transportation. Utility providers capture the power usage of smart devices in real time to estimate peak-time demand for power generation and to offer variable pricing depending on the time of use.

Although such emerging technologies offer several advantages, securing these environments poses significant challenges. As shown in Figure 1, a simple big data scenario [1] consists of capturing structured or unstructured data from several Heterogeneous Data Sources (HDS) such as tiny sensors, servers, laptops, desktops, virtual machines and smart phones, storing the data in an easily accessible location (centralised or distributed), and analysing or further processing the data for different applications.
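The scenario in Figure 1 can be pictured as a small data pipeline. The following Python sketch is illustrative only and is not taken from the paper: the class names `Record`, `StorageController` and `AnalysisEngine`, their fields, and the sample payloads are all assumptions made for this example. It shows heterogeneous sources writing to one shared store that an analysis engine later reads.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One item of data produced by a heterogeneous data source (HDS)."""
    source_id: str  # hypothetical identifier of the HDS
    kind: str       # hypothetical device type, e.g. "sensor", "vm", "phone"
    payload: str    # structured or unstructured content


@dataclass
class StorageController:
    """Shared store that every HDS uploads to."""
    records: list = field(default_factory=list)

    def store(self, record: Record) -> None:
        # In this sketch the controller accepts every record as-is:
        # neither the source nor the payload is checked.
        self.records.append(record)


class AnalysisEngine:
    """Reads whatever the storage controller holds and summarises it."""

    def __init__(self, storage: StorageController) -> None:
        self.storage = storage

    def records_per_kind(self) -> Counter:
        return Counter(r.kind for r in self.storage.records)


storage = StorageController()
for rec in (
    Record("hds-1", "sensor", "temp=21.5"),
    Record("hds-2", "vm", "GET /index.html"),
    Record("hds-x", "phone", "lat=-33.77,lon=151.11"),
):
    storage.store(rec)

summary = AnalysisEngine(storage).records_per_kind()
```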
However, since data is captured from untrusted devices, attackers or compromised devices can easily upload malicious data to the storage controller, and the attack can then spread to every other device that accesses this malicious data. In addition, the volume, velocity and variety of the data generated in such environments make it extremely challenging to deal with attacks. Hence there is a need for techniques for securing such big data environments.

In this paper we propose techniques for securing big data environments. The paper is organised as follows. Section II presents the attacker model and an overview of the operation of our model. Section III presents a detailed discussion of the components of our model. Section IV presents the implementation of our model and how it helps to deal with different attacks. Section V concludes.

II. OUR APPROACH

Our model makes use of Trusted Components (TC) for enforcing service-specific security policies on the HDS and for capturing the data required for security analysis. The TC are placed at different devices in secure locations, and the policies enforced in a TC depend on the capabilities of the device that hosts it. For example, the TC can be placed in gateways, access points, base stations and virtual machine monitors.

Let us consider a simple cloud [2] scenario and discuss an attacker model and the operation of our model. In the case of a cloud, there can be several million devices (volume) uploading and downloading data at frequent intervals (velocity), and the data is of different types (variety) because tenants use the cloud for different services such as critical infrastructure, health care and utilities.

A. Attacker Model

Let us consider a generic big data scenario such as a public cloud in which different tenants (utility, healthcare, finance, government) use an IaaS public cloud to host their services.
The tenants can be running different operating systems (such as Windows and Linux) and service-specific applications in their virtual machines. Attacks in such environments can lead to catastrophic damage (blackouts in the case of attacks on utility services) and in some cases loss of life (e.g. doctors unable to access patients' data).

There are several challenges in dealing with attacks in such environments. On the one hand, there are attacks that target specific tenant services (such as Stuxnet [3] for SCADA). On the other hand, there are attacks such as botnets that can affect any tenant, since tenant applications run on popular operating systems such as Windows and Linux. If an attacker can exploit vulnerabilities in such an OS, it can compromise the services of different tenants. Attacks such as botnets are practical in the current state of the art.

Fig. 1. Big Data Scenario (components shown: HDS 1, HDS 2, HDS X, Data Storage Controller, Data Analysis Engine)

2016 IEEE 2nd International Conference on Big Data Security on Cloud, IEEE International Conference on High Performance and Smart Computing, IEEE International Conference on Intelligent Data and Security. 978-1-5090-2403-2/16 $31.00 © 2016 IEEE. DOI 10.1109/BigDataSecurity-HPSC-IDS.2016.74

The botnet is a group of