Securing Big Data Environments from Attacks
Udaya Tupakula Vijay Varadharajan
Advanced Cyber Security Research Centre
Faculty of Science and Engineering
Macquarie University, Sydney, Australia
{udaya.tupakula; vijay.varadharajan}@mq.edu.au
Abstract—In this paper we propose techniques for securing
big data environments such as public clouds, where tenants use
their virtual machines for different services such as utilities and
healthcare. Our model uses state-based monitoring of
the data sources for service-specific detection of attacks, and
offline traffic analysis of multiple data sources to detect attacks
such as botnets.
Keywords—Big Data Security, Security Attacks
I. INTRODUCTION
Emerging technologies such as smart grids, the Internet of
Things (IoT) and clouds generate huge amounts of data. Several
business models have been developed and innovative
applications have been proposed that use this data to
improve the quality of life and provide better services to
the customers. For example, business models have been
developed for capturing the location and behaviour of the users
from their mobile devices and using this information for
targeted advertising and smart transportation. Utility
providers capture the power usage of smart devices in
real time to estimate peak-time demand for power
generation and to offer variable pricing depending on the time
of use. Although such emerging technologies offer several
advantages, securing these environments poses significant
challenges.
As shown in Figure 1, a simple big data scenario [1]
consists of capturing structured or unstructured data from
several Heterogeneous Data Sources (HDS) such as tiny
sensors, servers, laptops, desktops, virtual machines and smart
phones, storing the data in an easily accessible location
(centralised or distributed), and analysing or further processing
the data for different applications. However, since data is captured
from untrusted devices, attackers or compromised devices can
easily upload malicious data to the storage controller, and the
attacks can then spread to all other devices that access this
malicious data. Moreover, the volume, velocity and variety of the
data generated in such environments make it extremely
challenging to deal with attacks. Hence there is a need for
techniques for securing such big data environments.
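The propagation risk described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, device identifiers and the `malicious` flag are hypothetical and exist only to show that a storage controller which accepts uploads without validation exposes every device that later reads the stored data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    source: str               # identifier of the uploading HDS (hypothetical)
    payload: str
    malicious: bool = False   # ground truth, unknown to the controller


@dataclass
class StorageController:
    """Stores whatever the data sources upload, without any validation."""
    records: list = field(default_factory=list)

    def upload(self, record: Record) -> None:
        self.records.append(record)

    def read_all(self) -> list:
        return list(self.records)


def exposed_devices(controller: StorageController, readers: list) -> set:
    """Return the readers that would receive at least one malicious record."""
    tainted = any(r.malicious for r in controller.read_all())
    return set(readers) if tainted else set()


controller = StorageController()
controller.upload(Record("sensor-1", "temp=21.5"))
controller.upload(Record("laptop-7", "crafted-exploit", malicious=True))

# Every device that reads from the shared store is exposed.
exposed = exposed_devices(controller, ["analysis-engine", "phone-3", "vm-12"])
print(sorted(exposed))
```

In this toy model a single malicious upload exposes all readers, which is why the approach in the paper places enforcement close to the data sources rather than relying on the storage controller alone.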
In this paper we propose techniques for securing big data
environments. The paper is organised as follows. Section II
presents the attacker model and an overview of the operation of
our model. Section III presents a detailed discussion of the
components of our model. Section IV presents the
implementation of our model and how it helps to deal with
different attacks. Section V concludes.
II. OUR APPROACH
Our model makes use of Trusted Components (TC) for
enforcing service specific security policies on the HDS and
also for capturing the data required for security analysis. The
TC are placed on different devices in secure locations, and the
policies enforced in a TC depend on the capabilities of the
device. For example, a TC can be placed in gateways,
access points, base stations and virtual machine monitors.
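As a rough illustration of this idea, the following Python sketch shows a Trusted Component that enforces a service-specific policy, applies a stricter check only when the hosting device is capable of it, and records every decision for later security analysis. The policy table, the `can_inspect_state` capability flag and the `state` values are assumptions made for this example; the paper does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical service-specific policies: allowed destination ports per service.
SERVICE_POLICIES = {
    "healthcare": {443},
    "utility": {443, 502},   # 502 is the registered Modbus/TCP port
}


@dataclass
class TrustedComponent:
    location: str              # e.g. "gateway" or "virtual machine monitor"
    can_inspect_state: bool    # assumed capability of the hosting device
    captured: list = field(default_factory=list)

    def enforce(self, service: str, dst_port: int, state: str = "unknown") -> bool:
        """Apply the policy for `service` and log the decision for analysis."""
        allowed = dst_port in SERVICE_POLICIES.get(service, set())
        # Only a sufficiently capable device can also act on the source's state.
        if self.can_inspect_state and state == "compromised":
            allowed = False
        self.captured.append((service, dst_port, state, allowed))
        return allowed


gateway = TrustedComponent("gateway", can_inspect_state=False)
vmm = TrustedComponent("virtual machine monitor", can_inspect_state=True)

gw_ok = gateway.enforce("healthcare", 443, state="compromised")
vmm_ok = vmm.enforce("healthcare", 443, state="compromised")
telnet_ok = vmm.enforce("utility", 23)
print(gw_ok, vmm_ok, telnet_ok)
```

The same request is treated differently by the two components because only the virtual machine monitor is assumed to see the state of the source, which mirrors the statement that the enforced policies depend on device capabilities.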
Let us consider a simple cloud [2] scenario and discuss an
attacker model and the operation of our model. In a cloud,
there can be several million devices (volume)
uploading and downloading data at frequent intervals
(velocity), and the data can be of different types (variety)
because tenants use the cloud for different services such as
critical infrastructure, healthcare and utilities.
A. Attacker Model
Let us consider a generic big data scenario such as a public
cloud with different tenants (utility, healthcare, finance,
government) making use of an IaaS public cloud for hosting their
services. The tenants can be running different operating
systems (such as Windows and Linux) and service-specific
applications in their virtual machines.
Attacks in such environments can lead to catastrophic
damage (e.g. blackouts in the case of attacks on utility services)
and in some cases loss of life (e.g. doctors unable to access
patients’ data). There are several challenges in dealing with
attacks in such environments. On the one hand, there are attacks
that target specific services of the tenants (such as Stuxnet [3]
for SCADA), and on the other hand there are attacks such as
botnets that are common to all tenants, since their applications
run on popular OS such as Windows and Linux. If an
attacker can exploit vulnerabilities in such an OS, then it can
compromise the services of different tenants. Attacks such as botnets
are practical in the current state of the art. The botnet is a group of
[Fig. 1. Big Data Scenario. Components shown: HDS 1, HDS 2, HDS X; Data Storage Controller; Data Analysis Engine]
2016 IEEE 2nd International Conference on Big Data Security on Cloud, IEEE International Conference on High Performance
and Smart Computing, IEEE International Conference on Intelligent Data and Security
978-1-5090-2403-2/16 $31.00 © 2016 IEEE
DOI 10.1109/BigDataSecurity-HPSC-IDS.2016.74