International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 03 Issue: 04 | Apr-2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET ISO 9001:2008 Certified Journal Page 939
Data Security in Hadoop Distributed File System
Sharifnawaj Y. Inamdar1, Ajit H. Jadhav2, Rohit B. Desai3, Pravin S. Shinde4, Indrajeet M. Ghadage5, Amit A. Gaikwad6
1Professor, Department of Computer Science & Engineering, DACOE Karad, Maharashtra, India
2,3,4,5,6Student, Final Year B.E.-Computer Science & Engineering, DACOE Karad, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Hadoop is the most widely used distributed
programming framework for processing large amounts of
data with the Hadoop Distributed File System (HDFS), but
processing personal or sensitive data in a distributed
environment demands secure computing. Originally, Hadoop
was designed without any security model.
In this project, the security of HDFS is implemented
by encrypting each file before it is stored in HDFS. A
real-time encryption algorithm is used for encryption, so
only a user who holds the decryption key can decrypt the
data and access it for data mining. User authentication is
also performed by the system. We have also compared this
method with the previously implemented method, i.e.,
encryption and decryption using AES. Encrypting with AES
causes the stored file to grow to double its original size,
and hence the file upload time also increases. The
technique used in this project removes this drawback.
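The size comparison above can be sketched in Python (an illustration, not the paper's implementation). Doubling typically arises when ciphertext is stored hex-encoded, while a length-preserving stream construction, here a SHA-256 counter-mode keystream standing in for the paper's "real-time" cipher (our assumption), keeps the ciphertext the same size as the plaintext:

```python
import hashlib

def hex_encode(data: bytes) -> bytes:
    """Hex-encoding ciphertext doubles its stored size -- the overhead
    the paper attributes to the earlier AES-based approach."""
    return data.hex().encode()

def stream_encrypt(data: bytes, key: bytes) -> bytes:
    """Length-preserving encryption: XOR the plaintext with a keystream
    derived from SHA-256 in counter mode (illustrative construction).
    Applying it twice with the same key decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

plaintext = b"sensitive HDFS record" * 100
print(len(plaintext))                              # 2100 bytes
print(len(hex_encode(plaintext)))                  # 4200 bytes -- doubled
print(len(stream_encrypt(plaintext, b"secret")))   # 2100 bytes -- unchanged
```

Because the ciphertext is exactly as long as the plaintext, upload time to HDFS is unaffected by the encryption step, which is the drawback the project aims to remove.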
We have implemented a method in which OAuth performs
authentication and provides a unique authorization token
for each user. This token is used in the encryption
technique, providing data privacy for all users of Hadoop.
The real-time encryption algorithm used to secure data in
HDFS uses a key generated from the authorization token.
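The paper does not specify the exact key-generation step; a common construction is to feed the OAuth authorization token through a key-derivation function such as PBKDF2. A minimal sketch, in which the salt label and iteration count are our assumptions:

```python
import hashlib

def derive_key(auth_token: str, salt: bytes = b"hdfs-enc-v1") -> bytes:
    """Derive a per-user 256-bit encryption key from the user's OAuth
    authorization token via PBKDF2-HMAC-SHA256 (illustrative parameters)."""
    return hashlib.pbkdf2_hmac("sha256", auth_token.encode(), salt, 100_000)

# Each user's token yields a distinct key, so one user's files stay
# private from every other Hadoop user.
key_a = derive_key("token-for-alice")
key_b = derive_key("token-for-bob")
print(len(key_a))       # 32 bytes (256 bits)
print(key_a != key_b)   # True
```

Deriving the key from the token (rather than storing keys directly) ties data access to a successful OAuth authentication, matching the design described above.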
Key Words: Hadoop, Big data, Security, HDFS, OAuth.
1. INTRODUCTION
Hadoop was developed from the GFS (Google File
System) [2, 3] and MapReduce papers published by Google
in 2003 and 2004, respectively. Hadoop is a framework of
tools, implemented in Java, that supports running
applications on big data.
1.1 Project Idea:
Hadoop was designed without considering data
security. Data stored in HDFS is in plaintext and is prone
to access by unauthorized users, so a method for securing
this data is needed. Hence we are developing this highly
secure system for the Hadoop Distributed File System.
1.2 Need of project:
Hadoop generally runs on large clusters, possibly
as a public cloud service. Amazon, Yahoo, Google, and
others operate such public clouds, where many clients can
run their jobs using Elastic MapReduce and the distributed
storage provided by Hadoop. It is essential to secure
client data in such systems.
The web produces vast amounts of data every day.
Roughly 32% of the data on the web is structured, while
63% is unstructured. The volume of digital content on the
web grew to more than 2.7 ZB in 2012, 48% more than in
2011, and is soaring toward more than 8 ZB by 2015. Every
industry and business organization holds critical data
about its products, production, and market surveys, and
this big data is valuable for productivity growth.
Fig-1: System Architecture
Files in the Hadoop Distributed File System (HDFS)
are divided into multiple blocks, each replicated to
additional DataNodes (two extra copies by default, i.e., a
replication factor of 3) to ensure high data availability
and durability if a job (a parallel application in the
Hadoop environment) fails. Originally, Hadoop clusters
have two types of node operating as master-