IJSER © 2017
http://www.ijser.org
Challenges and Security Issues in
Implementation of Hadoop Technology in Current
Digital Era
Dr. Vinay Kumar, Ms. Arpana Chaturvedi
Abstract—With the advent of new technologies, managing the tremendous volume of overflowing, exponentially
growing data is a major area of concern today, particularly in terms of storing and organizing data securely. The
exponential growth of data due to the Internet of Things (IoT) has led to many challenges for governmental and
non-governmental organizations (NGOs). Security threats have forced private and public organizations to develop
their own Hadoop-based cloud storage architectures. The Apache Hadoop architecture creates clusters of machines
and efficiently coordinates the work among them. The Hadoop Distributed File System (HDFS) and Map Reduce are
two important components of Hadoop. HDFS is the primary storage system used by Hadoop applications. It
enables reliable and extremely rapid computations. HDFS provides high availability of data to different user
applications running at the client end. Map Reduce is a software framework for analyzing and transforming a very
large data set into the desired output. This paper reviews the HDFS 0, HDFS 2.0 and HDFS 2.8 architectures
and their various functionalities, including analytical and security features.
Index Terms—Cloud Computing, Clusters, Hadoop, HDFS, Hive, IoT, Map Reduce, Pig, Sqoop.
—————————— ——————————
1 INTRODUCTION
Hadoop is an open source architecture used to store structured,
semi-structured, unstructured and quasi-structured data;
collectively, such data is termed big data. It provides
meaningful output using data analytics. The standard process
used to work with big data is ETL (Extract, Transform and
Load). Extract means getting data from multiple sources,
Transform means converting it to fit analytical needs, and
Load means getting it into the right systems to derive
meaningful value out of it. It provides various benefits to
governmental as well as non-governmental organizations. The
collected data is of two types, operational data and analytical
data. The different types of data that fall under these two
categories are: transactional data, generated from all daily
transactions; social data, generated from different social
networking sites such as Facebook, Google ads etc.; and sensor
or machine data, generated by industrial equipment, sensors
installed in machines, data stored in the black box in the
aviation industry, web logs that track user behavior, medical
devices, smart meters, road cameras, satellites, games and many
more Internet of Things devices. All Government organizations
are nowadays getting digitized and Aadhaar enabled.
Aadhaar-enabled applications will provide better services and
facilities to the right person as an individual and let
citizens participate in the digital economy. To implement
digitization in different organizations and to utilize all its
benefits, companies are nowadays moving from their existing
technology towards Hadoop. Hadoop is a highly scalable platform
developed in Java, which consists of a distributed file system
that allows multiple concurrent jobs to run on multiple
servers, splitting and transferring data and files between
different nodes. It can process or recover the stored data
efficiently, without any delay, in case of failure of any node.
At the same time, the chances of fraud increase while
processing or storing information in HDFS. Because of the
various big data issues with respect to management, storage,
processing and security, it is necessary to deal with each of
them individually [8].
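The three ETL steps described above can be illustrated with a minimal sketch. It is not taken from the paper and does not use Hadoop itself: the sample inputs, the function names (extract, transform, load, run_etl) and the in-memory dictionary standing in for the target system are all assumptions chosen for illustration.

```python
import csv
import io
import json

# Hypothetical sample inputs standing in for two of the data
# categories named in the text (transactional and sensor data).
TRANSACTIONS_CSV = "id,amount\n1,250.00\n2,99.50\n"
SENSOR_JSON = '[{"id": "m-7", "reading": 21.4}, {"id": "m-9", "reading": 19.8}]'


def extract():
    """Extract: get raw records from multiple sources (here, in-memory strings)."""
    transactions = list(csv.DictReader(io.StringIO(TRANSACTIONS_CSV)))
    sensors = json.loads(SENSOR_JSON)
    return transactions, sensors


def transform(transactions, sensors):
    """Transform: convert both sources into one uniform record shape."""
    records = []
    for row in transactions:
        records.append(
            {"source": "transactional", "key": row["id"], "value": float(row["amount"])}
        )
    for row in sensors:
        records.append(
            {"source": "sensor", "key": row["id"], "value": float(row["reading"])}
        )
    return records


def load(records, store):
    """Load: write records into the target system (assumed here to be a dict keyed by source)."""
    for record in records:
        store.setdefault(record["source"], []).append(record)
    return store


def run_etl():
    """Run the three steps in order and return the populated store."""
    transactions, sensors = extract()
    return load(transform(transactions, sensors), {})


if __name__ == "__main__":
    warehouse = run_etl()
    for source, rows in warehouse.items():
        print(source, len(rows))
```

In a real deployment the extract step would read from external systems and the load step would write to a store such as HDFS; the sketch only shows how the three stages hand data to one another.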
This paper is organized into five sections. Section 2 deals with
the literature review. The Hadoop file system, its architecture
and components are discussed in Section 3. Existing problems and
challenges are outlined in Section 4, and the paper concludes
with the proposed solution in Section 5.
————————————————
Vinay Kumar is a Professor at Vivekananda Institute of Professional
Studies, Delhi. Earlier he worked as a Scientist at NIC, MoCIT,
Government of India. He completed his Ph.D. in Computer Science
from University of Delhi and MCA from Jawaharlal Nehru University,
Delhi. He is a member of CSI and ACM. Ph: 011-2734 3402.
E-mail: vinay5861@gmail.com
Arpana Chaturvedi is working as an Assistant Professor at Jagannath
International Management School, Delhi. She holds M.Sc. (Math),
MCA and M.Phil. (Comp. Sc.) degrees and is pursuing a Ph.D. at
Jagannath University. Ph: 01149219191.
E-mail: ac240871@gmail.com
International Journal of Scientific & Engineering Research, Volume 8, Issue 4, April-2017
ISSN 2229-5518 984