International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958 (Online), Volume-9 Issue-5, June 2020
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
© Copyright: All rights reserved.
Retrieval Number: E9364069520/2020©BEIESP
DOI: 10.35940/ijeat.E9364.069520
Journal Website: www.ijeat.org
Abstract: As with prior technological advancements, big data
technology is growing rapidly, and we have to identify the
possible threats that could overwhelm present security systems.
Recent technical environments such as the cloud and
network-connected smartphones, together with the omnipresent
digital conversion of huge volumes of all types of data, pose more
threats to sensitive data. Because of this increased vulnerability,
big data requires increased responsibility. About 90% of all the
data in existence was created during the last two years.
Protecting sensitive data from unauthorized disclosure is the
most challenging task in all kinds of data processing. Data
Leakage Detection offers a set of methods and techniques that can
efficiently address the problems arising with particularly
critical data. Most of the existing data is unstructured. To
retrieve meaningful information, we have to develop superior
analytical methods for big data. At present there are many
security algorithms, but they are not easy to implement for huge
volumes of data. We have to protect sensitive information, as
well as user-related details, with the help of security protocols
in big data. Sensitive patient data, different types of code
patterns, and sets of attributes are to be secured using machine
learning tools. Machine learning tools offer many library
functions that help protect sensitive information about clients.
We recommend the Secure Pattern-Based Data Sensitivity Framework
(PBDSF) to protect such sensitive information in big data using
machine learning. In the proposed framework, HDFS is used to
analyze the big data, to classify the most important information,
and to convert the sensitive data in a secure manner.
Keywords – HDFS, EMR, Security, Big Data, Content
Based Access Control, Sensitive Data Detection, Attribute-Based
Access Control
I. INTRODUCTION
To create an enhanced security framework for protecting
confidential medical data from unauthorized users in big
data, a machine learning approach is proposed in this study.
Present security solutions are not able to provide foolproof
security in big data. Because of their time complexity,
present security mechanisms are not sufficient to provide
security against unauthorized users. In this approach, a big
data file is classified based
Revised Manuscript Received on May 15, 2020.
* Correspondence Author
K Rajeshkumar*, Assistant Professor, Department of Computer Science
and Engineering, Theni Kammavar Sangam College of Technology, Theni,
India, kumar85rajesh@gmail.com
Dr.S. Dhanasekaran, Associate Professor, Department of Computer
Science and Engineering, Kalasalingam Academy of Research and
Education, Krishnankovil, srividhans@gmail.com
Dr. V. Vasudevan, Senior Professor, Department of Computer Science
and Engineering, Kalasalingam Academy of Research and Education,
Krishnankovil, vasudevan_klu@yahoo.co.in
upon its risk impact level as public or confidential. Due to
technological advancement and the growing use of internet-based
services, many users need to access big data servers
or services frequently. Users' confidential data, such as
healthcare information, trade secrets, and personal data, can
be kept secure because of this technological
advancement. HDFS is the central storage
system that stores large volumes of data
for Hadoop applications. With its NameNode
and DataNode architecture, HDFS provides high-performance
access to data across highly scalable Hadoop clusters.
HDFS can manage large data sets and the
related big data analytics applications. Our approach is to
create a framework of security protocols, techniques, tools, and
security policy management to prevent unauthorized access
to big data.
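The classification of data into public and confidential classes can be illustrated with a minimal pattern-matching sketch. It is not the PBDSF implementation: the pattern names, the regular expressions, and the `PID-` identifier format are hypothetical placeholders chosen for illustration, and a real deployment would need patterns derived from the actual data.

```python
import re
from dataclasses import dataclass

# Hypothetical sensitivity patterns (assumptions, not taken from the paper).
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "patient_id": re.compile(r"\bPID-\d{6}\b"),  # assumed ID format
}


@dataclass
class Classification:
    label: str      # "public" or "confidential"
    matches: list   # names of the patterns that matched


def classify_record(text: str) -> Classification:
    """Label a record confidential if any sensitive pattern occurs in it."""
    found = sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(text)
    )
    label = "confidential" if found else "public"
    return Classification(label, found)


records = [
    "Ward 3 visiting hours are 10:00 to 12:00.",
    "Patient PID-004217 can be reached at 555-123-4567.",
]

for record in records:
    print(classify_record(record))
```

In this sketch a single match is enough to mark a record confidential, which favors caution over precision; a risk-level scheme with more than two classes would need a scoring rule instead.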
HDFS is a capable tool [2], [3] that maintains,
handles, and stores large amounts of data, provides quick
automated decisions, and reduces manual
estimation. With its dependability,
accountability, idleness, and distributed architecture, HDFS is
widely recognized as a commonly used dataset tool [4]. For
this reason, HDFS was designed to deal with various big data
types: structured, semi-structured, and unstructured. In [5],
a MapReduce job-scheduling algorithm guides the grouping of big
data in an extended network environment. We should be able to
secure the stored data against vulnerabilities with standard
security tools. A big data security solution should
provide data accessibility, reliability, and privacy.
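The NameNode and DataNode architecture mentioned in the introduction can be pictured with a toy model: a file is split into fixed-size blocks, and a name node records which data nodes hold a replica of each block. The sketch below is an illustration only. The class `ToyNameNode`, the node names, and the round-robin placement are assumptions; real HDFS uses much larger blocks (128 MB by default) and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Split a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Keeps only metadata: which data nodes hold each block of each file."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds number of data nodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (filename, block index) -> list of node names

    def place_file(self, filename, blocks):
        """Assign each block to `replication` distinct nodes, round-robin."""
        n = len(self.datanodes)
        for idx in range(len(blocks)):
            self.block_map[(filename, idx)] = [
                self.datanodes[(idx + r) % n] for r in range(self.replication)
            ]
        return self.block_map


# Tiny block size so the example stays readable.
blocks = split_into_blocks(b"x" * 10, block_size=4)
name_node = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], replication=3)
for key, nodes in name_node.place_file("records.csv", blocks).items():
    print(key, nodes)
```

Because every block has several replicas on different nodes, the loss of one data node does not make a block unavailable, which is one way a distributed store supports the accessibility and reliability goals stated above.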
Many encryption techniques are applied to protect data
from unauthorized access [6], [7]. Data management and
data classification are the basic concerns in big data.
Kerberos management policies secure data during
communication, transmission, authorization, and storage [8].
Kerberos was developed for secure transport-layer communication,
data encryption, and data authentication. However,
Kerberos-based solutions are not easy to implement.
Exploration of Big Data Security Framework
using Machine Learning
K. Rajeshkumar, S. Dhanasekaran, V. Vasudevan