International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249-8958 (Online), Volume-9 Issue-5, June 2020
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
© Copyright: All rights reserved.
Retrieval Number: E9364069520/2020©BEIESP
DOI: 10.35940/ijeat.E9364.069520
Journal Website: www.ijeat.org

Abstract: As with earlier technological advances, big data technology is growing rapidly, and we must identify the threats that could overwhelm present security systems. Recent technical environments such as cloud computing and network-connected smartphones, together with the ubiquitous digitization of huge volumes of all types of data, pose additional threats to sensitive data. Because of this increased vulnerability, big data demands greater responsibility. About 90% of all data in existence has been created during the last two years. Protecting sensitive data from unauthorized discovery is the most challenging task in every kind of data processing. Data Leakage Detection offers a set of methods and techniques that can effectively address the problems that arise with critical data. Most of the large amount of existing data is unstructured, so better analytical methods for big data must be developed to retrieve meaningful information from it. Many security algorithms exist at present, but they are not easy to implement for huge volumes of data. Security protocols in big data must protect both sensitive information and user-related details. Sensitive patient data, various types of code patterns, and sets of attributes are to be secured using machine learning tools, which provide many library functions that help protect sensitive client information. We propose the Secure Pattern-Based Data Sensitivity Framework (PBDSF) to protect such sensitive information in big data using machine learning.
In the proposed framework, HDFS is used to analyze the big data, classify the most important information, and convert the sensitive data in a secure manner.

Keywords: HDFS, EMR, Security, Big Data, Content Based Access Control, Sensitive Data Detection, Attribute-Based Access Control

Revised Manuscript Received on May 15, 2020.
* Correspondence Author
K Rajeshkumar*, Assistant Professor, Department of Computer Science and Engineering, Theni Kammavar Sangam College of Technology, Theni, India, kumar85rajesh@gmail.com
Dr. S. Dhanasekaran, Associate Professor, Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankovil, srividhans@gmail.com
Dr. V. Vasudevan, Senior Professor, Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankovil, vasudevan_klu@yahoo.co.in

I. INTRODUCTION

This study proposes a machine learning approach to build an enhanced security framework that protects confidential medical data in big data from unauthorized users. Present security solutions cannot provide foolproof security for big data: because they are time-consuming and complex, current security mechanisms are not sufficient to protect against unauthorized users. In this approach, each big data file is classified as public or confidential according to its risk effect level. With technological advancement and the growing use of Internet-based services, many users need frequent access to big data servers and services. Technological advancement also makes it possible to keep users' confidential data, such as healthcare information, trade secrets, and personal data, in a secure manner. HDFS (Hadoop Distributed File System) is the primary storage system used by Hadoop applications to store large data sets. Through its NameNode and DataNode architecture, HDFS provides high-performance access to data across highly scalable Hadoop clusters.
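The classification step described above, labeling each file public or confidential by its risk effect level with PBDSF detecting sensitive code patterns and attributes, could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular expressions, attribute names, weights, and the threshold `CONFIDENTIAL_THRESHOLD` are all hypothetical, and a simple weighted rule stands in for the machine learning classifier the paper proposes.

```python
import re

# Hypothetical sensitive-content patterns with illustrative risk weights.
# These are assumptions for the sketch, not PBDSF's actual pattern set.
SENSITIVE_PATTERNS = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), 2),
    "phone": (re.compile(r"\b\d{10}\b"), 2),
    # ICD-10-like diagnosis code, e.g. "E11.9" (assumed format).
    "diagnosis_code": (re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,2})?\b"), 3),
}

# Hypothetical attribute names treated as sensitive regardless of content.
SENSITIVE_ATTRIBUTES = {"patient_name": 3, "diagnosis": 3, "ssn": 5}

# Assumed cut-off between the "public" and "confidential" risk levels.
CONFIDENTIAL_THRESHOLD = 3


def risk_score(record: dict[str, str]) -> int:
    """Sum the weights of sensitive attribute names and pattern matches in a record."""
    score = 0
    for attr, value in record.items():
        score += SENSITIVE_ATTRIBUTES.get(attr, 0)
        for pattern, weight in SENSITIVE_PATTERNS.values():
            if pattern.search(value):
                score += weight
    return score


def classify_record(record: dict[str, str]) -> str:
    """Label a record 'confidential' if its risk score reaches the threshold, else 'public'."""
    return "confidential" if risk_score(record) >= CONFIDENTIAL_THRESHOLD else "public"


if __name__ == "__main__":
    print(classify_record({"department": "Cardiology", "city": "Theni"}))
    print(classify_record({"patient_name": "A. Kumar", "contact": "a.kumar@example.com"}))
```

Run as a script, the first record contains no sensitive attribute or pattern and is labeled public; the second scores 5 (a sensitive attribute plus an e-mail address) and is labeled confidential. A trained classifier could replace `risk_score` while keeping the same public/confidential interface; the fixed rules only make the idea concrete.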
HDFS can manage large collections of data along with the related big data analytics applications. Our approach is to create a security protocol, techniques, tools, and a security policy management framework to prevent unauthorized access in big data. HDFS is a capable tool [2], [3] that handles and stores large amounts of data, supports quick automated decisions, and reduces the need for human estimation. With its dependability, accountability, redundancy, and distributed architecture, HDFS is widely recognized and commonly used for storing data sets [4]. HDFS is designed to handle various big data types: structured, semi-structured, and unstructured. In [5], a MapReduce job-scheduling algorithm guides the grouping of big data in an extended networking environment.

Static data can be secured against vulnerabilities with conventional security tools. A big data security solution should ensure data accessibility, reliability, and privacy. Various encryption techniques have been applied to protect data from unauthorized access [6], [7]. Data supervision and data classification are basic concerns in big data. Kerberos management policies secure data during communication, transmission, authorization, and storage [8]. Kerberos was developed for secure transport-layer communication, data encryption, and data authentication; however, Kerberos-based solutions are not easy to implement.

Exploration of Big Data Security Framework using Machine Learning
K. Rajeshkumar, S. Dhanasekaran, V. Vasudevan
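The introduction notes that encryption techniques [6], [7] protect data from unauthorized access, and the abstract describes converting sensitive data in a secure manner. Because Python's standard library ships no symmetric cipher, the sketch below swaps in a different technique, keyed pseudonymization with HMAC-SHA256: sensitive fields are replaced by deterministic tokens, and unlike encryption the original value cannot be recovered with the key. The key, the field names, and the helpers `pseudonymize` and `protect_record` are hypothetical and not part of the paper's framework.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would load it
# from a key-management system rather than hard-coding it.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a sensitive value with a keyed, deterministic HMAC-SHA256 hex token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record: dict[str, str], sensitive_fields: set[str]) -> dict[str, str]:
    """Return a new record in which only the listed sensitive fields are pseudonymized."""
    return {
        field: pseudonymize(value) if field in sensitive_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    record = {"patient_name": "A. Kumar", "ward": "Cardiology"}
    safe = protect_record(record, {"patient_name"})
    print(safe["ward"])               # non-sensitive field is unchanged
    print(len(safe["patient_name"]))  # SHA-256 hex digest has 64 characters
```

Because the tokens are deterministic for a given key, pseudonymized records can still be joined or counted on the protected field; rotating the key breaks that linkage.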