International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 5, October 2021, pp. 4439~4448
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i5.pp4439-4448
Journal homepage: http://ijece.iaescore.com

Security aware information classification in health care big data

Snehalata K. Funde, Gandharba Swain
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India

Article Info
Article history: Received Jun 15, 2020; Revised Apr 8, 2021; Accepted Apr 27, 2021
Keywords: Big data, Classification, Healthcare, Sensitive data, WBS

ABSTRACT
E-healthcare systems are becoming popular for treating patients in remote locations, so a large amount of healthcare data, such as the patient’s name, location, contact number, and physical condition, is collected remotely to treat patients. Such a large volume of data gathered from different sources is termed big data. This data contains sensitive information about the patient, such as systolic BP, pulse, temperature, current physical condition, and contact number, which should be identified and classified appropriately to protect it from misuse. This article presents a weight-based similarity (WBS) method that classifies health care big data into two categories: sensitive data and normal data. The proposed method uses a training dataset to classify the data and consists of three main steps: data extraction, mapping of the data with the help of the training dataset, and evaluation of the weight of the input data against a threshold value to classify the data. The proposed method produces better results on various evaluation parameters such as precision, recall, and F1 score, with an accuracy of 92% in categorizing the big data. The Weka tool is used to compare WBS with other existing classification techniques.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Snehalata K. Funde
Department of Computer Science and Computer Engineering
Koneru Lakshmaiah Education Foundation
Vaddeswaram-522502, Guntur, Andhra Pradesh, India
Email: snehalatafunde@gmail.com

1. INTRODUCTION
Nowadays, data in fields such as government organizations, health care systems, the military, and the banking sector are growing exponentially. Because the volume of data is so large, it needs to be stored in digital form. In earlier years, regardless of the size of the data, the sources from which information arrived and the structure of that information were limited. Today the situation has changed: a large amount of data can be fetched from an enormous number of sources and in a variety of formats. Tools such as the Hadoop distributed file system (HDFS) and MapReduce are used to store and process such large amounts of data [1], [2]. NoSQL databases play a major role in big data, aiming for higher performance and accuracy than conventional databases. It is challenging to handle such a large amount of data in a rows-and-columns format when the input data is unorganized. NoSQL databases handle this kind of unstructured data effectively [3].

One of the well-known sources of big data is the logs generated by various web and desktop applications. The data coming from these sources may be well-organized, unorganized, or unstructured. Big data passes through different phases, termed the lifecycle phases of big data [4]. The first phase is the collection of input data from legal and authorized sources, which makes it available as input to the next phase of the lifecycle. The second phase is the storage of the collected data using trusted functions. This is the phase where the chances to get big data