IJSRSET1732191 | 09 April 2017 | Accepted : 19 April 2017 | March-April-2017 [(2)2: 648-660]
© 2017 IJSRSET | Volume 3 | Issue 2 | Print ISSN: 2395-1990 | Online ISSN : 2394-4099
Themed Section: Engineering and Technology
Information Security and Data Mining in Big Data
Tejas P. Adhau*1, Prof. Dr. Mahendra A. Pund*2
Department of Computer Science of Engineering/SGBAU University/PRMIT Badnera/Amravati, Maharashtra,
India
ABSTRACT
The growing popularity and development of data mining technologies bring a serious threat to the security of
individuals' sensitive information. An emerging research topic in data mining, known as privacy-preserving data
mining (PPDM), has been extensively studied in recent years. The basic idea of PPDM is to modify the data in
such a way that data mining algorithms can be performed effectively without compromising the security of sensitive
information contained in the data. Current studies of PPDM mainly focus on how to reduce the privacy risk
brought by data mining operations, while in fact, unwanted disclosure of sensitive information may also happen in
the process of data collection, data publishing, and information (i.e., data mining results) delivery. In this
paper, we view the privacy issues related to data mining from a wider perspective and investigate various
approaches that can help to protect sensitive information. In particular, we identify four different types of users
involved in data mining applications, namely, data provider, data collector, data miner, and decision maker. For
each type of user, we focus on that user's privacy concerns and on how to protect sensitive information.
Keywords: Data Mining, Sensitive Information, Privacy-Preserving Data Mining, Provenance, Anonymization,
Privacy Auction, Antitracking.
I. INTRODUCTION
Data mining has attracted more and more attention in recent
years, probably because of the popularity of the "big data"
concept. Data mining is the process of examining large
pre-existing databases in order to generate new information, and
the result gives direction to guide future activities. The data
mining process is also used to analyze data for
relationships that have not previously been discovered. The
term data warehouse refers to a database that is used
for analysis. Users of a warehouse should be able to specify what type
of data they want to view and at what level they want to view
relationships among data items.
II. METHODS AND MATERIAL
1. The Process of KDD
Generally, three of the major data mining techniques are
regression, classification, and clustering. Data mining is
also popularly known as Knowledge Discovery in
Databases (KDD) [1] [2]. KDD, a widely used data
mining technique, is a process that includes data
preparation, data selection, and the generation of result patterns.
Some issues involved in the entire KDD process are:
• Identify the goal of the KDD process.
• Understand the application domain involved and the
  knowledge that is required.
• Select the data set on which discovery is to be performed.
• Alter the data as per the requirements.
• Simplify the data sets by removing unwanted
  variables and missing fields.
• Match KDD goals with data mining methods to
  suggest hidden patterns.
• Choose data mining algorithms to discover hidden patterns.
• Search for patterns of interest in a particular
  representational form, which includes classification
  rules or trees, regression, and clustering.
• Interpret essential knowledge from the mined
  patterns.
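
The steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy records, the field names, and the helper functions (select, clean, two_means, run_kdd) are all hypothetical and are introduced here only for illustration. The mining step assumes a simple one-dimensional two-cluster k-means as a stand-in for whichever clustering algorithm a real application would choose.

```python
from statistics import mean

# Hypothetical toy records (assumption); None marks a missing field.
RAW = [
    {"id": 1, "age": 23, "income": 21.0, "note": "a"},
    {"id": 2, "age": 25, "income": 24.0, "note": "b"},
    {"id": 3, "age": 47, "income": 80.0, "note": "c"},
    {"id": 4, "age": 52, "income": None, "note": "d"},
    {"id": 5, "age": 50, "income": 90.0, "note": "e"},
]


def select(records, fields):
    """Selection: keep only the variables relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Simplification: drop records that contain missing fields."""
    return [r for r in records if all(v is not None for v in r.values())]


def two_means(values, iterations=10):
    """Mining: 1-D k-means with k=2, seeded at the minimum and maximum."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearest of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Recompute each center; keep the old one if its group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def run_kdd(raw):
    """Goal (assumed): find a low-income and a high-income group."""
    selected = select(raw, ["age", "income"])
    cleaned = clean(selected)
    incomes = [r["income"] for r in cleaned]
    centers, groups = two_means(incomes)
    # Interpretation: summarize the mined pattern in readable form.
    return {
        "low_income_center": centers[0],
        "high_income_center": centers[1],
        "sizes": [len(g) for g in groups],
    }


if __name__ == "__main__":
    print(run_kdd(RAW))
```

In this sketch, each function corresponds to one stage of the process: selection, simplification, mining, and interpretation. A real KDD workflow would replace the toy clustering step with an algorithm matched to the stated goal.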