Design of an Efficient Framework for Preserving Privacy in Big Data Mining

Mrs. Vimla, Research Scholar, Computer Science & Applications, NIILM University, Kaithal
Dr. Pawan Kumar, Assistant Professor, Computer Science & Applications, NIILM University, Kaithal

INTRODUCTION

You have probably heard the term "big data." If you are wondering what big data means, you are not alone. Generally, big data refers to data sets, collected by firms and governments, that are so large and complex that traditional data processing methods are inadequate for the calculations needed to make sense of them. These data sets are extremely valuable because of the vast amount of information hidden within them. When analyzed computationally, big data can provide more precise insights into hidden patterns, trends, and associations, especially in the context of human decision making.

The defining characteristics of big data were articulated by Doug Laney in the early 2000s [1]. Laney's definition includes three concepts:

1. Volume: the amount of data being collected. Before the explosion in computing power, businesses and governments collected data but had a challenging time storing what was collected. Today, the volume of data collected from consumers and by agencies continues to grow, but because of increased storage and computing capacity, storage is no longer an issue. Firms and agencies no longer have a data problem but instead have a computing puzzle.

2. Velocity: the speed at which data are collected. Data no longer arrive with a lag; instead, they are collected in real time at incredibly fast rates.

3. Variety: the types of data being collected. Whereas in the past only basic demographic data, attitudes and opinions, and perhaps geographic information might have been collected, today nearly everything a consumer does online is captured.

Since Laney's original work, a fourth concept has been added: veracity, which describes how much "noise" is in the data. Excessively large amounts of data can make it difficult to identify which data are important and which are distractions.

With large amounts of data come both opportunities and challenges. Historically, one problem faced by researchers concerns the analytical and statistical techniques available to analyze data: few of the statistical methodologies developed in the 19th and mid-20th centuries can handle the complex nature of very large data sets. Another problem is that the volume and velocity of data collection today are so great that traditional data analysis methods cannot keep up with the constant influx of information.
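To make the velocity point concrete, the sketch below shows one way a high-speed stream can be summarized without ever storing the full data set. It is a minimal illustration and not part of the framework described in this paper: it uses Welford's online algorithm (a standard streaming technique, chosen here for illustration) to maintain a running mean and variance as values arrive one at a time, which is exactly the kind of incremental processing that batch-oriented methods lack. All names in the sketch are hypothetical.

# Minimal sketch: an online (streaming) estimator of mean and variance
# using Welford's algorithm. Unlike a traditional batch computation,
# it never stores the full data set, so it can keep up with a constant
# influx of values -- the "velocity" property discussed above.

class OnlineStats:
    def __init__(self):
        self.n = 0          # number of observations seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; returns 0.0 for fewer than two observations."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    import random

    stats = OnlineStats()
    # Simulate a high-velocity stream: each value is folded into the
    # summary and discarded immediately, so memory use stays constant.
    for _ in range(1_000_000):
        stats.update(random.gauss(50.0, 10.0))
    print(f"n={stats.n}, mean={stats.mean:.3f}, variance={stats.variance:.3f}")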