International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: 2347-5552, Volume-1, Issue-2, September, 2013

Big Data: The New Challenges in Data Mining

Mrs. Deepali Kishor Jadhav

Abstract—Big Data is a new term used to identify datasets that, due to their large size and complexity, cannot be managed with our current methodologies or data mining software tools. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological, and biomedical sciences. Big Data mining is the capability of extracting useful information from these large datasets or streams of data, which, due to their volume, variability, and velocity, was not possible before. The Big Data challenge is becoming one of the most exciting opportunities for the coming years. This paper presents a broad overview of the topic: Big Data challenges, data mining challenges with Big Data, the Big Data processing framework, and a forecast of the future.

Index Terms—Big Data, New Challenges, Data Mining, Future Challenges, Big Data Problems.

I. INTRODUCTION

Big Data is too big, too fast, or too hard for existing tools to process [1]. Here, "too big" means that organizations increasingly must deal with petabyte-scale collections of data that come from click streams, transaction histories, sensors, and elsewhere. "Too fast" means that not only is the data big, but it must be processed quickly. "Too hard" is a catchall for data that does not fit neatly into an existing processing tool or that needs some kind of analysis that existing tools cannot readily provide. Big Data is currently defined using three data characteristics: volume, variety, and velocity [2]. At some point, when the volume, variety, and velocity of the data increase, the current techniques and technologies may no longer be able to handle the storage and processing of the data.
At that point the data is defined as Big Data. The term Big Data Analytics refers to the process of analyzing and understanding the characteristics of massive datasets by extracting useful geometric and statistical patterns. These three characteristics of a dataset increase the complexity of the data. Many applications involve the Big Data problem, including network traffic risk analysis, geospatial classification, and business forecasting. Network intrusion detection and prediction are time-sensitive applications, and they require highly efficient Big Data techniques and technologies to tackle the problem. In this paper, some of the problems and challenges associated with Big Data technologies and techniques are discussed.

The current definition of Big Data, defined on a 3D space, V³, formed by three parameters (volume, variety, and velocity), cannot provide a suitable platform for the early detection of Big Data characteristics for Big Data classification. Figure 1 shows the 3D space defined for Big Data, where the axis of volume represents the growth of data size, the axis of velocity represents the increase in the speed at which the data must be processed, and the axis of variety represents the increase in the various types of data.

Figure 1: Current definition (V³) of Big Data characteristics (axes: Volume, Variety, Velocity)

A new definition for Big Data is proposed as a 3D space, C³, as shown in Figure 2, which is defined based on three new parameters: cardinality, continuity, and complexity.

Manuscript received November 19, 2013. Mrs. Deepali Kishor Jadhav, Assistant Professor, Department of Computer Science and Engineering, K.I.T.'s College of Engineering, Kolhapur, Maharashtra, India (e-mail: Deepkjadhav80@gmail.com).
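The V³ view described above can be sketched as a simple threshold check: a dataset crosses into Big Data territory once any of volume, variety, or velocity exceeds what the current toolchain can handle. The following is a minimal illustration only; the field names and capacity limits are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass
class DatasetProfile:
    volume_tb: float      # total data size in terabytes
    variety: int          # number of distinct data types/formats
    velocity_mbps: float  # ingestion rate in MB/s

# Illustrative capacity limits of a hypothetical "current" toolchain.
LIMITS = DatasetProfile(volume_tb=10.0, variety=5, velocity_mbps=100.0)

def is_big_data(p: DatasetProfile, limits: DatasetProfile = LIMITS) -> bool:
    """A dataset counts as Big Data once any V3 axis exceeds current capacity."""
    return (p.volume_tb > limits.volume_tb
            or p.variety > limits.variety
            or p.velocity_mbps > limits.velocity_mbps)

print(is_big_data(DatasetProfile(volume_tb=500.0, variety=12, velocity_mbps=900.0)))  # → True
print(is_big_data(DatasetProfile(volume_tb=1.0, variety=2, velocity_mbps=10.0)))      # → False
```

Note that such fixed thresholds shift as tools improve, which is one reason the paper argues the V³ definition cannot support early detection of Big Data characteristics.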
In the C³ space, cardinality defines the number of records in the dynamically growing dataset at a particular instant. Continuity defines two characteristics: (i) representation of the data by continuous functions, and (ii) continuous growth of the data size with respect to time. Complexity defines three characteristics: (i) large varieties of data types, (ii) high-dimensional datasets, and (iii) very high speed of data processing [3].

Figure 2: Proposed definition (C³) of Big Data characteristics (axes: Cardinality, Continuity, Complexity)
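The C³ parameters can likewise be sketched in code: cardinality is observed at discrete instants, continuity is approximated by checking that the dataset size grows with time, and complexity combines variety, dimensionality, and required processing speed. All names and the scoring formula below are hypothetical illustrations under these assumptions, not definitions from the paper.

```python
from dataclasses import dataclass

@dataclass
class C3Snapshot:
    time: float        # observation instant
    cardinality: int   # records in the growing dataset at this instant

def is_continuous(history: list[C3Snapshot]) -> bool:
    """Continuity: the dataset size grows (never shrinks) with respect to time."""
    ordered = sorted(history, key=lambda s: s.time)
    return all(a.cardinality <= b.cardinality
               for a, b in zip(ordered, ordered[1:]))

def complexity_score(n_types: int, dimensions: int, required_mbps: float) -> float:
    """Complexity combines variety of types, dimensionality, and processing speed.
    The product is illustrative only; the paper defines no numeric formula."""
    return n_types * dimensions * required_mbps

history = [C3Snapshot(0, 1_000), C3Snapshot(1, 5_000), C3Snapshot(2, 40_000)]
print(is_continuous(history))            # → True
print(complexity_score(12, 300, 900.0))  # → 3240000.0
```

Unlike fixed V³ thresholds, tracking cardinality over time lets the growth trend itself be examined, which is closer to the early-detection goal the C³ definition aims at.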