International Journal of Computer Applications (0975-8887), Volume 121, No. 19, July 2015

Challenges in Big Data Application: A Review

Satanand Mishra, Scientist, CSIR-AMPRI, Bhopal
Vijay Dhote, PG Scholar, VNS Group, RGPV, Bhopal
G. S. Prajapati, Asst. Professor, VNS Group, RGPV, Bhopal
J. P. Shukla, Principal Scientist, CSIR-AMPRI, Bhopal

ABSTRACT
The invention of advanced technologies, enhanced storage capacity, the maturity of information technology, and the popularity of social media, business intelligence, and scientific research together produce huge amounts of data, and this wealth of information has given birth to the concept known as big data. Big data analytics is the process of examining large volumes of structured, semi-structured, and unstructured data. Data is now generated at an exponential rate owing to the increasing use of social media, email, documents, and sensor data, and this growth has affected all fields, from the business sector to the world of science. This paper reviews processes for managing big data and current big data tools and techniques.

Keywords
Big data, big data challenges and management, Hadoop, HDFS, Hadoop components.

1. INTRODUCTION
Big data refers to high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Primary sources of big data include business applications, the public web, social media, and sensor data. Big data is currently a major topic of discussion across many fields, including management and marketing, scientific research, national security, government transparency, and open data. Both the public and private sectors are making increasing use of big data analytics. The need to process and analyze such massive datasets has given rise to a new form of data analytics known as Big Data Analytics.
Big Data analytics involves analyzing large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Five years ago, only large businesses such as Wal-Mart, Google, and specialized financial traders could afford to profit from big data. Owing to its specific nature, big data is stored in distributed file system architectures. Today, thanks to the open source Hadoop project, commodity Linux hardware, and cloud computing, this power is within everyone's reach [1] [13] [15]. Big data is a combination of structured, semi-structured, unstructured, homogeneous, and heterogeneous data. With the arrival of big data and massive scientific data sets measured in tens of petabytes, users need to transfer ever larger amounts of data [35].

1.1 Big Data Development
Big data development rests on three basic pillars: volume, velocity, and variety.

Data volume: Volume is the first and most prominent feature. In the year 2000, 800,000 petabytes of data were stored in the world; this number is expected to reach 35 zettabytes by 2020. Social media alone generate around 10 TB of data every day. Big data techniques are used to organize such large amounts of data. The growth of data volume imposes numerous requirements on I/O performance, which results in I/O bottlenecks in current HEC (High End Computing) machines [14].

Data velocity: Velocity refers to how quickly data is generated and stored, and the corresponding rate of data processing and retrieval. Today data is more complex, with sizes in the hundreds of terabytes, petabytes, exabytes, and beyond. Handling such complex data is difficult or impossible for traditional systems, so organizations must choose better analytical tools to deal effectively with big data.

Data variety: Variety means different types of data.
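To make the Hadoop processing model referred to above concrete, the following is a minimal sketch of the MapReduce word-count pattern in plain Python (not the Hadoop Java API); the function names `map_phase` and `reduce_phase` are illustrative and not part of any Hadoop interface.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data needs big tools", "hadoop handles big data"]
word_counts = reduce_phase(map_phase(docs))
print(word_counts["big"])   # 3
print(word_counts["data"])  # 2
```

In a real Hadoop deployment the map and reduce steps run in parallel across many commodity machines, with the framework shuffling the intermediate (word, count) pairs between them; the toy version above captures only the programming pattern, not the distribution.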
Today, with the increasing use of smart devices, social technologies, and sensors, data has become large and complex: it includes not only traditional data but also structured, semi-structured, and unstructured data from sources such as search engines, social media, web pages, sensors, and documents [2] [3]. The characteristics of big data can be broadly divided into five Vs: Volume, Velocity, Variety, Variability, and Veracity. Volume refers to the size of the data; it is the size that determines the value and potential of the data under consideration, and whether it can actually be considered big data. Velocity describes the speed at which data is generated. Variety and variability describe the complexity and structure of the data and the different ways of interpreting it; variability in particular can be a problem for those who analyse the data. Veracity refers to the trustworthiness of the source data, since the accuracy of any analysis depends on it. Fig 1 shows the development pillars of big data and its characteristics.

Fig 1: Development pillars of big data and its characteristics
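To illustrate the variety dimension described above, the sketch below (the sample records are invented for illustration) handles the same notion of a "record" arriving from three kinds of sources: structured CSV, semi-structured JSON, and unstructured free text.

```python
import csv
import io
import json

# Structured: fixed schema, rows and columns.
csv_data = "id,name\n1,sensor-a\n2,sensor-b"
rows = list(csv.DictReader(io.StringIO(csv_data)))

# Semi-structured: self-describing and nested; the schema may vary per record.
json_data = '{"id": 3, "name": "sensor-c", "tags": ["temp", "lab"]}'
record = json.loads(json_data)

# Unstructured: no schema; meaning must be extracted, e.g. by tokenizing.
text = "Sensor-d reported an anomaly at 14:02."
tokens = text.lower().split()

print(rows[0]["name"])    # sensor-a
print(record["tags"][0])  # temp
print(len(tokens))        # 6
```

Each source needs a different parsing strategy, which is precisely why traditional relational systems, built around a single fixed schema, struggle with big data variety.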