ISSN (Online) 2278-1021 ISSN (Print) 2319 5940
International Journal of Advanced Research in Computer and Communication Engineering
Vol. 4, Issue 10, October 2015
Copyright to IJARCCE DOI 10.17148/IJARCCE.2015.41049 230

A Study on Evolution of Data in Traditional RDBMS to Big Data Analytics

Surajit Mohanty 1, Kedar Nath Rout 2, Shekharesh Barik 3, Sameer Kumar Das 4
Asst. Prof., Computer Science & Engineering, DRIEMS, Cuttack, India 1, 2, 3
Asst. Prof., Computer Science & Engineering, GATE, Berhampur, India 4

Abstract: The volume of data that enterprises acquire every day is increasing rapidly, and many enterprises do not know what to do with this data or how to extract information from it. Analytics is the process of collecting, organizing and analysing large sets of data that are important to a business; applying this process to very large datasets is called big data analytics. The volume, variety and velocity of big data cause performance problems when it is processed using traditional data processing techniques, but it is now possible to store and process such vast amounts of data on low-cost platforms such as Hadoop. The main aim of this paper is to present a study of data analytics, big data and their applications.

Keywords: Big Data, Hadoop, MapReduce, Sqoop, Hive.

I. INTRODUCTION

The volume of data that enterprises acquire every day is increasing rapidly, and a traditional RDBMS fails to store such huge amounts of data. Different varieties of RDBMS are comfortable only up to gigabytes of data; an RDBMS is not recommended once the data volume grows towards exabytes, and even at the gigabyte scale its performance degrades. A further problem is that seek time is improving more slowly than transfer rate. Seeking is the process of moving the disk's head to a particular place on the disk to read or write data; it characterizes the latency of a disk operation, whereas the transfer rate corresponds to a disk's bandwidth.
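The latency-versus-bandwidth trade-off above can be made concrete with a back-of-envelope calculation. The figures below (10 ms average seek, 100 MB/s transfer rate, a 1 TB dataset of 100-byte records) are illustrative assumptions, not measurements from the paper:

```python
# Back-of-envelope comparison of seek-dominated vs. streaming disk access.
# All hardware figures are illustrative assumptions.

SEEK_TIME_S = 0.010          # average seek latency (assumed)
TRANSFER_RATE_BPS = 100e6    # sequential transfer rate, bytes/second (assumed)
DATASET_BYTES = 1e12         # 1 TB dataset
RECORD_BYTES = 100           # size of one record

def streaming_time_s(total_bytes):
    """Time to read the whole dataset sequentially, at the transfer rate."""
    return total_bytes / TRANSFER_RATE_BPS

def seeking_time_s(num_records):
    """Time to fetch records individually, paying one seek per record."""
    return num_records * (SEEK_TIME_S + RECORD_BYTES / TRANSFER_RATE_BPS)

stream = streaming_time_s(DATASET_BYTES)                    # 10,000 s
# Fetching even 1% of the records via seeks already dwarfs a full scan:
seek_1pct = seeking_time_s(0.01 * DATASET_BYTES / RECORD_BYTES)
print(f"full streaming scan:   {stream:.0f} s")
print(f"seeking 1% of records: {seek_1pct:.0f} s")
```

Under these assumptions, seeking to just 1% of the records takes roughly a hundred times longer than streaming the entire dataset, which is why batch systems such as MapReduce favour full sequential scans.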
If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than to stream through it, which operates at the transfer rate. On the other hand, for updating a small proportion of records in a database, a traditional B-Tree (the data structure used in relational databases, which is limited by the rate at which it can perform seeks) works well. For updating the majority of a database, a B-Tree is less efficient than MapReduce, which uses sort/merge to rebuild the database. MapReduce can thus be seen as a complement to an RDBMS.

II. PRODUCTION OF BIG DATA

Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it; systems, sensors and mobile devices transmit it. Big data arrives from multiple sources at an alarming velocity, volume and variety, and extracting meaningful value from it requires optimal processing power and analytics capabilities. A traditional RDBMS fails to handle big data because it is not able to store such a large volume of data [1]; Hadoop is the solution. In other words, big data is the problem and Hadoop is the implementation. For example, Google produces more than 12 PB of data every day, Facebook around 10 PB and eBay around 8 PB per day. Hadoop is a framework used for storing and processing such large volumes of data, whereas a traditional RDBMS can only store the data, not process it: processing would require writing complex logic in some programming language, which is tedious.

III. EVOLUTION OF MAPREDUCE TO PROGRAMMING LANGUAGE

MapReduce is a good fit for problems that need to analyse the whole dataset in a batch fashion, particularly for ad hoc analysis.
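The whole-dataset, batch style of processing that MapReduce performs can be sketched with the classic word-count pattern: a map phase emitting (word, 1) pairs, a sort/group phase, and a reduce phase summing the counts. This is an in-memory illustration only; a real Hadoop job would distribute these phases over HDFS blocks:

```python
# Minimal in-memory sketch of the MapReduce word-count pattern.
# Illustrative only: a real Hadoop job runs map and reduce tasks
# in parallel across a cluster.

from itertools import groupby
from operator import itemgetter

documents = ["big data needs big tools", "hadoop stores big data"]

# Map phase: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle/sort phase: MapReduce sorts intermediate pairs by key.
mapped.sort(key=itemgetter(0))

# Reduce phase: sum the values for each distinct key.
counts = {word: sum(v for _, v in group)
          for word, group in groupby(mapped, key=itemgetter(0))}
print(counts)  # {'big': 3, 'data': 2, 'hadoop': 1, 'needs': 1, 'stores': 1, 'tools': 1}
```

Because every phase is a full scan or a sort, the pattern streams through the data at transfer rate rather than seeking, which is exactly the batch access pattern described above.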
An RDBMS is good for point queries or updates, where the dataset has been indexed to deliver low-latency retrieval and update of a relatively small amount of data. MapReduce suits applications where the data is written once and read many times, whereas a relational database is good for datasets that are continually updated. [2] Another difference between MapReduce and an RDBMS is the amount of structure in the datasets they operate on. Structured data is organized into entities that have a defined format, such as XML documents or database tables conforming to a particular predefined schema; this is the realm of the RDBMS. Semi-structured data, on the other hand, is looser: though there may be a schema, it is often ignored and serves only as a guide to the structure of the data. A spreadsheet is one example, in which the structure is the grid of cells although the cells themselves may hold any form of data. Unstructured data does not have any particular internal structure: for example, plain text or image data. MapReduce works well on unstructured or semi-structured data, since it is designed to interpret the data at processing time. In other words, the input keys and values for MapReduce are not an intrinsic property of the data; they are chosen by the person analysing the data. [5]
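The point that keys and values are chosen by the analyst, not dictated by a schema, can be sketched as follows. The log-line format here is a made-up example; the same raw text could equally be mapped by date or by status code, depending on the question being asked:

```python
# Sketch: with semi-structured text, the MapReduce keys and values are an
# analyst's choice made at processing time. The log format is hypothetical.

from collections import defaultdict

raw_lines = [
    "2015-10-01 GET /index.html 200",
    "2015-10-01 GET /about.html 404",
    "2015-10-02 GET /index.html 200",
]

def map_fn(line):
    """One interpretation: key = requested path, value = 1 (a hit)."""
    date, method, path, status = line.split()
    yield path, 1

def reduce_fn(key, values):
    """Sum the per-path hits emitted by the map phase."""
    return key, sum(values)

# Shuffle/sort stand-in: group mapped values by key.
grouped = defaultdict(list)
for line in raw_lines:
    for k, v in map_fn(line):
        grouped[k].append(v)

results = dict(reduce_fn(k, vs) for k, vs in sorted(grouped.items()))
print(results)  # {'/about.html': 1, '/index.html': 2}
```

Changing one line of `map_fn` (e.g. `yield status, 1`) re-interprets the same raw data as error-rate counts, which is precisely the flexibility an RDBMS's fixed schema does not offer.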