IJRET: International Journal of Research in Engineering and Technology    eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 05 Issue: 05 | May-2016, Available @ http://ijret.esatjournals.org

CREDIT CARD DATA PROCESSING AND E-STATEMENT GENERATION WITH USE OF HADOOP

Ashvini A. Mali 1, N. Z. Tarapore 2
1 Research Scholar, Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, Maharashtra, ashvini63@gmail.com
2 Assistant Professor, Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, Maharashtra, ntarapore@yahoo.com

Abstract
Big Data is, for the most part, known for its vast storage and processing capability. This paper covers the basic concepts of Big Data and the Hadoop environment. Today, organizations deal with problems such as data sharing, data privacy and secure data transfer, all of which consume a great deal of time; Hadoop is used to address these problems. Hadoop provides large-scale storage and parallel processing, and it works in a distributed fashion. Several Hadoop tools are available, such as Hive, Sqoop and Pig, which are discussed later in this paper.

Keywords: Big Data, Hadoop, Parallel Processing, Map-Reduce, HDFS
--------------------------------------------------------------------***----------------------------------------------------------------------

I. INTRODUCTION

Large organizations such as Infosys and Cognizant employ big data analysts who analyze the global market and supply chains, and who draw insights about customer demand from information collected on business transactions and market needs. The data used for this analysis is gathered from various sources such as newspapers, social sites, business blogs and discussion forums [3]. After collection, analysts examine the data in various ways and make decisions that help improve supply chains, explain customer behavior and enhance the business.

Every organization's data grows rapidly day by day, and it becomes difficult for a system to process and respond to every query or function. Keeping a large bunch of data on a single drive also makes reading and writing slow. Hadoop is a solution that offers common storage and parallel processing [2]. The scope of using Hadoop is that it can handle large volumes of data, it is efficient in handling data loss, which is important for all sectors, and it processes data faster. Sectors such as finance, healthcare and business are therefore trying to use Hadoop for faster performance and reduced workload.

II. RELATED WORK

Large organizations, health care centers and financial institutions such as banks all deal with large data. Conventional processing and databases do not respond well at this scale, which is where Hadoop comes in: it handles both the storage and the processing of big data. Different tools are available in the market, such as HBase, Hive, Pig, Sqoop and Flume. Hive and Pig are both used to work with large-scale data; on small data these tools add overhead in terms of time, but on large-scale data they give excellent results. Other tools such as Sqoop and Flume are used for data import, export and collection. How these tools work is described in section IV.
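As a rough illustration of how Hadoop's parallel processing could be applied to credit card records of the kind this paper targets, the following is a minimal MapReduce sketch, not the authors' actual implementation. It assumes a simple comma-separated input of the form cardNumber,merchant,amount; the class names, field layout and paths are hypothetical.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal sketch: total spend per credit-card number.
// Input lines are assumed to look like "<cardNumber>,<merchant>,<amount>".
public class CardSpendTotal {

    // Mapper: emit (cardNumber, amount) for every transaction record.
    public static class TxnMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        private final Text card = new Text();
        private final DoubleWritable amount = new DoubleWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            try {
                amount.set(Double.parseDouble(fields[2].trim()));
            } catch (NumberFormatException e) {
                return; // skip header lines or non-numeric amounts
            }
            card.set(fields[0].trim());
            context.write(card, amount);
        }
    }

    // Reducer: sum all amounts seen for one card number.
    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double total = 0.0;
            for (DoubleWritable v : values) {
                total += v.get();
            }
            context.write(key, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "card spend total");
        job.setJarByClass(CardSpendTotal.class);
        job.setMapperClass(TxnMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

On a working Hadoop cluster such a job would typically be packaged into a jar and launched with hadoop jar, passing the HDFS input and output directories as the two arguments.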
III. BIG DATA CONCEPT

Big data is similar to "small data", only much larger. We are part of it every day: it takes a long time to process and store, and data at this scale requires different approaches. Various technologies, tools and architectures are used to solve both new problems and old problems in a better way. We all use smart phones, tablets, cameras and social networking sites such as Facebook and Twitter, which generate large volumes of data that need to be stored and processed as we or our customers demand. This data runs into terabytes or petabytes, so storing it requires a large amount of space; Hadoop steps into this situation to solve the storage and processing problem.

Big data is not only about size; it also concerns how the data is analyzed and processed. Big data is commonly characterized by the 3 V's: volume, variety and velocity.

Volume: The amount of data we collect and store, such as environmental data, entertainment data and event data. This data is either structured or unstructured; most relational databases support only structured data.

Variety: Data comes in different types, for example text, image, video and audio, which makes it complex to handle. For example, the time required to retrieve text data is less than for the other types.

Velocity: Data grows ever faster and must be handled at that speed.
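As a concrete illustration of the storage point above, the following minimal Java sketch copies a local credit-card transaction file into HDFS, where jobs such as the MapReduce example in section II could then process it in parallel. The NameNode address, file paths and class name are assumptions for illustration only, not values from the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: load a local transaction file into HDFS.
public class LoadTransactionsToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed cluster address

        FileSystem fs = FileSystem.get(conf);
        Path local = new Path("/data/transactions/2016-05.csv"); // local source file (assumed)
        Path remote = new Path("/creditcard/input/2016-05.csv"); // HDFS destination (assumed)

        // copyFromLocalFile writes the file as HDFS blocks replicated across
        // DataNodes, which provides the tolerance to data loss noted earlier.
        fs.copyFromLocalFile(local, remote);
        System.out.println("Stored " + remote + " on HDFS");
        fs.close();
    }
}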