I.J. Intelligent Systems and Applications, 2017, 1, 75-84
Published Online January 2017 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijisa.2017.01.08
Copyright © 2017 MECS
High Performance Computation of Big Data:
Performance Optimization Approach towards a
Parallel Frequent Item Set Mining Algorithm for
Transaction Data based on Hadoop MapReduce
Framework
Guru Prasad M S
SDMIT/CSE, Ujire, 577240, India
E-mail: guru0927@gmail.com
Nagesh H R and Swathi Prabhu
MITE/CSE, Moodbidri, 574227, India
SMVITM/CSE, Udupi, 576115, India
E-mail: nageshhrcs@reddifmail.com, prabhuswathi2@gmail.com
Abstract—Huge amounts of Big Data arrive constantly with the rapid development of business organizations, which are interested in extracting useful knowledge from the data they collect. Frequent item set mining of Big Data supports business decisions and helps provide high-quality service. Running a traditional frequent item set mining algorithm on Big Data is not effective and leads to high computation time. Apache Hadoop MapReduce is the most popular data-intensive distributed computing framework for large scale data applications such as data mining. In this paper, the author identifies the factors affecting the performance of frequent item set mining algorithms based on Hadoop MapReduce technology and proposes an approach for optimizing the performance of large scale frequent item set mining. Experimental results show the potential of the proposed approach: performance is significantly improved for large scale data mining with the MapReduce technique. The author believes this work is a valuable contribution to the high performance computing of Big Data.
Index Terms—Big Data, Hadoop, MapReduce, Hadoop
Distributed File System (HDFS), Apriori MapReduce,
FP-growth MapReduce.
I. INTRODUCTION
We live in the Big Data era. Big Data is a broad term that describes massive volumes of structured, semi-structured and unstructured data. Due to the advent of new technologies, the digital world of data has expanded to around 10 zettabytes (1 zettabyte = 10^21 bytes). Huge amounts of data are generated from social networking sites, e-commerce, online banking, weather stations, market transactions, etc.
Big Data is mainly characterized by the 3 V's: extreme volume, extreme variety and extreme velocity. Volume can grow beyond zettabytes; velocity is the speed at which data is generated; and variety reflects the many forms the data can take. Big Data is critical to business enterprises, and it is emerging as one of the most important technologies in the modern world. Many business enterprises accumulate large quantities of data from customer transactions, handling more than one billion customer transactions every day. For example, eBay holds 50 petabytes of data and captures 50 terabytes more every day; US retailers hold around 500 petabytes of data; and Amazon, the world's biggest retail store, has data on billions of active customers.
Huge amounts of data are continuously collected and stored in data warehouses, and business organizations are now interested in extracting useful knowledge from the stored data. The information contained in a transaction database is large, so it is very difficult to understand and to extract useful knowledge from such a huge dataset. To solve this problem, the technique called frequent item set mining is used. This technique finds sets of items that are frequently purchased together. It is useful for extracting hidden predictive information from large data sets, and it is a powerful technology with great potential to help organizations focus on the most important information in their data warehouses. Frequent item set mining predicts future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Apriori and FP-growth are the most famous algorithms for discovering frequent patterns in large data sets. However, existing data mining tools based on the sequential Apriori and FP-growth algorithms are not efficient enough to mine huge transaction data.
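The support-counting idea underlying these algorithms can be sketched as follows. This is a minimal single-machine illustration of one Apriori-style pass over candidate 2-item sets, not the paper's implementation; the transaction data and the support threshold are made-up examples.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (each is a set of purchased items).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

min_support = 2  # an item set is "frequent" if it occurs in at least 2 transactions

# Count the support of every 2-item combination (the core counting step
# that Apriori repeats for candidate sets of growing size).
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Keep only the pairs meeting the minimum support threshold.
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)  # each of the three pairs occurs in 2 of the 4 transactions
```

Run sequentially, this counting pass must scan the entire database once per candidate size, which is exactly the cost that becomes prohibitive on huge transaction data and motivates distributing the scan with MapReduce.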
We would therefore require a robust distributed computing infrastructure that can store, manage and process huge amounts of data in a short time. It can protect data security