Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 2, Issue 4, April 2013, pg. 513–516

RESEARCH ARTICLE

© 2013, IJCSMC All Rights Reserved

IMPLEMENTATION OF PARALLEL APRIORI ALGORITHM ON HADOOP CLUSTER

A. Ezhilvathani 1, Dr. K. Raja 2
1 P.G. Student, M.E. CSE, Alpha College of Engg, Chennai, India
2 Dean (Academics), Alpha College of Engg, Chennai, India

Abstract: With the rapid growth of data in organizations, large-scale data processing has become a focal point of information technology. To keep pace with advances in data collection and storage technologies, the design and implementation of large-scale parallel algorithms for data mining is attracting growing interest. In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. This paper aims to extract frequent patterns among sets of items in transaction databases or other repositories. The Apriori algorithm is highly influential for finding frequent item sets using candidate generation. The Apache Hadoop software framework is used to build the cluster. Its operation is based on the MapReduce programming model, which improves the processing of large-scale data on a high-performance cluster by processing vast amounts of data in parallel across a large cluster of computer nodes. It provides reliable, scalable, distributed computing.

Key Terms: Hadoop; MapReduce; Apriori

I. INTRODUCTION

Data mining can be defined as the process of discovering hidden patterns in databases. Its main aim is to transform data into knowledge. Association rule mining is one kind of data mining process; it extracts interesting correlations, patterns, and associations among items in a transaction database or other data repositories.
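As a small illustration of what such an association looks like, the sketch below measures the support and confidence of a rule over a toy transaction list. The transactions, the item names, and the helper functions `support` and `confidence` are assumptions made for this example; they are not taken from the paper's data set or implementation.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transaction database; each transaction is a set of items.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)
```

With this toy data, `support({"bread", "milk"}, TRANSACTIONS)` counts the transactions that contain both items, and `confidence({"bread"}, {"milk"}, TRANSACTIONS)` estimates how often milk appears in transactions that already contain bread.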
Association rules are widely used in areas such as telecommunication networks, marketing and risk management, and inventory control. In this paper, the Apriori algorithm is used to find the frequent item sets in a database. The method finds the set of all possible combinations of items and then counts the support for each of them.

Parallel association rule mining can be divided into two categories [5,9]. The first is data parallelism, in which the input data set is divided among the participating nodes to generate the rules. The second is task parallelism, in which the task is divided among the nodes so that each node accesses the whole input data set to generate the rules.

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop was originally conceived on the basis of Google's MapReduce, in which an application is broken down into numerous small parts [10]. Hadoop adds much-needed robustness and scalability to a distributed system because it provides inexpensive and reliable storage. The Apache Hadoop software library detects and handles failures at the application layer, so it can deliver a highly available service on top of a cluster of computers, each of which may be prone to failure.

II. RELATED WORKS AND EXISTING MODEL

Nirali R. Sheth and J. S. Shah have implemented an association-rule-based parallel data mining algorithm that targets the Hadoop cloud, a parallel storage and computing platform [1].
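The data-parallel form of Apriori outlined in the Introduction, in which the input is split among nodes and candidate supports are counted per split, can be sketched as a map step and a reduce step. The following is a minimal single-process Python sketch under stated assumptions: the partitions are plain in-memory lists standing in for input splits, and the function names (`map_count`, `reduce_counts`, `generate_candidates`, `apriori`) are hypothetical. It is not the implementation of [1] or of this paper, and it does not use the Hadoop API.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map step: count each candidate itemset within one data partition."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is contained in the transaction
                counts[cand] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce step: sum the per-partition counts for every candidate."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


def generate_candidates(frequent, k):
    """Build k-itemsets from the items of the frequent (k-1)-itemsets, keeping
    only those whose every (k-1)-subset is frequent (the Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    }


def apriori(partitions, min_count):
    """Return {itemset: count} for every itemset seen in >= min_count transactions."""
    candidates = {frozenset([i]) for p in partitions for t in p for i in t}
    result, k = {}, 1
    while candidates:
        # One pass over the data per level: map each partition, then reduce.
        totals = reduce_counts(map_count(p, candidates) for p in partitions)
        frequent = {c: n for c, n in totals.items() if n >= min_count}
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result
```

A call such as `apriori([[{"bread", "milk"}, {"bread", "butter", "milk"}], [{"butter", "milk"}, {"bread", "butter"}]], min_count=2)` treats the two inner lists as two input splits. On a real cluster, each `map_count` call would run on the node holding that split, and only the small per-candidate counts would be sent to the reducer.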