International Journal of Computer Applications (0975 888)
Volume 47, No. 3, June 2012

A Study on Milestones of Association Rule Mining Algorithms in Large Databases

Saravanan Suba
Assistant Professor of Computer Science
Government Arts College
Paramakudi, TN, India

Chistopher.T, PhD.
Assistant Professor of Computer Science & Head
Government Arts College
Udumalpet, TN, India

ABSTRACT
Data mining helps automate the extraction and generation of predictive information from large amounts of data. Association rule mining is one of the important research areas in data mining; it identifies useful associations or relationships among large sets of data items. In this paper, we present the important concepts of association rule mining, the existing algorithms, and their effectiveness and drawbacks. The references provided in this paper cover the main theoretical issues and guide the researcher toward interesting research directions that have yet to be explored.

Keywords
Data Mining, Association Rule Mining, Apriori, FP-growth, Frequent Item sets

1. INTRODUCTION
In recent years there has been exponential growth in the generation and manipulation of electronic information as more and more operations are computerized. Organizations and enterprises have started to realize that the information accumulated over the years is an important strategic asset, and that potential business intelligence is hidden in this large amount of data. What these enterprises want is a technique that lets them extract the most valuable information from the accumulated data. The field of data mining offers such techniques, which evaluate the current data and infer hidden information that is useful for future prediction, pattern recognition and decision making [1]. Data mining is a collection of techniques for the effective automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases [1].
These patterns must be usable so that they can be applied in the enterprise's decision-making process.

Data mining is also seen as an important step in the overall process of knowledge discovery, which is composed of several phases. Data Cleaning removes noise and inconsistent data present in the input database. Since the input database may be composed of data from multiple sources, Data Integration is employed to merge data from the different sources; the merged store is often referred to as a data warehouse. Data Selection identifies the data in the input database that is relevant to the specific data mining task. Data Transformation converts the input data into a format suitable for data mining. The data mining task itself is then carried out, applying intelligent methods or algorithms to extract patterns. Next, the interesting patterns are selected from the set of patterns mined in the previous step. The last step of the KDD (Knowledge Discovery in Databases) process is the presentation of the discovered knowledge in a user-friendly format [2].

Data mining techniques or tasks can be broadly classified as descriptive or predictive. Descriptive mining refers to methods that depict the essential characteristics or general properties of the data in the database. Descriptive techniques include tasks such as clustering, association and sequential mining [3]. Predictive data mining tasks perform inference on the input data to arrive at hidden knowledge and make interesting and useful predictions [2]. Predictive techniques include tasks such as classification, regression and deviation detection [3].

Data mining is motivated by the decision support problems faced by most business organizations and is described as an essential area of research [4]. Key research issues or challenges in data mining are performance, mining methodology, user interaction and data diversity.
Data mining algorithms and methodologies must therefore be efficient and must scale well, in terms of execution time, with the size of the database [2].

One of the most popular descriptive data mining techniques is association rule mining [3]. Since its introduction [5], association rule mining has become one of the core data mining tasks and has attracted tremendous interest among data mining researchers and practitioners [6]. Association rule mining can be decomposed into two subproblems: mining large item sets (i.e. frequent item sets) and generating association rules from them [2]. Two statistical measures control the process of association rule mining: support and confidence. For an association rule X → Y over a database of N transactions, where ∑(Z) denotes the number of transactions that contain every item of the item set Z, support and confidence are defined as

$$\mathrm{Support}(X \rightarrow Y) = \frac{\sum (X \cup Y)}{N}$$

$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\sum (X \cup Y)}{\sum (X)}$$

For example, in a hypothetical database of 5 transactions in which 4 contain X and 2 contain both X and Y, the rule X → Y has support 2/5 = 0.4 and confidence 2/4 = 0.5. The entire process of association and pattern mining is controlled by user-specified parameters, namely minimum support and minimum confidence [2].

The Association Rule Mining (ARM) task was first presented by Agrawal et al. [5] to discover interesting relationships among items in market basket transactions. Since its inception, extensive studies have been conducted to address various conceptual, implementation and application issues pertaining to the association analysis task. Research on conceptual issues in ARM focuses primarily on developing a framework that defines the theoretical underpinnings of association analysis and on extending the formulation to handle new types of patterns. Research on implementation issues in ARM involves integrating mining capability into existing database technology, developing efficient and scalable mining algorithms, handling user-specified or domain-specific constraints, and post-processing the extracted patterns. Research on application issues in ARM includes marketing,