[Singh, 3(2): February, 2014] ISSN: 2277-9655 Impact Factor: 1.852
http://www.ijesrt.com (C) International Journal of Engineering Sciences & Research Technology [795-802]

IJESRT
INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

One Time Mining by Multi-Core Preprocessing on Generalized Dataset
Aviral Kumar Singh *1, S. R. Tondon 2, Tarun Dhar Diwan 3
aviralkumarsingh@live.com

Abstract
One of the important problems in data mining is discovering association rules from databases of transactions, where each transaction consists of a set of items. Many industries are interested in deriving association rules from their databases because they continuously collect and store huge amounts of data. The discovery of interesting association relationships among business transaction records helps in many business decision-making processes, such as catalog decisions, cross-marketing, and loss-leader analysis. The datasets typically available as input to association rule discovery are enormous and high-dimensional, and the time-consuming operation in the discovery process is computing the frequency of the interesting subsets of items (called candidates) in the database of transactions. Hence, it has become vital to develop a method that speeds up the preprocessing computation. In this paper, we propose an integrated approach of parallel computing and ARM for mining association rules in a generalized dataset. It differs fundamentally from previous algorithms in that preprocessing is done on multiple cores, and the number of passes required is reduced by avoiding repeated scans of the dataset. The response time is measured on a space-delimited text dataset.

Keywords: Data Mining, Association Rule Mining (ARM), Association rules, Apriori algorithm, Frequent pattern.
Introduction
The rapid development of computer technology, especially the increased capacity and decreased cost of storage media, has led businesses to store huge amounts of external and internal information in large databases at low cost. Mining useful information and helpful knowledge from these large databases has thus evolved into an important research area [3, 2, 1]. Association rule mining (ARM) [18] has become one of the core data mining tasks and has attracted tremendous interest among data mining researchers. ARM is an undirected, or unsupervised, data mining technique that works on variable-length data and produces clear and understandable results.

ARM algorithms [17] fall into two categories: algorithms with candidate generation and algorithms without candidate generation. The first category comprises algorithms that generate candidates in a way similar to the Apriori algorithm; Eclat may also be placed in this category [8]. In the second category, the FP-Growth algorithm is the best-known algorithm. The main drawback of the earlier algorithms is the repeated scanning of a large database, which can degrade CPU and memory performance and increase I/O overhead. The performance and efficiency of ARM algorithms depend mainly on three factors: the candidate sets generated, the data structure used, and the details of the implementation [8]. In this paper we propose an algorithm that takes these three factors into account.

Suppose there are $10^4$ frequent 1-itemsets; the Apriori algorithm may then produce on the order of $10^7$ candidate 2-itemsets, count them, and judge their frequency [11]. Moreover, to find a frequent itemset containing 100 items, it may produce as many as $2^{100}$ (about $10^{30}$) candidate itemsets. In addition, it may scan the database many times to check a large set of candidates by pattern matching.
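The candidate counts quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that every pair of frequent 1-itemsets becomes a candidate 2-itemset and that every non-empty subset of a frequent 100-itemset must be generated.

```python
from math import comb


def candidate_pairs(n_frequent_items: int) -> int:
    """Number of candidate 2-itemsets if every pair of frequent items is a candidate."""
    return comb(n_frequent_items, 2)


def nonempty_subsets(k: int) -> int:
    """Number of non-empty subsets of a k-itemset (all are frequent if the k-itemset is)."""
    return 2**k - 1


if __name__ == "__main__":
    # Pairs from 10^4 frequent 1-itemsets: on the order of 10^7.
    print(candidate_pairs(10**4))           # 49995000
    # Subsets of a frequent 100-itemset: about 10^30.
    print(f"{nonempty_subsets(100):.3e}")   # 1.268e+30
```

Under these assumptions the pair count is roughly 5 x 10^7, which is consistent with the order of magnitude given in the text.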
The transactional database is treated as a two-dimensional array over a generalized-value dataset. The main difference between the proposed algorithm and other algorithms is that, instead of using the transactional array in its natural form, our algorithm uses the transpose of the array, i.e. the rows and columns of the array are interchanged. The transposition is done using a parallel matrix transpose algorithm (Mesh Transpose) [20]. The parallel architecture that lends itself most naturally to matrix operations is the mesh. Indeed, an n x n mesh of processors can be regarded as a matrix and is therefore perfectly suited to hold an n x n data matrix, one element per processor. This is precisely the approach we use to compute the transpose of an n x n matrix initially stored in an n x n mesh of processors. We find that the time taken for the matrix transpose decreases as the number of processors increases. We also observe that the speedup is very high for small as well as very large matrices when the number of processors is increased. The idea of our algorithm is quite simple.
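To make the transposed layout concrete, here is a minimal sequential sketch in Python. It is not the authors' implementation and does not reproduce the Mesh Transpose algorithm: a plain `zip`-based transpose stands in for the parallel step, the 0/1 encoding of the array is an assumption, and all function names and the sample data are hypothetical. It shows why the transposed form is convenient: once each row holds one item's occurrences across all transactions, the support of an itemset can be counted from those rows without rescanning the original transactions.

```python
def parse_transactions(text):
    """Parse a space-delimited dataset: one transaction per line."""
    return [line.split() for line in text.splitlines() if line.strip()]


def to_matrix(transactions):
    """Build a 0/1 array: rows are transactions, columns are items (sorted)."""
    items = sorted({item for t in transactions for item in t})
    col = {item: j for j, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in transactions]
    for i, t in enumerate(transactions):
        for item in t:
            matrix[i][col[item]] = 1
    return items, matrix


def transpose(matrix):
    """Sequential stand-in for the parallel transpose: rows become items."""
    return [list(row) for row in zip(*matrix)]


def support(itemset, items, transposed):
    """Count the transactions that contain every item in itemset."""
    rows = [transposed[items.index(item)] for item in itemset]
    return sum(all(bits) for bits in zip(*rows))


if __name__ == "__main__":
    data = "a b c\na c\nb c d\na b c d\n"   # hypothetical sample dataset
    items, matrix = to_matrix(parse_transactions(data))
    t = transpose(matrix)
    print(items)                            # ['a', 'b', 'c', 'd']
    print(t[0])                             # [1, 1, 0, 1]
    print(support(["a", "c"], items, t))    # 3
    print(support(["b", "d"], items, t))    # 2
```

In this sketch the original transactions are read once to build the array; every later support count works only on the transposed rows.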