A comparative study on principal component analysis and factor analysis for the formation of association rule in data mining domain

Dharmpal Singh 1, J. Pal Choudhary 2, Malika De 3

1 Department of Computer Sc. & Engineering, JIS College of Engineering, Block 'A', Phase III, Kalyani, Nadia-741235, West Bengal, INDIA, singh_dharmpal@yahoo.co.in
2 Department of Information Technology, Kalyani Govt. Engineering College, Kalyani, Nadia-741235, West Bengal, INDIA, jnpc193@yahoo.com
3 Department of Engineering & Technological Studies, University of Kalyani, Kalyani, Nadia-741235, West Bengal, INDIA, demallika@yahoo.com

Abstract: Association rules play an important role in data mining. Association rule mining aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data warehouses. Several authors have proposed different techniques to form association rules in data mining, but it has been observed that these techniques have some problems. Therefore, in this paper an effort has been made to form association rules using principal component analysis and factor analysis. A comparative study of the performance of principal component analysis and factor analysis has also been made to select the preferable model for the formation of association rules in data mining. A clustering technique with a distance measure function has been used to compare the results of the two techniques. A new distance measure function, named Bit equal, has been proposed for clustering, and its results have been compared with those of other existing distance measure functions.

Keywords: Data mining, Association rule, Factor analysis, Principal component analysis, Cluster, K-means and Euclidean distance, Hamming distance.

1. Introduction: Data mining [1] is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from large information repositories such as relational databases, data warehouses, XML repositories, etc. Data mining is also known as one of the core processes of Knowledge Discovery in Databases (KDD). Association rule mining, one of the most important and well-researched techniques of data mining, was first introduced in [2]. It aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in various areas such as telecommunication networks, market and risk management, inventory control, etc. Association rule mining has also been applied to e-learning systems for traditional association analysis (finding correlations between items in a dataset), including, e.g., the following tasks: building recommender agents for on-line learning activities or shortcuts [3], automatically guiding the learner's activities and intelligently generating and recommending learning materials [4], identifying attributes characterizing patterns of performance disparity between various groups of students [5], discovering interesting relationships from students' usage information in order to provide feedback to the course author [6], finding out the relationships between each pattern of learners' behavior [7], finding students' mistakes that often occur together [8], guiding the search for the best-fitting transfer model of student learning [9], optimizing the content of an e-learning portal by determining the content of most interest to the user [10], extracting useful patterns to help educators and web masters evaluate and interpret on-line course activities [3], and personalizing e-learning based on aggregate usage
profiles and a domain ontology [11].

Association rule mining algorithms need to be configured before being executed. The user therefore has to supply appropriate parameter values in advance in order to obtain a good number of rules; poorly chosen values often lead to too many or too few rules. The main algorithms currently used to discover association rules, for which comparative studies are available, are Apriori [12], FP-Growth [13], MagnumOpus [14], and Closet [15]. Most of these algorithms require the user to set two thresholds, the minimal support and the minimal confidence, and find all the rules that exceed the thresholds specified by the user. The user must therefore possess a certain amount of expertise in order to find the right settings for support and confidence and so obtain the best rules. For this reason, an effort has been made here to form association rules using principal component analysis and factor analysis.

Markus Zöller [16] has discussed the ideas, assumptions and purposes of principal component analysis (PCA) and factor analysis (FA), and has established how the two can be understood and distinguished. Hee-Ju Kim [17] has examined the differences between common factor analysis (CFA) and PCA, and has opined that CFA provides more accurate results than PCA. Diana D. Suhr [18] has discussed the similarities and differences between PCA and exploratory factor analysis (EFA), and has illustrated and discussed examples of PCA and EFA with PRINCOMP and FACTOR.

Several authors have used data mining techniques [2-15] for association rule generation and the selection of best

Advances in Applied and Pure Mathematics ISBN: 978-960-474-380-3 442