Influence of Various Clustering Algorithms on Web Personalization

B. Praveen, V. Ravi
Institute for Development and Research in Banking Technology, Castle Hills, Road #1, Masab Tank, Hyderabad – 500 057 (AP) India
praveen.barli@gmail.com, rav_padma@yahoo.com

Abstract

Today many e-commerce websites incorporate personalization features that provide users with relevant content based on their past browsing behavior, in order to improve their browsing experience. In turn, site owners gain more loyal customers. This paper compares various clustering algorithms, such as K-means, Fuzzy C-means, Subtractive Clustering and K-modes, used for grouping web user sessions. The clusters formed by applying these algorithms are aggregated to form web user profiles. The recommendation engine uses these profiles to generate pages for recommendation. The recommendation effectiveness is evaluated using standard measures such as coverage, precision and the F1 measure.

1. Introduction

With the explosive growth of e-commerce sites and of the amount of unstructured information available on them, it has become difficult for users to access relevant information quickly and efficiently. Web personalization is the process of providing users with relevant content, which may include web page links, products etc., based on their past behavior. Many memory-based and model-based collaborative filtering techniques are employed for this task [12]. In model-based methods, similar users are grouped together by various clustering methods as an off-line process. The resulting aggregate profiles are then compared with the active session, and pages from the most similar profile that the active user has not yet seen are recommended. Here we used several clustering algorithms to form user profiles and compared their effectiveness based on the accuracy of the recommendations provided using these profiles.
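To make the model-based step described above concrete, the following is a minimal Python sketch; it is not the recommendation engine of [4]. It assumes binary session vectors, aggregate profiles computed as the mean of the session vectors in a cluster, and cosine similarity for matching the active session. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aggregate_profile(sessions):
    """Assumed aggregation: mean page view weight over a cluster's sessions."""
    n = len(sessions)
    return [sum(column) / n for column in zip(*sessions)]


def recommend(active, profiles, top_n=2):
    """Pick the profile most similar to the active session and return the
    indices of its highest-weighted pages that the user has not yet seen."""
    best = max(profiles, key=lambda p: cosine(active, p))
    unseen = [(w, i) for i, w in enumerate(best) if active[i] == 0 and w > 0]
    unseen.sort(key=lambda t: (-t[0], t[1]))  # weight descending, then page index
    return [i for _, i in unseen[:top_n]]


# Toy data: two clusters of binary sessions over five page views.
cluster_a = [[1, 1, 0, 0, 1], [1, 1, 1, 0, 0]]
cluster_b = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]
profiles = [aggregate_profile(cluster_a), aggregate_profile(cluster_b)]

active = [1, 0, 0, 0, 0]  # the active user has visited only page 0
print(recommend(active, profiles))
```

In this toy case the active session matches only the first profile, so the recommended pages are drawn from that profile's unvisited pages in order of weight.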
The effectiveness of individual profiles is also calculated using the Weighted Average Visit Percentage (WAVP) described in [4]. The recommendation engine used is the one proposed in [4].

The rest of the paper is organized as follows. Section 2 presents the methodology of web personalization using web usage mining. Section 3 presents the results and discussion, in which the effectiveness of the recommendations obtained with each clustering algorithm is discussed in terms of the standard measures coverage, precision and F1. Section 4 concludes the paper.

2. Methodology

Web personalization using web usage mining is a three-stage process [6] [13]:
1) Data preprocessing
2) Pattern discovery
3) Recommendation generation

2.1. Data preprocessing

Preprocessing is the primary task of personalization. It involves data cleaning, user identification, session identification, page view identification and transaction identification [3]. The preprocessing step results in a set of n user transactions and m page views. Here the page view weights are binary: 1 indicates that a particular page was accessed and 0 indicates that it was not. Thus each user transaction is represented as a vector of m page view weights.

2.2. Pattern discovery

In the pattern discovery phase, many data mining techniques are used, such as clustering, association rule mining and sequential pattern mining. Their primary task is to find hidden patterns that reveal user behavior with respect to the site. Here, we used clustering as the pattern discovery technique, to form groups of users who exhibit similar behavior in accessing the site content. The clustering techniques employed include K-means clustering,

Proceedings of the International Workshop on Machine Intelligence Research (MIR Day, GHRCE-Nagpur) © 2009 MIR Labs