Influence of Various Clustering Algorithms on Web Personalization
B. Praveen, V. Ravi
Institute for Development and Research in Banking Technology, Castle Hills, Road #1,
Masab Tank, Hyderabad – 500 057 (AP) India
praveen.barli@gmail.com, rav_padma@yahoo.com
Abstract
Many e-commerce websites now incorporate
personalization features that provide users with
relevant content based on their past browsing
behavior, improving their browsing experience.
In turn, site owners gain more loyal customers.
This paper compares various clustering algorithms
such as K-means, Fuzzy C-means, Subtractive
Clustering and K-modes, used for grouping of web
user sessions. The clusters formed as a result of
applying these algorithms are aggregated to form web
user profiles. The recommendation engine uses these
profiles to generate pages for recommendation. The
recommendation effectiveness is evaluated using
standard measures such as coverage, precision and the
F1 measure.
1. Introduction
With the explosive growth of e-commerce sites and
the amount of unstructured information available on
these sites, it becomes difficult for users to access
relevant information efficiently and quickly. Web
Personalization is the process of providing users with
the relevant content, which may include web page
links, products etc., based on their past behavior. Many
memory-based and model-based collaborative filtering
techniques are employed for this task [12]. In model-based
methods, similar users are grouped together offline using
various clustering methods. The resulting aggregate
profiles are then compared with the active session, and
pages from the most similar profile that the active user
has not yet seen are recommended. Here we used several
clustering algorithms to form user profiles and compared
their effectiveness based on the accuracy of the
recommendations provided using these profiles. The
effectiveness of each individual profile is also
calculated using the Weighted Average Visit Percentage
(WAVP) described in [4]. The recommendation engine used
is the one proposed in [4].
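The model-based flow described above can be illustrated with a
minimal sketch. It is not the recommendation engine of [4]: the
profile aggregation (mean of the session vectors in a cluster), the
cosine similarity and the names aggregate_profiles and recommend are
assumptions made purely for illustration, and the cluster labels are
taken as given rather than produced by any particular algorithm.

```python
import numpy as np

def aggregate_profiles(T, labels):
    """Assumed aggregation: one profile per cluster, the mean session vector."""
    return np.array([T[labels == c].mean(axis=0) for c in np.unique(labels)])

def recommend(active, profiles, top_n=2):
    """Return indices of unseen pages from the profile most similar to `active`."""
    # Cosine similarity between the active session and every profile (assumed measure).
    norms = np.linalg.norm(profiles, axis=1) * np.linalg.norm(active)
    sims = (profiles @ active) / np.where(norms == 0, 1.0, norms)
    best = profiles[int(np.argmax(sims))]
    # Rank pages by weight in the best profile, keeping only pages not yet seen.
    order = np.argsort(-best, kind="stable")
    unseen = [int(j) for j in order if active[j] == 0 and best[j] > 0]
    return unseen[:top_n]

# Toy data: 4 sessions over 4 pages, already assigned to 2 clusters.
T = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
labels = np.array([0, 0, 1, 1])

profiles = aggregate_profiles(T, labels)
active = np.array([1, 0, 0, 0], dtype=float)  # active user has visited page 0 only
recommended = recommend(active, profiles)
```

In this toy example the active session is closest to the first
profile, so the unseen pages of that profile are returned in order of
their profile weight.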
The rest of the paper is organized as follows:
Section 2 presents the methodology of web
personalization using web usage mining. Section 3
presents the results and discussion, in which the
effectiveness of the recommendations obtained with each
clustering algorithm is examined in terms of the standard
measures coverage, precision and the F1 measure.
Section 4 concludes the paper.
2. Methodology
Web personalization using web usage mining is a
three-stage process [6] [13]:
1) Data preprocessing
2) Pattern discovery
3) Recommendation generation
2.1. Data preprocessing
Preprocessing is the primary task of personalization.
It involves data cleaning, user identification, session
identification, page view identification and transaction
identification [3]. The preprocessing step results in a
set of n user transactions and m page views. Here the
weights of page views are binary: 1 indicates that a
particular page was accessed and 0 that it was not.
Thus each user transaction is represented as a vector
of m page view weights.
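As an illustration of this representation, the following sketch
builds the n x m binary transaction matrix from already-identified
sessions. The function name build_transaction_matrix, the page URLs
and the toy sessions are hypothetical, and the cleaning and
identification steps listed above are assumed to have been done.

```python
import numpy as np

def build_transaction_matrix(sessions, pages):
    """Return an n x m 0/1 matrix: entry (i, j) is 1 if session i accessed page j."""
    index = {page: j for j, page in enumerate(pages)}
    matrix = np.zeros((len(sessions), len(pages)), dtype=np.int8)
    for i, session in enumerate(sessions):
        for page in session:
            if page in index:              # pages outside the page-view set are ignored
                matrix[i, index[page]] = 1  # repeated visits still yield weight 1
    return matrix

# Hypothetical page views (m = 4) and user transactions (n = 3).
pages = ["/home", "/products", "/cart", "/help"]
sessions = [
    ["/home", "/products"],
    ["/products", "/cart", "/products"],
    ["/help"],
]
T = build_transaction_matrix(sessions, pages)
```

Each row of T is one user transaction expressed as a vector of m
binary page view weights, which is the form of input assumed by the
clustering step.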
2.2. Pattern discovery
As part of the pattern discovery phase, many data
mining techniques such as clustering, association rule
mining and sequential pattern mining are used. Their
primary task is to find hidden patterns that reveal
user behavior with respect to the site.
Here, we used clustering as the pattern discovery
technique to form groups of users who exhibit similar
behavior in accessing the site content. The clustering
techniques employed include K-means clustering,
Proceedings of the International Workshop on Machine Intelligence Research (MIR Day, GHRCE- Nagpur)
© 2009 MIR Labs