Splitting-merging Clustering Algorithm for Collaborative Filtering Recommendation System
Nabil Belacel¹, Guillaume Durand², Serge Leger² and Cajetan Bouchard²
¹ National Research Council, Information and Communication Technologies, Ottawa, Ontario, Canada
² National Research Council, Information and Communication Technologies, Moncton, New Brunswick, Canada
Keywords: Information Filtering, Recommender Systems, Collaborative Filtering, Clustering, Splitting-merging Clustering.
Abstract: Collaborative filtering (CF) is a well-known and successful filtering technique, but it has limitations, especially when dealing with highly sparse and large-scale data. To address the scalability issue, some researchers propose clustering methods such as K-means. However, the performance of K-means depends heavily on the manually chosen number of clusters and on the selection of the initial centroids; ill-defined values lead to inaccurate recommendations and increased computation time. In this paper, we show how the Merging and Splitting clustering algorithm can improve recommendation performance with reasonable computation time by comparing it with a K-means based approach. Our experimental results demonstrate that, by considering the statistical nature of the data, the performance of our system is independent of the initial partition. More specifically, the results in this paper provide significant evidence that the proposed splitting-merging clustering based CF is more scalable than the well-known K-means clustering based CF.
1 INTRODUCTION
In general, recommendation systems mainly use three types of filtering techniques: content-based filtering (CBF), collaborative filtering (CF) and hybrid filtering, which combines content-based filtering and collaborative filtering (Burke, 2002). The CBF recommendation technique recommends items that are similar to those the active user has rated positively in the past. CBF uses only the content of the items to make a recommendation (Pazzani and Billsus, 2007). The CF recommendation technique recommends items that were preferred in the past by users similar to the active user.
CF techniques assume that the active user will be interested in items appreciated by similar users. Finally, hybrid filtering techniques recommend items by combining CF and content-based filtering (Burke, 2002). CF is widely used in the fields of e-commerce (Linden et al., 2003), e-learning (Bobadilla et al., 2009), e-government (Shambour and Lu, 2011), TV programs (Zhang et al., 2013), music (Cohen and Fan, 2000) and books (Benkoussas et al., 2014). Methods in CF can be either memory-based or model-based. Memory-based algorithms operate on the whole user-item rating matrix and make recommendations by identifying the neighborhood of the target user, based on that user's preferences (Herlocker et al., 1999). Memory-based filtering algorithms are easy to implement and perform very well in many real-world applications (Lu et al., 2015). However, they face important problems that limit their application to sparse and/or large data. Data sparsity is common: users typically rate only a small number of items, which creates a very sparse user-item matrix (Su and Khoshgoftaar, 2009). On the scalability side, memory-based filtering algorithms do not scale satisfactorily as the number of users and items in the ratings database increases (Lu et al., 2015). To address this scalability issue, model-based techniques were proposed.
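To make the memory-based approach described above concrete, here is a minimal sketch of user-based neighborhood prediction. It is an illustration only, not the method evaluated in this paper. The toy ratings, the function names (`sparsity`, `pearson`, `predict`), the use of Pearson correlation as the similarity measure and the mean-centred weighted average are all assumptions chosen for the example.

```python
from math import sqrt

# Toy user-item ratings (hypothetical data); a missing key means "not rated".
RATINGS = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
    "dave":  {"i2": 4, "i3": 2, "i4": 1},
}


def sparsity(ratings):
    """Fraction of cells in the user-item matrix that hold no rating."""
    items = {i for r in ratings.values() for i in r}
    filled = sum(len(r) for r in ratings.values())
    return 1.0 - filled / (len(ratings) * len(items))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(ru, rv):
    """Pearson correlation over co-rated items (0.0 when it is undefined)."""
    common = set(ru) & set(rv)
    if len(common) < 2:
        return 0.0
    mu = mean(ru[i] for i in common)
    mv = mean(rv[i] for i in common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = sqrt(sum((ru[i] - mu) ** 2 for i in common)) * sqrt(
        sum((rv[i] - mv) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item, k=2):
    """Mean-centred weighted average over the k strongest neighbours.

    Every call scans all other users, which is the cost that grows
    with the size of the ratings database.
    """
    ru = ratings[user]
    base = mean(ru.values())
    neighbours = [
        (pearson(ru, rv), other)
        for other, rv in ratings.items()
        if other != user and item in rv
    ]
    # Assumption: rank by absolute similarity, so strongly negative
    # correlations also contribute (with a negative weight).
    neighbours = sorted(neighbours, key=lambda t: abs(t[0]), reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        return base  # no informative neighbour: fall back to the user's mean
    offset = sum(
        s * (ratings[o][item] - mean(ratings[o].values())) for s, o in neighbours
    )
    return base + offset / norm


if __name__ == "__main__":
    print("sparsity:", sparsity(RATINGS))
    print("predicted rating of i4 for alice:", predict(RATINGS, "alice", "i4"))
```

Because `predict` recomputes similarities against every other user for each request, its cost grows with the number of users and with the number of co-rated items, which is the scalability limitation that motivates model-based techniques.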
Model-based techniques apply machine learning algorithms to user-rating training data to learn a model, which is then used to make predictions on user-rating test data or on real data. Several algorithms have been used for model-based CF, including Bayesian networks (Su and Khoshgoftaar, 2006), matrix factorization (Bokde et al., 2015), probabilistic latent space models (Hofmann and Puzicha, 1999), neural networks (Feng and Huiyou, 2006) and clustering methods (Salah et al., 2016). Clustering methods using K-means algorithm
Belacel, N., Durand, G., Leger, S. and Bouchard, C.
Splitting-merging Clustering Algorithm for Collaborative Filtering Recommendation System.
DOI: 10.5220/0006599501650174
In Proceedings of the 10th International Conference on Agents and Artificial Intelligence (ICAART 2018) - Volume 2, pages 165-174
ISBN: 978-989-758-275-2
Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved