Splitting-merging Clustering Algorithm for Collaborative Filtering Recommendation System

Nabil Belacel 1, Guillaume Durand 2, Serge Leger 2 and Cajetan Bouchard 2

1 National Research Council, Information and Communication Technologies, Ottawa, Ontario, Canada
2 National Research Council, Information and Communication Technologies, Moncton, New Brunswick, Canada

Keywords: Information Filtering, Recommender Systems, Collaborative Filtering, Clustering, Splitting-merging Clustering.

Abstract: Collaborative filtering (CF) is a well-known and successful filtering technique, but it has limits, especially when dealing with highly sparse and large-scale data. To address this scalability issue, some researchers propose clustering methods such as K-means. K-means, however, has the shortcoming that its performance depends heavily on the manually defined number of clusters and on the selection of the initial centroids; ill-defined values lead to inaccurate recommendations and increased computation time. In this paper, we show how the Merging and Splitting clustering algorithm can improve recommendation performance with reasonable computation time by comparing it with a K-means based approach. Our experimental results demonstrate that, by considering the statistical nature of the data, the performance of our system is independent of the initial partition. More specifically, the results in this paper provide significant evidence that the proposed splitting-merging clustering based CF is more scalable than the well-known K-means clustering based CF.

1 INTRODUCTION

In general, recommendation systems mainly use three types of filtering techniques: content based filtering (CBF), collaborative filtering (CF) and hybrid filtering (the combination of content based filtering and collaborative filtering) (Burke, 2002).
The CBF recommendation technique recommends items that are similar to those the active user has already rated positively. CBF uses only the content of the items to make a recommendation (Pazzani and Billsus, 2007). The CF recommendation technique recommends items that were preferred in the past by users similar to the active user. CF techniques assume that the active user will be interested in items appreciated by similar users. Finally, hybrid filtering techniques recommend items by combining CF and content-based filtering (Burke, 2002). CF is widely used in the fields of e-commerce (Linden et al., 2003), e-learning (Bobadilla et al., 2009), e-government (Shambour and Lu, 2011), TV programs (Zhang et al., 2013), music (Cohen and Fan, 2000) and books (Benkoussas et al., 2014).

Methods in CF can be either memory-based or model-based. Memory-based algorithms operate on the whole user-item rating matrix and make recommendations by identifying the neighborhood of the target user, that is, the users whose preferences are closest to those of the user receiving the recommendations (Herlocker et al., 1999). Memory-based filtering algorithms are easy to implement and perform very well in many real-world applications (Lu et al., 2015). However, they face important problems that limit their application to sparse and/or large data. Data sparsity is common: users typically rate only a small number of items, which creates a very sparse user-item matrix (Su and Khoshgoftaar, 2009). On the scalability side, memory-based filtering algorithms do not scale satisfactorily as the number of users and items in the ratings database increases (Lu et al., 2015). To address this scalability issue, model-based techniques were proposed. Model-based techniques apply machine learning algorithms to user-rating training data to learn a model, and then use that model to make predictions on the user-rating test data or on real data.
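As an illustration of the memory-based approach described above, the following is a minimal sketch of neighborhood-based rating prediction. It is not the method proposed in this paper: the toy ratings, the function names `cosine_similarity` and `predict`, the neighborhood size `k`, and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy user-item ratings; a missing key means "not rated".
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Similarity-weighted mean of the ratings given to `item` by the
    k users most similar to `user` (assumed neighborhood rule)."""
    # Every other user who rated the item is a candidate neighbor.
    neighbours = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbours[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbor: the sparsity problem in miniature
    return sum(sim * rating for sim, rating in top) / total


prediction = predict(ratings, "alice", "D")
```

Each call to `predict` scans every other user in the rating matrix, which is why this family of methods scales poorly as the number of users and items grows.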
Several algorithms have been used for model-based CF. The machine learning algorithms used include Bayesian networks (Su and Khoshgoftaar, 2006), matrix factorization (Bokde et al., 2015), probabilistic latent space models (Hofmann and Puzicha, 1999), neural networks (Feng and Huiyou, 2006) and clustering methods (Salah et al., 2016).

Belacel, N., Durand, G., Leger, S. and Bouchard, C. Splitting-merging Clustering Algorithm for Collaborative Filtering Recommendation System. DOI: 10.5220/0006599501650174. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence (ICAART 2018) - Volume 2, pages 165-174. ISBN: 978-989-758-275-2. Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

Clustering methods using K-means algorithm