On the effect of data sparsity to the performance of a Collaborative Filtering algorithm on a GPU

Efthalia Karydi 1,2, Konstantinos G. Margaritis 1, and Eero Vainikko 2

1 University of Macedonia, Department of Applied Informatics, Parallel and Distributed Processing Laboratory, 156 Egnatia str., P.O. Box 1591, 54006 Thessaloniki, Greece
Karydithalia@gmail.com, kmarg@uom.gr
2 University of Tartu, Institute of Computer Science, Faculty of Mathematics and Computer Science, Liivi 2, Tartu 50409, Estonia
eero.vainikko@ut.ee

Abstract. One of the problems encountered in recommender systems applications is the high sparsity of the available data. In this paper we investigate the effect of dataset sparsity on the performance of a parallel implementation of the Collaborative Filtering Slope One algorithm. The sparse data are represented in the Compressed Sparse Row (CSR) format, and the implementation's performance is evaluated on a Graphics Processing Unit (GPU) using the MovieLens dataset and artificially created datasets.

Keywords: Collaborative Filtering, Slope One, CSR Format, Massively Parallel Computing, GPU, CUDA

1 Introduction

Collaborative Filtering is a very popular family of algorithms for making recommendations. It uses the information that users provide to the system in order to predict unknown information about other users. The number of users of recommender systems is continuously growing, as is their demand to receive accurate results fast. Thus, processing as much data as possible, as quickly as possible, has become an urgent need, and parallel computing technologies are crucial for accelerating collaborative filtering algorithms. The accuracy of collaborative filtering recommender systems is affected by the number of users whose information participates in the recommendation: the greater the number of users, the better the recommendations tend to be.
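To make concrete how collaborative filtering predicts unknown ratings from known ones, the following is a minimal sequential Python sketch of the weighted Slope One scheme. It assumes ratings are held as a dictionary of per-user dictionaries containing only the known ratings; the function names `slope_one_deviations` and `predict` and the toy data are hypothetical, and this section does not state whether the GPU implementation uses the weighted or the unweighted variant. For every pair of items (j, i) rated by the same user, the sketch accumulates the average rating difference dev[j][i] and the number of co-rating users card[j][i]; the prediction for item j is then the card-weighted average of dev[j][i] + r_i over the items i that the user has rated.

```python
def slope_one_deviations(ratings):
    """Return (dev, card) where dev[j][i] = mean(r_j - r_i) over users who
    rated both j and i, and card[j][i] = number of such users.

    ratings: {user: {item: rating}} holding only the known ratings.
    """
    dev, card = {}, {}
    for user_ratings in ratings.values():
        # Every ordered pair of distinct items rated by this user.
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i == j:
                    continue
                dev.setdefault(j, {})
                card.setdefault(j, {})
                dev[j][i] = dev[j].get(i, 0.0) + (r_j - r_i)
                card[j][i] = card[j].get(i, 0) + 1
    # Turn the accumulated sums into averages (no keys are added here).
    for j in dev:
        for i in dev[j]:
            dev[j][i] /= card[j][i]
    return dev, card


def predict(user_ratings, item, dev, card):
    """Weighted Slope One prediction of `item` for one user:
    sum_i (dev[item][i] + r_i) * card[item][i] / sum_i card[item][i].
    Returns None if no rated item shares a co-rating user with `item`."""
    num, den = 0.0, 0
    for i, r_i in user_ratings.items():
        c = card.get(item, {}).get(i, 0)
        if i == item or c == 0:
            continue
        num += (dev[item][i] + r_i) * c
        den += c
    return num / den if den else None


if __name__ == "__main__":
    # Hypothetical toy data: carol has not rated item "A".
    ratings = {
        "alice": {"A": 5, "B": 3, "C": 2},
        "bob": {"A": 3, "B": 4},
        "carol": {"B": 2, "C": 5},
    }
    dev, card = slope_one_deviations(ratings)
    print(predict(ratings["carol"], "A", dev, card))  # prints 4.333333333333333
```

In the toy data, A and B were rated together by two users with average difference 0.5, and A and C by one user with difference 3, so carol's prediction for A is ((0.5 + 2)·2 + (3 + 5)·1)/3 = 13/3. The deviation step visits every pair of items each user has rated, which is the part of the computation that grows fastest as more ratings are added.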
For this reason, it is important to use a compressed format to represent the data, since it allows significantly larger amounts of data to be processed. The data used in collaborative filtering are highly sparse, so a compressed format not only allows more data to be used but also improves the performance of the implementation, since the GPU can then access memory in a coalesced manner.

ICT Innovations 2015 Web Proceedings ISSN 1857-7288 173
S. Loshkovska, S. Koceski (Editors): ICT Innovations 2015, Web Proceedings, ISSN 1857-7288 © ICT ACT http://ictinnovations.org/2015, 2015
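The CSR format stores only the known (nonzero) ratings in three arrays: `values` (the ratings, user by user), `col_indices` (the item index of each stored rating) and `row_ptr` (where each user's ratings start and end). The Python sketch below, with hypothetical function names, assumes a dense user × item matrix in which 0 marks a missing rating. It builds the three arrays, reads one user's ratings back, measures sparsity, and compares storage counts, which shows why the benefit of CSR depends on how sparse the data are.

```python
def dense_to_csr(matrix):
    """Convert a dense user x item rating matrix (0 = missing) to CSR arrays."""
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for col, rating in enumerate(row):
            if rating != 0:              # keep only known ratings
                values.append(rating)
                col_indices.append(col)
        row_ptr.append(len(values))      # end of this user's ratings
    return values, col_indices, row_ptr


def csr_row(values, col_indices, row_ptr, user):
    """Return the (item, rating) pairs of one user from the CSR arrays."""
    start, end = row_ptr[user], row_ptr[user + 1]
    return list(zip(col_indices[start:end], values[start:end]))


def sparsity(matrix):
    """Fraction of missing (zero) entries in a dense matrix."""
    total = sum(len(row) for row in matrix)
    nnz = sum(1 for row in matrix for r in row if r != 0)
    return 1.0 - nnz / total


def csr_storage(n_rows, n_cols, nnz):
    """Count of stored numbers as (dense, csr):
    dense keeps every cell; CSR keeps values + col_indices + row_ptr."""
    return n_rows * n_cols, 2 * nnz + n_rows + 1


if __name__ == "__main__":
    # Hypothetical 3 users x 4 items rating matrix.
    R = [
        [5, 3, 0, 0],
        [0, 0, 4, 0],
        [1, 0, 0, 2],
    ]
    csr = dense_to_csr(R)
    print(csr)               # ([5, 3, 4, 1, 2], [0, 1, 2, 0, 3], [0, 2, 3, 5])
    print(csr_row(*csr, 2))  # [(0, 1), (3, 2)]
```

For this tiny matrix CSR actually stores more numbers than the dense layout (14 vs. 12), but for a hypothetical 1,000 × 1,000 matrix with 1% of entries known, `csr_storage(1000, 1000, 10000)` gives 21,001 stored numbers versus 1,000,000. On a GPU, threads that read consecutive positions of `values` and `col_indices` can have their accesses coalesced, although whether they are depends on how threads are mapped to rows or to nonzeros.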