On the effect of data sparsity to the performance
of a Collaborative Filtering algorithm on a GPU
Efthalia Karydi¹,², Konstantinos G. Margaritis¹, and Eero Vainikko²
¹ University of Macedonia, Department of Applied Informatics
Parallel and Distributed Processing Laboratory.
156 Egnatia str., P.O. Box 1591, 54006 Thessaloniki, Greece
Karydithalia@gmail.com
kmarg@uom.gr
² University of Tartu, Institute of Computer Science
Faculty of Mathematics and Computer Science.
Liivi 2, Tartu 50409, Estonia
eero.vainikko@ut.ee
Abstract. One of the problems encountered in recommender system
applications is the high sparsity of the available data. In this paper we
investigate the effect of dataset sparsity on the performance of a parallel
implementation of the Slope One Collaborative Filtering algorithm. The
sparse data are represented in the Compressed Sparse Row (CSR) format, and
the implementation's performance is evaluated on a Graphics Processing Unit
using MovieLens and artificially created datasets.
Keywords: Collaborative Filtering, Slope One, CSR Format, Massively
Parallel Computing, GPU, CUDA
1 Introduction
Collaborative Filtering is a very popular family of algorithms used for
recommendations. It uses the known information that users provide to the
system in order to predict the unknown information of other users. The
number of users who rely on recommender systems is continuously growing, as
is their demand for fast and accurate results. Thus, processing as much data
as possible, as quickly as possible, has become an urgent need. Parallel
computing technologies are crucial for accelerating collaborative filtering
algorithms.
The accuracy of collaborative filtering recommender systems is affected by
the number of users whose information contributes to the recommendation. The
greater the number of users, the better the recommendations. For this
reason, it is important to use a compressed format to represent the data,
since it allows significantly larger amounts of data to be processed. The
data used in collaborative filtering are highly sparse. Thus, using a
compressed format not only allows more data to be used, but also improves
the performance of the implementation, since the GPU can access the data in
a coalesced manner.
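
To make the compressed representation concrete, the following is a minimal
sketch of how a small user-by-item rating matrix could be stored in CSR form.
It is an illustration only, not the implementation evaluated in this paper:
the helper names to_csr and sparsity, the toy matrix, and the convention that
0 marks an unknown rating are all assumptions made for this example.

```python
# Minimal CSR sketch. Hypothetical helpers for illustration only;
# this is not the GPU implementation evaluated in the paper.

def to_csr(dense, missing=0):
    """Convert a dense user-by-item rating matrix (list of rows) to CSR arrays."""
    values = []    # known ratings, stored row by row
    col_idx = []   # item (column) index of each stored rating
    row_ptr = [0]  # ratings of user u occupy positions row_ptr[u]..row_ptr[u+1]-1
    for row in dense:
        for j, r in enumerate(row):
            if r != missing:
                values.append(r)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def sparsity(shape, nnz):
    """Fraction of user-item pairs that have no rating."""
    n_users, n_items = shape
    return 1.0 - nnz / (n_users * n_items)


# Toy matrix: 3 users x 4 items; 0 marks an unknown rating (assumption).
ratings = [
    [5, 0, 0, 3],
    [0, 4, 0, 0],
    [1, 0, 2, 0],
]
values, col_idx, row_ptr = to_csr(ratings)
```

In this sketch only the five known ratings are stored instead of all twelve
matrix cells, and the ratings of user u occupy the contiguous slice
values[row_ptr[u]:row_ptr[u + 1]]. Contiguous storage of this kind is what
makes coalesced access possible when neighbouring GPU threads read
neighbouring elements.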
ICT Innovations 2015 Web Proceedings ISSN 1857-7288
S. Loshkovska, S. Koceski (Editors): ICT Innovations 2015, Web Proceedings, ISSN 1857-7288
© ICT ACT http://ictinnovations.org/2015, 2015