Citation: Spiliotopoulos, D.; Margaris, D.; Vassilakis, C. On Exploiting Rating Prediction Accuracy Features in Dense Collaborative Filtering Datasets. Information 2022, 13, 428. https://doi.org/10.3390/info13090428
Academic Editor: Ida Mele
Received: 1 August 2022
Accepted: 8 September 2022
Published: 11 September 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
information
Article
On Exploiting Rating Prediction Accuracy Features in Dense
Collaborative Filtering Datasets
Dimitris Spiliotopoulos 1, Dionisis Margaris 2,* and Costas Vassilakis 3
1 Department of Management Science and Technology, University of the Peloponnese, Akadimaikou G. K. Vlachou, 22131 Tripoli, Greece
2 Department of Digital Systems, University of the Peloponnese, Valioti’s Building, Kladas, 23100 Sparta, Greece
3 Department of Informatics and Telecommunications, University of the Peloponnese, Akadimaikou G. K. Vlachou, 22131 Tripoli, Greece
* Correspondence: margaris@uop.gr
Abstract: One of the typical goals of collaborative filtering algorithms is to produce rating predictions whose values are very close to the ratings that real users would give to an item. The recommender system then recommends to each user the items with the largest rating prediction values. Collaborative filtering algorithms can be applied to both sparse and dense datasets, and each of these dataset categories involves different kinds of risks. In dense collaborative filtering datasets, where rating prediction coverage is usually very high, the typical problems include long rating prediction times and difficulties in selecting a user’s near neighbours. Although collaborative filtering algorithms usually achieve better results when applied to dense datasets, there is still room for improvement: in many cases the rating prediction error is relatively high, which leads to unsuccessful recommendations and hence to an unreliable recommender system. In this work, we explore rating prediction accuracy features in dense collaborative filtering datasets, albeit in a broader context. We conduct an extensive evaluation on dense datasets that are widely used in collaborative filtering research, in order to find the associations between these features and rating prediction accuracy.
Keywords: collaborative filtering; recommender systems; personalisation; dataset density; rating
prediction accuracy; accuracy features; evaluation
1. Introduction
One of the most widely applied recommender system (RS) methods over the last 20 years is collaborative filtering (CF) [1,2]. The typical goal of a CF algorithm is to produce rating predictions for products or services that users have not yet evaluated. The closer these rating predictions are to the ratings that the users themselves would give to these products or services, the more accurate the CF algorithm is.
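As a minimal sketch of what "closeness" between predictions and real ratings can mean, the snippet below computes the mean absolute error (MAE) over a handful of ratings. MAE is one common choice and is an assumption at this point in the text; the function name and the sample ratings are hypothetical and serve only as illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and real ratings (lower is better)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical ratings on a 1-5 scale, for illustration only.
predicted = [4.0, 3.5, 2.0, 5.0]
actual = [4.0, 3.0, 1.0, 4.5]
mae = mean_absolute_error(predicted, actual)
print(f"MAE: {mae:.2f}")
```

A lower value means the predictions sit closer, on average, to what the users actually rated.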
Afterwards, based on these rating predictions, a CF RS will typically recommend to each user the products or services with the highest rating prediction values. Among all products or services, these are the ones that the user is most likely to actually like, and hence the ones whose recommendation the user is most likely to accept (by clicking the product advertisement, buying the product or service, etc.) [3,4].
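The recommendation step described above can be sketched as a simple top-N selection over one user's predicted ratings. This is an illustrative sketch only: the function name, the item identifiers, the predicted values, and the choice of N = 2 are all hypothetical assumptions.

```python
import heapq


def recommend_top_n(predictions, n=2):
    """Return the n item ids with the highest predicted rating for one user.

    `predictions` maps an item id to the predicted rating of an item
    the user has not rated yet.
    """
    return heapq.nlargest(n, predictions, key=predictions.get)


# Hypothetical predicted ratings for a single user, for illustration only.
predictions = {"item_a": 3.2, "item_b": 4.7, "item_c": 4.1, "item_d": 2.5}
top_items = recommend_top_n(predictions, n=2)
print(top_items)
```

With these sample values, the two items with the highest predictions, `item_b` and `item_c`, are selected in that order.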
The first step of a typical CF system is to locate the ‘near neighbours’ (NNs) of each of its users. An NN of user u is another user v who has similar tastes to u. NNs can be found by taking the stored real ratings of users u and v, set_of_ratings_u and set_of_ratings_v, finding the ones given to common products or services i (i.e., the intersection of the two sets), and comparing them. If the majority of them are (to a large extent) similar, then these users are NNs of each other [5,6]. Typically, in modern CF RSs, the aforementioned task
is implemented using a user vicinity metric, such as the Pearson correlation coefficient