Article
On Exploiting Rating Prediction Accuracy Features in Dense Collaborative Filtering Datasets

Dimitris Spiliotopoulos 1, Dionisis Margaris 2,* and Costas Vassilakis 3

1 Department of Management Science and Technology, University of the Peloponnese, Akadimaikou G. K. Vlachou, 22131 Tripoli, Greece
2 Department of Digital Systems, University of the Peloponnese, Valioti’s Building, Kladas, 23100 Sparta, Greece
3 Department of Informatics and Telecommunications, University of the Peloponnese, Akadimaikou G. K. Vlachou, 22131 Tripoli, Greece
* Correspondence: margaris@uop.gr

Citation: Spiliotopoulos, D.; Margaris, D.; Vassilakis, C. On Exploiting Rating Prediction Accuracy Features in Dense Collaborative Filtering Datasets. Information 2022, 13, 428. https://doi.org/10.3390/info13090428
Academic Editor: Ida Mele
Received: 1 August 2022
Accepted: 8 September 2022
Published: 11 September 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: One of the typical goals of collaborative filtering algorithms is to produce rating predictions whose values are very close to the ratings that real users would give to an item. The recommender system then recommends to each user the items with the largest rating prediction values. Collaborative filtering algorithms can be applied to both sparse and dense datasets, and each of these dataset categories involves different kinds of risks. In dense collaborative filtering datasets, where rating prediction coverage is very high most of the time, the usual problems are long rating prediction times, issues concerning the selection of a user’s near neighbours, etc. Although collaborative filtering algorithms usually achieve better results when applied to dense datasets, there is still room for improvement, since in many cases the rating prediction error is relatively high, which leads to unsuccessful recommendations and hence to recommender system unreliability. In this work, we explore rating prediction accuracy features, in a broader context, in dense collaborative filtering datasets. We conduct an extensive evaluation, using dense datasets that are widely used in collaborative filtering research, to find the associations between these features and rating prediction accuracy.

Keywords: collaborative filtering; recommender systems; personalisation; dataset density; rating prediction accuracy; accuracy features; evaluation

1. Introduction

One of the most widely applied recommender system (RS) methods over the last 20 years is collaborative filtering (CF) [1,2]. The typical goal of a CF algorithm is to produce rating predictions for products or services that users have not already evaluated. The closer these rating predictions are to the rating values that the users themselves would give to these products or services, the more accurate the CF algorithm is. Based on these rating predictions, a CF RS will then typically recommend to each user the products or services with the highest rating prediction values. Among all products or services, these carry the highest probability that the user will actually like them and hence accept the recommendation (by clicking the product advertisement, buying the product or service, etc.) [3,4].
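The two ideas in the paragraph above, prediction closeness and recommending the highest-scoring unrated items, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `recommend_top_n` and `mean_absolute_error`, the item identifiers, and the choice of the mean absolute error as the closeness measure are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def recommend_top_n(
    predictions: Dict[str, float],
    already_rated: Set[str],
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the n items the user has not rated with the highest predicted rating.

    `predictions` maps a (hypothetical) item id to its predicted rating.
    """
    candidates = [
        (item, score)
        for item, score in predictions.items()
        if item not in already_rated
    ]
    # Highest predicted rating first; ties are broken by item id for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:n]


def mean_absolute_error(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Average absolute gap between predicted and real ratings on shared items.

    Assumption: MAE is used here only as one common closeness measure.
    """
    common = predicted.keys() & actual.keys()
    if not common:
        raise ValueError("no items in common between predictions and real ratings")
    return sum(abs(predicted[i] - actual[i]) for i in common) / len(common)


if __name__ == "__main__":
    # Hypothetical predicted ratings for one user on a 1-5 scale.
    predictions = {"i1": 4.6, "i2": 3.1, "i3": 4.9, "i4": 2.0}
    print(recommend_top_n(predictions, already_rated={"i4"}, n=2))
    # prints [('i3', 4.9), ('i1', 4.6)]
```

A lower error from `mean_absolute_error` corresponds to predictions that lie closer to the ratings the users would give themselves, which is the sense of accuracy used in the text.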
The first step of a typical CF system is to locate the ‘near neighbours’ (NNs) of each of its users. An NN of user u is another user v whose tastes are similar to those of u. NNs can be found by taking the stored real ratings of users u and v, set_of_ratings_u and set_of_ratings_v, selecting the ratings given to common products or services i (i.e., the intersection of the two sets of rated items), and comparing them. If the majority of these ratings are (to a large extent) similar, then the two users are NNs of each other [5,6]. Typically, in modern CF RSs, this task is implemented using a user vicinity metric, such as the Pearson correlation coefficient