Variational Autoencoders for Top-K Recommendation with Implicit Feedback

Bahare Askari, Ontario Tech University, Canada, bahare.askarifroozjayi@ontariotechu.ca
Jaroslaw Szlichta, Ontario Tech University, Canada, jarek@ontariotechu.ca
Amirali Salehi-Abari, Ontario Tech University, Canada, abari@ontariotechu.ca

ABSTRACT
Variational Autoencoders (VAEs) have been shown to be effective for recommender systems with implicit feedback (e.g., browsing history, purchasing patterns, etc.). However, little attention has been given to ensembles of VAEs that can learn user and item representations jointly. We introduce the Joint Variational Autoencoder (JoVA), an ensemble of two VAEs that jointly learns both user and item representations to predict user preferences. This design allows JoVA to capture user-user and item-item correlations simultaneously. We also introduce JoVA-Hinge, an extension of JoVA with a hinge-based pairwise loss function, to further specialize it for recommendation with implicit feedback. Our extensive experiments on four real-world datasets demonstrate that JoVA-Hinge outperforms a broad set of state-of-the-art methods under a variety of commonly used metrics. Our empirical results also illustrate the effectiveness of JoVA-Hinge for handling users with limited training data.

CCS CONCEPTS
· Information systems → Collaborative filtering; Learning to rank.

KEYWORDS
Recommender Systems, Deep Learning, Variational Autoencoders

ACM Reference Format:
Bahare Askari, Jaroslaw Szlichta, and Amirali Salehi-Abari. 2021. Variational Autoencoders for Top-K Recommendation with Implicit Feedback. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), July 11–15, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3404835.3462986

1 INTRODUCTION
The information overload and abundance of choices on the Web have made recommender systems indispensable in facilitating user decision-making.
Recommender systems provide a personalized user experience by filtering relevant items (e.g., books, music, or movies) or information (e.g., news). Many efforts have been devoted to developing effective recommender systems [1, 19].

SIGIR ’21, July 11–15, 2021, Virtual Event, Canada. © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8037-9/21/07. https://doi.org/10.1145/3404835.3462986

Collaborative filtering (CF), a well-recognized approach in recommender systems, is based on the idea that users with similar revealed preferences are likely to have similar preferences in the future [19]. User preferences in CF techniques take the form of either explicit feedback (e.g., ratings, reviews, etc.) or implicit feedback (e.g., browsing history, purchasing history, search patterns, etc.). While explicit feedback is more informative than its implicit alternative, it imposes a greater cognitive burden on users through elicitation, is subject to noisy self-reporting [2], and suffers from interpersonal comparison or calibration issues [3]. In contrast, implicit feedback naturally originates from user behavior, where an interaction with an item is a signal of interest in that item. Implicit feedback makes collaborative filtering more appealing, but at the cost of some practical challenges.
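To make implicit feedback concrete, the sketch below turns a log of (user, item) interaction events into a binary user-item matrix, where 1 marks an observed interaction and 0 marks everything else. This is an illustrative assumption, not the paper's preprocessing: the function names `build_implicit_matrix` and `density` and the toy event log are hypothetical. Note that every 0 conflates items the user never encountered with items the user genuinely dislikes, and that even this tiny example has only one third of its entries observed.

```python
from typing import Iterable, List, Tuple


def build_implicit_matrix(
    events: Iterable[Tuple[int, int]], num_users: int, num_items: int
) -> List[List[int]]:
    """Binary user-item matrix: 1 if the user interacted with the item at least once."""
    matrix = [[0] * num_items for _ in range(num_users)]
    for user, item in events:
        # Repeated interactions (e.g., two clicks) collapse into one positive signal.
        matrix[user][item] = 1
    return matrix


def density(matrix: List[List[int]]) -> float:
    """Fraction of matrix entries that are observed interactions."""
    total = sum(len(row) for row in matrix)
    observed = sum(sum(row) for row in matrix)
    return observed / total


# Hypothetical event log of (user_id, item_id) pairs, e.g., clicks or purchases.
events = [(0, 1), (0, 3), (1, 0), (2, 3), (2, 3)]
R = build_implicit_matrix(events, num_users=3, num_items=4)
print(R)           # [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1]]
print(density(R))  # 0.3333333333333333
```

Real interaction datasets are far sparser than this toy matrix, so in practice a sparse matrix format would replace the dense nested lists used here.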
Implicit feedback lacks negative examples, as the absence of a user-item interaction does not necessarily indicate user disinterest (e.g., the user may be unaware of the item). Also, user-item interaction data for implicit feedback is large, yet sparse. It is even sparser than explicit feedback data, since the unobserved user-item interactions are a mixture of both missing values and real negative feedback.

Many attempts have been made to address these challenges with deep learning [24]. Multilayer perceptron networks were arguably the first class of neural networks successfully applied to collaborative filtering [6, 9]. Recent interest lies in deploying variants of autoencoders, such as classical [25], denoising [21], and variational [14, 15] autoencoders. However, these solutions either do not capture the uncertainty of the latent representations [21, 25], or focus solely on latent representations of users [14, 15].

We present the joint variational autoencoder (JoVA) model, an ensemble of two variational autoencoders (VAEs) that jointly learns both user and item representations under uncertainty and then collectively predicts user preferences. This design enables JoVA to encapsulate user-user and item-item correlations simultaneously. We also introduce JoVA-Hinge, a variant of JoVA that extends JoVA's objective function with a pairwise ranking loss to further specialize it for top-k recommendation with implicit feedback. Through extensive experiments on four real-world datasets, we show the accuracy improvements of our proposed solutions over a variety of state-of-the-art methods. JoVA-Hinge significantly outperforms other methods on the sparse datasets (up to 34% accuracy improvement). Our extensive ablation study on JoVA-Hinge confirms that its success originates from all of its integral components (i.e., the ensemble of VAEs and the hinge loss).
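The two ideas behind JoVA-Hinge, combining a user-based and an item-based model and adding a pairwise hinge term, can be illustrated without the VAEs themselves. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the user-side model outputs a users × items score matrix, the item-side model outputs an items × users score matrix, and the two are merged by a simple average after transposing the item-side output. The hinge term penalizes any (user, positive item, negative item) triple whose score gap falls below a margin. The names `combine_predictions` and `pairwise_hinge_loss`, the margin value, and the toy scores are hypothetical, and how the hinge term is weighted against the VAE objectives is not specified in this section.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns (items x users -> users x items)."""
    return [list(col) for col in zip(*m)]


def combine_predictions(user_view: Matrix, item_view: Matrix) -> Matrix:
    """Merge a users x items prediction with an items x users prediction.

    Assumption (hypothetical): the ensemble output is the element-wise mean
    of the two views once the item-side view is transposed.
    """
    item_view_t = transpose(item_view)
    return [
        [(u + i) / 2 for u, i in zip(row_u, row_i)]
        for row_u, row_i in zip(user_view, item_view_t)
    ]


def pairwise_hinge_loss(
    scores: Matrix,
    triples: Sequence[Tuple[int, int, int]],
    margin: float = 0.5,  # hypothetical margin value
) -> float:
    """Mean of max(0, margin - (s[u][pos] - s[u][neg])) over the triples."""
    losses = [
        max(0.0, margin - (scores[u][pos] - scores[u][neg]))
        for u, pos, neg in triples
    ]
    return sum(losses) / len(losses)


# Toy outputs for 2 users and 3 items.
user_view = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.25]]     # users x items
item_view = [[0.5, 0.0], [0.25, 0.75], [0.0, 0.5]]  # items x users
scores = combine_predictions(user_view, item_view)
print(scores)  # [[0.75, 0.125, 0.25], [0.0, 0.875, 0.375]]

# (user, positive item, negative item) triples; in practice the negative
# would be sampled from the user's unobserved items (assumption).
triples = [(0, 0, 1), (1, 2, 1)]
print(pairwise_hinge_loss(scores, triples))  # 0.5
```

In this toy example, user 0 already scores the positive item 0 well above item 1, so that triple contributes no loss, whereas user 1 scores its positive item 2 below item 1 and is penalized. Simple averaging is only one way to form the ensemble; a weighted or learned combination would fit the same interface.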
2 RECOMMENDATION AND IMPLICIT DATA
We assume that a set of  users  can interact with the set of  items  (e.g., users click ads, purchase products, watch movies, or