Variational Autoencoders for Top-K Recommendation with
Implicit Feedback
Bahare Askari
Ontario Tech University
Canada
bahare.askarifroozjayi@ontariotechu.ca
Jaroslaw Szlichta
Ontario Tech University
Canada
jarek@ontariotechu.ca
Amirali Salehi-Abari
Ontario Tech University
Canada
abari@ontariotechu.ca
ABSTRACT
Variational Autoencoders (VAEs) have been shown to be effective for
recommender systems with implicit feedback (e.g., browsing history,
purchasing patterns, etc.). However, little attention has been given to
ensembles of VAEs that can learn user and item representations
jointly. We introduce Joint Variational Autoencoder (JoVA), an
ensemble of two VAEs, which jointly learns both user and item
representations to predict user preferences. This design allows JoVA
to capture user-user and item-item correlations simultaneously. We
also introduce JoVA-Hinge, an extension of JoVA with a hinge-based
pairwise loss function, to further specialize it in recommendation
with implicit feedback. Our extensive experiments on four real-
world datasets demonstrate that JoVA-Hinge outperforms a broad
set of state-of-the-art methods under a variety of commonly-used
metrics. Our empirical results also illustrate the effectiveness of
JoVA-Hinge for handling users with limited training data.
CCS CONCEPTS
· Information systems → Collaborative filtering; Learning
to rank.
KEYWORDS
Recommender Systems, Deep Learning, Variational Autoencoders
ACM Reference Format:
Bahare Askari, Jaroslaw Szlichta, and Amirali Salehi-Abari. 2021. Variational
Autoencoders for Top-K Recommendation with Implicit Feedback. In Pro-
ceedings of the 44th International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR ’21), July 11–15, 2021, Virtual
Event, Canada. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/
3404835.3462986
1 INTRODUCTION
Information overload and the abundance of choices on the Web
have made recommendation systems indispensable in facilitating
user decision-making. Recommender systems provide a personalized
user experience by filtering relevant items (e.g., books, music, or
movies) or information (e.g., news). Many efforts have been devoted
to developing effective recommender systems [1, 19].
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
SIGIR ’21, July 11–15, 2021, Virtual Event, Canada
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8037-9/21/07. . . $15.00
Collaborative filtering (CF), a well-recognized approach in
recommender systems, is based on the idea that users with similar
revealed preferences are likely to have similar preferences in the
future [19]. User preferences in CF techniques are in the form of
either explicit feedback (e.g., ratings, reviews, etc.) or implicit feed-
back (e.g., browsing history, purchasing history, search patterns,
etc.). While explicit feedback is more informative than its implicit
alternative, it imposes more cognitive burden on users through
elicitation, is subject to noisy self-reporting [2], and suffers from
interpersonal comparison or calibration issues [3]. In contrast, im-
plicit feedback naturally originates from user behavior when an
interaction with an item is a signal of interest in the item.
Implicit feedback has made collaborative filtering more
intriguing at the cost of some practical challenges. Implicit
feedback lacks negative examples, as the absence of a user-item
interaction is not necessarily indicative of user disinterest (e.g., the
user is unaware of the item). Also, the user-item interaction data
for implicit feedback is large, yet sparse. It is even more sparse than
explicit feedback data, since the unobserved user-item interactions
are a mixture of both missing values and real negative feedback.
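To make the sparsity and missing-negatives points concrete, the following minimal sketch builds a binary user-item interaction matrix from a toy interaction log. The log and the names `build_matrix`, `sparsity`, and `interactions` are hypothetical illustrations and are not taken from the datasets or code used in this paper.

```python
def build_matrix(interactions, n_users, n_items):
    """Return a binary matrix: 1 = observed interaction, 0 = unobserved."""
    matrix = [[0] * n_items for _ in range(n_users)]
    for user, item in interactions:
        matrix[user][item] = 1
    return matrix


def sparsity(matrix):
    """Fraction of user-item pairs with no observed interaction."""
    total = sum(len(row) for row in matrix)
    observed = sum(sum(row) for row in matrix)
    return 1.0 - observed / total


# Hypothetical log of (user index, item index) pairs, e.g., clicks.
interactions = [(0, 1), (0, 3), (1, 0), (2, 3)]
R = build_matrix(interactions, n_users=3, n_items=5)

# A 0 entry is ambiguous: the user may dislike the item or may simply
# be unaware of it, so zeros cannot be read as true negative feedback.
print(R)
print(round(sparsity(R), 3))
```

In this toy example only 4 of the 15 user-item pairs are observed; the remaining zeros mix unknown preferences with genuine disinterest.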
Many attempts have been made to address these challenges by
deep learning [24]. Multilayer perceptron networks were arguably
the first class of neural networks successfully applied to collaborative
filtering [6, 9]. Recent interest is in deploying variants
of autoencoders, such as classical [25], denoising [21], and vari-
ational [14, 15]. However, these solutions either do not capture
uncertainty of the latent representations [21, 25], or solely focus
on latent representation of users [14, 15].
We present the joint variational autoencoder (JoVA) model, an
ensemble of two variational autoencoders (VAEs) that jointly learns
both user and item representations under uncertainty, and then
collectively predicts user preferences. This design enables JoVA to
encapsulate user-user and item-item correlations simultaneously.
We also introduce JoVA-Hinge, a variant of JoVA that extends
JoVA’s objective function with a pairwise ranking loss to additionally
specialize it for top-k recommendation with implicit feedback.
Through extensive experiments over four real-world datasets, we
show the accuracy improvements of our proposed solutions over a
variety of state-of-the-art methods. JoVA-Hinge significantly
outperforms other methods on the sparse datasets (up to 34% accuracy
improvement). Our extensive ablation study on JoVA-Hinge
confirms that its success originates from all of its integral components
(i.e., the ensemble of VAEs and the hinge loss).
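As a rough illustration of what a hinge-based pairwise ranking loss computes, the sketch below scores how often interacted (positive) items fail to outrank non-interacted (negative) items by a margin. It is a generic textbook form rather than the exact JoVA-Hinge objective, which is not defined in this section. The function name `pairwise_hinge_loss`, the margin of 1.0, and averaging over all pairs are assumptions made for the example.

```python
def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Mean of max(0, margin - (s_pos - s_neg)) over all (pos, neg) pairs.

    pos_scores: predicted scores for items the user interacted with.
    neg_scores: predicted scores for sampled non-interacted items.
    margin: assumed hyperparameter; a pair contributes zero loss once
            the positive item outscores the negative one by the margin.
    """
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - (p - n)) for p, n in pairs) / len(pairs)


# Hypothetical predicted scores for one user.
loss = pairwise_hinge_loss([1.0, 2.0], [0.5, 1.5])
print(loss)
```

The loss depends only on the relative order of scores, which matches the goal of top-k recommendation: the ranking of items matters more than their absolute predicted values.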
2 RECOMMENDATION AND IMPLICIT DATA
We assume that a set of users can interact with the set of
items (e.g., users click ads, purchase products, watch movies, or