Data Poisoning Attacks against Differentially Private
Recommender Systems
Soumya Wadhwa
soumya.wadhwa@walmartlabs.com
Walmart Labs
Saurabh Agrawal
sagrawal@walmartlabs.com
Walmart Labs
Harsh Chaudhari∗
chaudharim@iisc.ac.in
Indian Institute of Science
Deepthi Sharma
deepthi.sharma@walmartlabs.com
Walmart Labs
Kannan Achan
kachan@walmartlabs.com
Walmart Labs
ABSTRACT
Recommender systems based on collaborative filtering are highly
vulnerable to data poisoning attacks, where a determined attacker
injects fake users with false user-item feedback, with an objective
to either corrupt the recommender system or promote/demote a
target set of items. Recently, differential privacy was explored as
a defense technique against data poisoning attacks in the typical
machine learning setting. In this paper, we study the effectiveness
of differential privacy against such attacks on matrix factorization
based collaborative filtering systems. Concretely, we conduct extensive
experiments for evaluating robustness to injection of malicious
user profiles by simulating common types of shilling attacks on
real-world data and comparing the predictions of typical matrix
factorization with differentially private matrix factorization.
KEYWORDS
Data Poisoning, Shilling Attacks, Differential Privacy, Matrix Factorization,
Collaborative Filtering, Recommender Systems
ACM Reference Format:
Soumya Wadhwa, Saurabh Agrawal, Harsh Chaudhari, Deepthi Sharma,
and Kannan Achan. 2020. Data Poisoning Attacks against Differentially
Private Recommender Systems. In Proceedings of the 43rd International ACM
SIGIR Conference on Research and Development in Information Retrieval
(SIGIR '20), July 25–30, 2020, Virtual Event, China. ACM, New York, NY, USA,
4 pages. https://doi.org/10.1145/3397271.3401301
1 INTRODUCTION
Collaborative Filtering (CF) recommender systems have been shown
to be prone to data poisoning in which fake users along with their
feedback are injected into the system [4]. The attacker can construct
the preferences of these fake users so as to fool the recommender
system into behaving in a way desired by the attacker. The attacker
may have an objective to promote a certain set of items, or may try
to compromise the overall quality of the recommendations. While
∗Work was done as an intern at Walmart Labs, Bangalore.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
SIGIR '20, July 25–30, 2020, Virtual Event, China
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-8016-4/20/07. . . $15.00
https://doi.org/10.1145/3397271.3401301
such attacks are possible on all kinds of collaborative filtering systems,
we focus on matrix factorization based CF in this paper.
[8] studied defense against data poisoning attacks, focusing on
classification algorithms. However, their technique of outlier removal
doesn't naturally lend itself to the matrix factorization setting.
Recently, [6] proposed differential privacy as a defense technique
against data poisoning attacks on machine learning systems, while
formally defining the attacker's cost and proving bounds on the
minimization of this cost against the proposed defense mechanism.
We extend this work to our CF setting. Concretely,
• We define attacker utility for a given data poisoning attack
objective, and derive a finite upper bound on this utility for
a differentially private matrix factorization algorithm.
• We simulate different types of shilling attacks (for promoting
specific movies) on a real-world dataset (MovieLens) to
compare the difference in impact of data poisoning between
typical matrix factorization and differentially private matrix
factorization (DPMF). We observe empirically that DPMF
is more robust to such attacks and leads to lower values of
attack utility up to a reasonable level of injection.
2 BACKGROUND
2.1 Differential Privacy
Given data space $\mathcal{Z}$, let $\mathcal{M}$ be a randomized learner and
$\mathcal{D} = \bigcup_{n=0}^{\infty} \mathcal{Z}^{n}$
be the space of all training data, with $D \in \mathcal{D}$ being a particular data
set. We define Differential Privacy [1] as follows:

Definition 2.1 (Differential Privacy). We call a randomized learner
$\mathcal{M}$ $(\epsilon, \delta)$-differentially private if $\forall D, D' \in \mathcal{D}$ that differ by one item
and for all measurable sets $S \subset \mathrm{Range}(\mathcal{M})$,
$$P(\mathcal{M}(D) \in S) \le e^{\epsilon}\, P(\mathcal{M}(D') \in S) + \delta.$$
If $\delta = 0$, we call $\mathcal{M}$ $\epsilon$-differentially private. Informally, the above
definition states that if any one point in the database is modified,
the output of the randomized learner will not change by much. In
the above inequality, $\epsilon$ is positive, and the smaller the value of $\epsilon$, the
stronger the privacy guarantee.
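As a concrete illustration of Definition 2.1 (not the paper's own mechanism), the classical Laplace mechanism achieves $\epsilon$-differential privacy (the $\delta = 0$ case) for a numeric query by adding noise scaled to the query's sensitivity, i.e., the maximum change in the query's output when one data point changes. A minimal sketch in Python; the function name and the ratings example are illustrative, not from the paper:

```python
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon, rng=None):
    """Release query_value with epsilon-differential privacy.

    Adding Laplace(0, sensitivity / epsilon) noise satisfies
    epsilon-DP: changing one data point shifts the query by at most
    `sensitivity`, which the noise distribution masks up to a factor
    of e^epsilon, as in Definition 2.1 with delta = 0.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return query_value + rng.laplace(loc=0.0, scale=scale)

# Example: privately release the mean of a small set of ratings.
# Ratings lie in [1, 5], so replacing one of n ratings changes the
# mean by at most (5 - 1) / n -- the query's sensitivity.
ratings = np.array([4.0, 5.0, 3.0, 4.0, 2.0])
sensitivity = (5.0 - 1.0) / len(ratings)
private_mean = laplace_mechanism(ratings.mean(), sensitivity, epsilon=1.0)
```

Note the trade-off the definition describes: a smaller $\epsilon$ yields a larger noise scale and hence a stronger privacy guarantee at the cost of accuracy.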
2.2 Collaborative Filtering
We assume the standard setting of collaborative filtering, where
users rate a subset of items. We denote the full rating matrix by
$R = [R_{ij}]_{m \times n}$ and the set of observed entries by
$\Omega \subset [m] \times [n]$, the
entries in $R$ where user $i$ has rated item $j$ ("seen" or "observed" ratings). Our goal is to predict