Data Poisoning Attacks against Differentially Private Recommender Systems

Soumya Wadhwa (soumya.wadhwa@walmartlabs.com), Walmart Labs
Saurabh Agrawal (sagrawal@walmartlabs.com), Walmart Labs
Harsh Chaudhari (chaudharim@iisc.ac.in), Indian Institute of Science
Deepthi Sharma (deepthi.sharma@walmartlabs.com), Walmart Labs
Kannan Achan (kachan@walmartlabs.com), Walmart Labs

ABSTRACT
Recommender systems based on collaborative filtering are highly vulnerable to data poisoning attacks, in which a determined attacker injects fake users with false user-item feedback in order to corrupt the recommender system or to promote/demote a target set of items. Recently, differential privacy was explored as a defense technique against data poisoning attacks in the typical machine learning setting. In this paper, we study the effectiveness of differential privacy against such attacks on matrix factorization based collaborative filtering systems. Concretely, we conduct extensive experiments evaluating robustness to the injection of malicious user profiles by simulating common types of shilling attacks on real-world data and comparing the predictions of typical matrix factorization with those of differentially private matrix factorization.

KEYWORDS
Data Poisoning, Shilling Attacks, Differential Privacy, Matrix Factorization, Collaborative Filtering, Recommender Systems

ACM Reference Format:
Soumya Wadhwa, Saurabh Agrawal, Harsh Chaudhari, Deepthi Sharma, and Kannan Achan. 2020. Data Poisoning Attacks against Differentially Private Recommender Systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), July 25–30, 2020, Virtual Event, China. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3397271.3401301

1 INTRODUCTION
Collaborative Filtering (CF) recommender systems have been shown to be prone to data poisoning, in which fake users along with their feedback are injected into the system [4].
The attacker can construct the preferences of these fake users so as to fool the recommender system into behaving in a way the attacker desires: the attacker may aim to promote a certain set of items, or may try to compromise the overall quality of the recommendations. While such attacks are possible on all kinds of collaborative filtering systems, we focus on matrix factorization based CF in this paper.

(Work was done as an intern at Walmart Labs, Bangalore.)
SIGIR '20, July 25–30, 2020, Virtual Event, China. © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-8016-4/20/07. https://doi.org/10.1145/3397271.3401301

[8] studied defenses against data poisoning attacks, focusing on classification algorithms. However, their technique of outlier removal does not naturally lend itself to the matrix factorization setting. Recently, [6] proposed differential privacy as a defense technique against data poisoning attacks on machine learning systems, formally defining the attacker's cost and proving bounds on the minimization of this cost against the proposed defense mechanism. We extend this work to our CF setting. Concretely:

• We define attacker utility for a given data poisoning attack objective, and derive a finite upper bound on this utility for a differentially private matrix factorization algorithm.
• We simulate different types of shilling attacks (for promoting specific movies) on a real-world dataset (MovieLens) to compare the impact of data poisoning on typical matrix factorization and on differentially private matrix factorization (DPMF). We observe empirically that DPMF is more robust to such attacks and leads to lower values of attack utility up to a reasonable level of injection.

2 BACKGROUND

2.1 Differential Privacy
Given data space Z, let M be a randomized learner and D = ∪_{n=0}^∞ Z^n be the space of all training data, with D ∈ D being a particular data set. We define Differential Privacy [1] as follows:

Definition 2.1 (Differential Privacy). We call a randomized learner M (ε, δ)-differentially private if, for all D, D′ ∈ D that differ by one item and for all measurable sets S ⊆ Range(M),

    P(M(D) ∈ S) ≤ e^ε · P(M(D′) ∈ S) + δ.

If δ = 0, we call M ε-differentially private.

Informally, the above definition states that if any one point in the database is modified, the output of the randomized learner will not change by much. In the inequality above, ε is positive, and the smaller the value of ε, the stronger the privacy guarantee.

2.2 Collaborative Filtering
We assume the standard setting of collaborative filtering, where users rate a subset of items. We denote the full rating matrix by R = [r_{ui}]_{m×n} and by R_obs ⊆ [m] × [n] the set of entries (u, i) in R where user u has rated item i ("seen" or "observed" ratings). Our goal is to predict the unobserved ratings.
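The shilling attacks simulated in the experiments inject fake user profiles into the rating matrix. This excerpt does not spell out the exact profile construction used, so the following is a minimal sketch of one standard variant, the "average" attack, in which each fake user gives the target item the maximum rating and gives randomly chosen filler items ratings near their observed means so the profiles blend in with genuine users. The function name and all parameters here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def average_attack_profiles(R, n_fake, target_item, max_rating=5.0, n_filler=10):
    """Build fake user rows for an 'average' shilling attack.

    R is a dense user-by-item matrix with np.nan marking unobserved
    entries. Each fake profile rates the target item at max_rating
    and rates n_filler randomly chosen other items near their
    per-item observed means."""
    item_means = np.nanmean(R, axis=0)      # per-item mean of observed ratings
    n_items = R.shape[1]
    candidates = [i for i in range(n_items) if i != target_item]
    fakes = np.full((n_fake, n_items), np.nan)
    for f in range(n_fake):
        fillers = rng.choice(candidates, size=n_filler, replace=False)
        noisy = item_means[fillers] + rng.normal(0.0, 0.5, size=n_filler)
        fakes[f, fillers] = np.clip(noisy, 1.0, max_rating)
        fakes[f, target_item] = max_rating  # push the target item
    return fakes

# Toy usage: 3 genuine users, 6 items, promote item 2 with 4 fake profiles.
R = np.array([[5.0, 3.0, np.nan, 1.0, 4.0, 2.0],
              [4.0, np.nan, 2.0, 1.0, 5.0, 3.0],
              [np.nan, 4.0, 3.0, 2.0, 4.0, 5.0]])
poisoned = np.vstack([R, average_attack_profiles(R, n_fake=4, target_item=2, n_filler=3)])
```

The poisoned matrix is then fed to the recommender exactly as if the fake rows were genuine users, which is what makes such attacks hard to filter out by inspection.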
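The background above sets up plain matrix factorization over the observed ratings; the paper's DPMF algorithm itself is introduced later. As a rough illustration of how differential privacy can be injected into MF training, here is a hedged sketch using per-example gradient clipping plus Gaussian noise (the gradient-perturbation recipe popularized by DP-SGD). The function name, hyperparameters, and noise scale are illustrative assumptions, and the noise here is not calibrated to any specific (ε, δ) budget — a real guarantee would require accounting for the clipping norm, noise multiplier, and number of passes over the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def mf_sgd(ratings, n_users, n_items, k=4, epochs=50, lr=0.05,
           reg=0.05, clip=1.0, noise_scale=0.0):
    """SGD matrix factorization on (user, item, rating) triples.

    With noise_scale > 0, each per-example gradient is norm-clipped
    and perturbed with Gaussian noise -- a simplified stand-in for a
    differentially private variant; noise_scale is illustrative and
    not calibrated to a privacy budget."""
    U = 0.1 * rng.standard_normal((n_users, k))   # user factors
    V = 0.1 * rng.standard_normal((n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]                 # residual for this rating
            gu = -err * V[i] + reg * U[u]
            gv = -err * U[u] + reg * V[i]
            if noise_scale > 0:
                # Clip each gradient to bound sensitivity, then add noise.
                gu *= min(1.0, clip / (np.linalg.norm(gu) + 1e-12))
                gv *= min(1.0, clip / (np.linalg.norm(gv) + 1e-12))
                gu += noise_scale * rng.standard_normal(k)
                gv += noise_scale * rng.standard_normal(k)
            U[u] -= lr * gu
            V[i] -= lr * gv
    return U, V
```

Intuitively, the noise limits how much any single (possibly fake) user's ratings can move the learned factors, which is the property the paper leverages to bound attacker utility.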