Statistical Attack Detection

Neil Hurley
School of Computer Science and Informatics, University College Dublin, Ireland
neil.hurley@ucd.ie

Zunping Cheng
School of Computer Science and Informatics, University College Dublin, Ireland
zunping.cheng@ucd.ie

Mi Zhang
School of Computer Science and Informatics, University College Dublin, Ireland
mi.zhang@ucd.ie

ABSTRACT
It has been shown in recent years that effective profile injection or shilling attacks can be mounted on standard recommendation algorithms. These attacks consist of the insertion of bogus user profiles into the system database in order to manipulate the recommendation output, for example to promote or demote the predicted ratings for a particular product. A number of attack models have been proposed and some detection strategies to identify these attacks have been empirically evaluated. In this paper we show that the standard attack models can be readily detected using statistical detection techniques. We argue that past research has given insufficient consideration to the effectiveness of attacks under a constraint of statistical invariance. In fact, it is possible to create effective attacks that are undetectable using the detection strategies proposed to date, including the PCA-based clustering strategy, which has shown excellent performance against standard attacks. Nevertheless, these more advanced attacks can also be detected with careful design of a statistical detector. The question posed for future research is whether attack models that produce effective attack profiles that are statistically identical to genuine profiles are really possible.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—collaborative filtering, robustness; G.3 [Probability and Statistics]: Robust regression

General Terms
Performance, Security

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. RecSys'09, October 23–25, 2009, New York, New York, USA. Copyright 2009 ACM 978-1-60558-435-5/09/10 ...$10.00.

1. INTRODUCTION
The possibility of designing user rating profiles to deliberately and maliciously manipulate the recommendation output of a collaborative filtering system was first raised in [14]. One scenario proposed was that an author, motivated to increase recommendations of his book, might create a set of false profiles that rate the book highly, in an effort to artificially promote the ratings given by the system to genuine users. Since then, these attacks have been dubbed shilling attacks [8] or profile injection attacks [1]. Several attack models have been proposed and the performance of these attacks in terms of influencing the system predictions has been evaluated for a number of memory-based and model-based collaborative filtering algorithms. It is generally accepted that the best 'general purpose' attack strategy is the so-called 'Average attack' proposed initially in [8], although it requires some knowledge of the rating statistics of genuine users. To counteract attacks, researchers have investigated the application of classification techniques to user profiles, in order to identify attack profiles and filter them from the dataset. Several attack detection strategies have been proposed.
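Before turning to detection, the Average attack mentioned above can be sketched concretely: each injected profile rates the target item at the maximum of the rating scale, and rates a set of randomly chosen filler items around each item's observed mean rating (the knowledge the attacker is assumed to possess). The function below is a minimal illustrative sketch under those assumptions; the name, signature, and exact filler policy are ours, not taken from [8].

```python
import random

def average_attack_profile(item_means, item_stds, target, n_filler,
                           r_max=5.0, r_min=1.0, seed=None):
    """Sketch of a single 'Average attack' push profile (illustrative):
    the target item gets the maximum rating; n_filler other items get
    ratings drawn around each item's observed mean rating."""
    rnd = random.Random(seed)
    n_items = len(item_means)
    profile = {}                                  # item index -> rating
    candidates = [i for i in range(n_items) if i != target]
    for i in rnd.sample(candidates, n_filler):    # random filler items
        r = rnd.gauss(item_means[i], item_stds[i])
        profile[i] = min(r_max, max(r_min, r))    # clip to rating scale
    profile[target] = r_max                       # promote the target item
    return profile
```

A nuke (demotion) variant would assign `r_min` to the target instead; it is the statistically regular way the filler ratings cluster around the item means that, as argued below, makes such profiles detectable.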
Of these, the principal components analysis (PCA) based detection strategy, recently proposed in [10], yields the best detection performance, obtaining over 90% precision in the detection of Average and other attack models when evaluated on the Movielens dataset.
In this paper, we review attack models in the context of their detectability. We propose to use Neyman-Pearson statistical detection to identify attack profiles and show how to statistically model the standard attacks that have been proposed to date. Our analysis shows that the success of the PCA-based detector is largely due to the unrealistic manner in which items are rated in attack profiles of standard attacks. With this realisation, we show how it is possible to create an effective attack which is undetectable by the PCA detector with a simple modification of the Average attack. To address the detection of such obfuscated attacks, we model an attacked dataset as a multivariate Gaussian mixture model and design supervised and unsupervised Neyman-Pearson detectors based on this model. These detectors significantly outperform the PCA detector on obfuscated attacks.
The significance of this analysis is as follows. Firstly, it demonstrates that the attack models proposed to date are insufficient and have failed to properly address the question of designing undetectable attacks. Even simple modifications of the existing models are sufficient to fool the best detectors proposed to date. Secondly, we have shown that Neyman-Pearson statistical detection is a powerful tool that can be successfully applied to the detection of a wider range of obfuscated attacks. Finally, we argue that it is important to consider robustness of recommender systems, taking into account the level of knowledge available to the attacker, both