Crawling Attacks Against Web-based Recommender Systems

Runa Bhaumik, Robin Burke, Bamshad Mobasher
Center for Web Intelligence
School of Computer Science, Telecommunication and Information Systems
DePaul University, Chicago, Illinois
(rbhaumik,rburke,mobasher)@cs.depaul.edu

Abstract: User profiles derived from Web navigation data are used in important e-commerce applications such as Web personalization, recommender systems, and Web analytics. In the open environment of the Internet, malicious third parties may seek to manipulate the output of such applications (such as the suggestions produced by a recommender system) by manipulating the input, through the generation of false user navigation profiles. Recent research has shown that systems using explicit ratings input by users are highly vulnerable to such "profile injection" attacks. Malicious users can cause certain products to be recommended more frequently and others less frequently. We show that Web recommenders that use implicit Web navigation profiles to learn user preference models, despite using different algorithms from traditional recommenders based on explicit ratings, are nevertheless subject to similar manipulation. We examine the impact of "crawling attacks" against navigation-based Web recommender systems. A crawling attack consists of a set of user profiles that a rogue agent injects into the clickstream navigation data by crawling the site in a way that changes the future behavior of the system. We examine different attack types and show that they are effective against the most common personalization algorithms based on Web usage mining.

I. INTRODUCTION

Due to the explosive growth of the Web, Web personalization and recommender systems have gained popularity, helping people find the information they want or find interesting, and allowing Web site owners to optimize their sites and increase user satisfaction.
These systems dynamically generate pages, products, and recommendations for users based on their profiles, interests, or preferences.

In most e-commerce recommender systems, the user provides some input and the system processes that information to generate a list of recommendations. The input, which indicates the user's preferences for items, can consist of explicit ratings or can be derived from implicit indications of interest, such as the user's behavior during navigation. An increasing number of Web recommender systems today make use of implicit ratings, in which a Web user's interests are captured during interaction with the Web site as the user navigates through a sequence of pages. Web personalization systems identify the user's interest in individual items or groups of items based on measures such as whether an item is purchased or how much time is spent viewing a page or item. For example, Amazon.com monitors each customer's activity and purchase behavior and uses this information to build the user profile. The set of items viewed by a customer during a particular session or placed in the shopping cart is used to generate a list of recommended items for the user.

Most of these systems require the modeling and analysis of users' navigational behavior from clickstream data collected by Web servers and stored in access logs, a process commonly referred to as Web usage mining [5], [16], [13]. The goal of personalization based on Web usage mining is to recommend a set of objects to the current (active) user, possibly consisting of links, ads, text, products, or services, tailored to the user's preferences. This task is accomplished by matching the active user session (possibly in conjunction with previously stored profiles for that user) with the usage patterns discovered through Web usage mining. A number of recent surveys provide detailed discussions of a variety of data mining techniques that can be used for Web personalization [9], [10].
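As a rough illustration of matching an active session against mined usage patterns, the following Python sketch builds page co-occurrence counts from logged sessions and recommends the pages that most often appear alongside those in the active session. It is only a toy model under stated assumptions: the co-occurrence scoring, the function names `build_cooccurrence` and `recommend`, and the page identifiers are all hypothetical and are not the algorithms evaluated in this paper. The last few lines hint at why such a scheme is sensitive to crawled sessions of the kind described in the abstract.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(sessions):
    """For each page, count how often every other page shares a session with it."""
    co = defaultdict(Counter)
    for session in sessions:
        pages = set(session)  # ignore repeat visits within one session
        for a, b in permutations(pages, 2):
            co[a][b] += 1
    return co


def recommend(co, active_session, k=2):
    """Rank unseen pages by total co-occurrence with pages in the active session."""
    seen = set(active_session)
    scores = Counter()
    for page in seen:
        for other, count in co[page].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:k]]


# Hypothetical logged sessions (page identifiers are made up for illustration).
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C", "D"],
    ["B", "D"],
]
before = recommend(build_cooccurrence(sessions), ["A"], k=1)

# Hypothetical crawled sessions that repeatedly pair page A with a target page X.
injected = [["A", "X"]] * 3
after = recommend(build_cooccurrence(sessions + injected), ["A"], k=1)
```

In this toy setting, three injected sessions are enough to change the top recommendation for a user viewing page A. Real usage-mining recommenders use richer pattern models, so the number of sessions an attacker would need is not something this sketch can predict.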
Prior research has shown that the behavior of the most commonly used recommendation algorithms that rely on explicit user feedback can be manipulated by fairly small-scale attacks that do not require a great deal of knowledge about the details of the recommender system or its algorithms [14], [8], [2], [11]. A recommender system that uses explicit ratings requires users to create some sort of account. A determined attacker may be able to outwit schemes designed to prevent automated account registration, but the cost of generating each new profile remains significant. Navigation-based recommender systems, on the other hand, use implicit feedback captured in the clickstream data and depend mainly on the navigation profiles of anonymous users who visit pages in a particular order or combination. Web recommendation is typically performed on log data, which is not associated with user accounts. Thus, an attacker need only disguise an automated site crawler as a large number of different legitimate Web clients, something easily achieved through anonymized browsing techniques. An attacker who can inject such navigation profiles by visiting a combination of items often enough may be able to produce any desired recommendation behavior for future users. For example, Amazon.com and many other Web sites generate a common form of navigation-oriented recommendation: when a user is viewing a particular item A, the system may recommend items B, C, D that other users