On Bootstrapping Recommender Systems

Nadav Golbandi, Yahoo! Labs, Haifa, Israel (nadavg@yahoo-inc.com)
Yehuda Koren, Yahoo! Labs, Haifa, Israel (yehuda@yahoo-inc.com)
Ronny Lempel, Yahoo! Labs, Haifa, Israel (rlempel@yahoo-inc.com)

ABSTRACT

Recommender systems perform much better on users for whom they have more information. This gives rise to the problem of satisfying users who are new to a system. The problem is even more acute considering that some of these hard-to-profile new users judge the unfamiliar system by its ability to immediately provide them with satisfying recommendations, and may be the quickest to abandon the system when disappointed. Rapid profiling of new users is often achieved through a bootstrapping process, a kind of initial interview, that asks users to provide their opinions on certain carefully chosen items or categories. This work offers a new bootstrapping method, based on a concrete optimization goal, which handily outperforms known approaches in our tests.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications—Data Mining

General Terms

Algorithms

Keywords

collaborative filtering, new user, recommender systems, user cold start

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM'10, October 26–30, 2010, Toronto, Ontario, Canada. Copyright 2010 ACM 978-1-60558-495-9/09/06 ...$10.00.

1. INTRODUCTION

Modern consumers are inundated with choices. Electronic retailers and content providers offer huge selections of products, with unprecedented opportunities to meet a variety of special needs and tastes. Matching consumers with the most appropriate products is not trivial, yet it is key to enhancing user satisfaction and loyalty. This motivates the study of recommender systems, which analyze patterns of user interest in items or products to provide personalized recommendations that suit a user's taste.

One particular challenge that recommender systems face is handling new users; this is known as the user cold start problem. The quality of recommendations strongly depends on the amount of data gathered from the user, making it difficult to generate reasonable recommendations for users who are new to the system. To quantify this point, Fig. 1 shows how the error on Netflix test data decreases as users provide more ratings. Users who have vested many ratings in the system can enjoy error rates around 0.85, whereas new users, with just a few known ratings, are served with a significantly higher error rate (around 1). Yet new users are crucial to the recommendation environment, and providing them with a good experience is essential to growing the system's user base. Pleasing these new users is all the more challenging, as they often judge the system's value based on their first few experiences. This is particularly true for systems based on explicit user feedback, where users are required to actively provide ratings in order to receive useful suggestions. The essence of bootstrapping a recommender system is to promote this interaction, encouraging users to invest in a low-effort initial interaction that will lead them to an immediately rewarding experience.

[Figure 1 plot omitted: RMSE (y-axis, 0.86 to 1.0) vs. number of ratings per user (x-axis, 20 to 200)]

Figure 1: The test error rate vs. number of train ratings per user on the Netflix data. Lower y-axis values represent more accurate predictions.
The x-axis gives the exact number of training ratings taken for each user: when the x value equals k, we consider only users who gave at least k ratings, and for each such user we sort the ratings in chronological order and take the first k into account. Results are computed by the factorized item-item model [5].

This paper introduces a method for eliciting information from new users by asking for their feedback on a few deliberately chosen items. The method involves creating a seed set of items by optimizing a formally defined cost function, thereby handily outperforming previous approaches.
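The per-user evaluation protocol described for Fig. 1 (keep only users with at least k ratings, truncated to their chronologically first k) can be sketched as below. This is an illustrative reconstruction under assumed data layouts, not the authors' code; the function names and the (user, item, rating, timestamp) tuple format are our own conventions.

```python
from collections import defaultdict

def first_k_ratings(ratings, k):
    """Keep, for each user with at least k ratings, only the first k
    ratings in chronological order (sketch of the Fig. 1 protocol).

    ratings: iterable of (user, item, rating, timestamp) tuples.
    Returns a dict mapping user -> list of (item, rating).
    """
    by_user = defaultdict(list)
    for user, item, rating, ts in ratings:
        by_user[user].append((ts, item, rating))
    kept = {}
    for user, events in by_user.items():
        if len(events) >= k:            # only users with at least k ratings
            events.sort()               # chronological order by timestamp
            kept[user] = [(item, r) for _, item, r in events[:k]]
    return kept

def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs,
    the error measure plotted on the y-axis of Fig. 1."""
    return (sum((p - a) ** 2 for p, a in pairs) / len(pairs)) ** 0.5
```

For example, with ratings = [("u1", "i1", 4.0, 1), ("u1", "i2", 3.0, 2), ("u1", "i3", 5.0, 3), ("u2", "i1", 2.0, 1)], calling first_k_ratings(ratings, 2) keeps only u1, truncated to [("i1", 4.0), ("i2", 3.0)]; in the actual experiment the retained ratings would train the rating predictor whose test RMSE is plotted per value of k.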