Tag-Based Filtering for Personalized Bookmark Recommendations

Pavan K. Vatturi*, Werner Geyer, Casey Dugan, Michael Muller, Beth Brownholtz

ABSTRACT
This paper investigates using social tags for the purpose of making personalized content recommendations. Our tag-based recommender creates a personalized bookmark recommendation model for each user based on “current” and “general interest” tags, defined by different time intervals.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information filtering.

General Terms
Algorithms, Design, Experimentation.

Keywords
Recommendation, personalization, social tags, bookmarks.

1. INTRODUCTION
The goal of this paper is to study the utility of social tags as a new way to recommend personalized content to users. The basic assumption is simple: When users share content and associate tags with that content, it is likely they would be interested in additional content described by similar tags. As such, these tags can be seen as a fraction of a keyword-based user-interest profile. The more a user tags, the more complete the profile gets, and the more effectively it can be used for recommendation purposes. The advantage of using social tags is that they do not require users to create or update their profiles, or to provide explicit feedback [1]. The tag set can be used as a filter for incoming information and can be automatically updated over time by adding users’ recent tags and removing older ones. Hayes et al. [2] used tags for non-personalized blog recommendations. In our work, we focus on using tags from an enterprise social bookmarking system [3] to create personalized bookmark recommendations.

2. TAG-BASED RECOMMENDER
We create a personalized tag-based recommender for each user as described in Figure 1.
Our recommender consists of two Naïve Bayes classifiers trained over different timeframes: One classifier predicts the user’s current interest; the other classifier predicts the user’s general interest in a bookmark. We aggregate both predictions into a final prediction in the following way: If either or both of the two classifiers predict a bookmark as interesting, we recommend the bookmark. If neither classifier predicts the bookmark as interesting, we do not recommend it.

The two classifiers are trained with a subset of the bookmarks created by a user. The tags of each bookmark, converted into a “bag of words”, are used as training features. The core idea is to consider recent bookmarks as good implicit user interest indicators. Previous research has shown that implicit indicators like bookmarks created by the user while browsing the web can be as predictive of interest levels as explicit ratings [1]. For both classifiers, more recent bookmarks are treated as positive training samples, i.e., interesting to the user, whereas older bookmarks are treated as negative training examples, i.e., less interesting to the user. The general interest classifier uses bookmarks from a longer time interval as training samples in order to capture general interest topics. The current interest classifier is trained based on a shorter time interval in order to reflect current interests.

[Figure 1: the tags of a bookmark are passed to the Current Interest Classifier (CIC) and the General Interest Classifier (GIC). Each classifier outputs 0 or 1, and the two outputs are combined into the final prediction (0 = not recommended, 1 = recommended).]

CIC   GIC   Combined Result
1     0     1
1     1     1
0     1     1
0     0     0

Figure 1. Tag-based Bookmark Recommender.

*School of EECS, Oregon State University, 1148 Kelly Engineering Center, Corvallis, OR 97331, USA, +1 (541) 737 3617, vatturi@eecs.orst.edu

IBM T.J. Watson Research, One Rogers Street, Cambridge, MA 02116, USA, +1 (617) 693 4791, {werner.geyer, cadugan, michael_muller, beth_brownholtz}@us.ibm.com

Copyright is held by the author/owner(s). CIKM’08, October 26–30, 2008, Napa Valley, California, USA. ACM 978-1-59593-991-3/08/10.
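To make the scheme of Section 2 concrete, the following Python sketch labels a user's bookmarks by recency, trains one classifier per time interval on the bookmark tags, and combines the two predictions so that a bookmark is recommended if either classifier finds it interesting. It is an illustration under stated assumptions, not the implementation used in this work: the multinomial Naïve Bayes variant with Laplace smoothing, the window lengths (30/7 days for current interest, 365/60 days for general interest), the names TagNaiveBayes, label_by_recency, and recommend, and the sample bookmarks are all hypothetical choices.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


class TagNaiveBayes:
    """Tiny multinomial Naive Bayes over bags of tags (assumed variant)."""

    def fit(self, samples):
        # samples: iterable of (tags, label) pairs with label in {0, 1}
        self.class_counts = Counter()
        self.tag_counts = {0: Counter(), 1: Counter()}
        for tags, label in samples:
            self.class_counts[label] += 1
            self.tag_counts[label].update(tags)
        self.vocab = set(self.tag_counts[0]) | set(self.tag_counts[1])
        return self

    def _log_score(self, tags, label):
        total = sum(self.class_counts.values())
        # Smoothed class prior, so an empty class never yields log(0).
        score = math.log((self.class_counts[label] + 1) / (total + 2))
        denom = sum(self.tag_counts[label].values()) + len(self.vocab)
        for tag in tags:
            if tag in self.vocab:  # tags unseen in training are ignored
                score += math.log((self.tag_counts[label][tag] + 1) / denom)
        return score

    def predict(self, tags):
        # 1 = interesting, 0 = not interesting; ties fall back to 0.
        return int(self._log_score(tags, 1) > self._log_score(tags, 0))


def label_by_recency(bookmarks, now, window_days, recent_days):
    """Keep bookmarks from the last `window_days`; those from the last
    `recent_days` become positive samples (1), the older ones negative (0).
    Both cutoffs are assumptions made for this sketch."""
    samples = []
    for created, tags in bookmarks:
        age = now - created
        if age <= timedelta(days=window_days):
            samples.append((tags, int(age <= timedelta(days=recent_days))))
    return samples


def recommend(cic, gic, tags):
    """Recommend (1) if either or both classifiers predict 'interesting'."""
    return int(cic.predict(tags) == 1 or gic.predict(tags) == 1)


# Hypothetical bookmark history of one user: (creation time, tags).
now = datetime(2008, 10, 1)
bookmarks = [
    (now - timedelta(days=2), ["python", "ml"]),
    (now - timedelta(days=5), ["python", "recsys"]),
    (now - timedelta(days=20), ["java", "ejb"]),
    (now - timedelta(days=25), ["java", "spring"]),
    (now - timedelta(days=200), ["cooking", "pasta"]),
    (now - timedelta(days=300), ["cooking", "bread"]),
]

# Current interest: short interval. General interest: longer interval.
cic = TagNaiveBayes().fit(label_by_recency(bookmarks, now, 30, 7))
gic = TagNaiveBayes().fit(label_by_recency(bookmarks, now, 365, 60))

for candidate in (["python"], ["java"], ["cooking", "pasta"]):
    print(candidate, recommend(cic, gic, candidate))
```

In this toy history, a bookmark tagged "java" is rejected by the current interest classifier but accepted by the general interest classifier, so the combined rule still recommends it. This is the situation the logical-OR aggregation is meant to cover.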