Context-Dependent Items Generation in Collaborative Filtering

Linas Baltrunas
Free University of Bozen-Bolzano, Piazza Università 1, Bolzano, Italy
lbaltrunas@unibz.it

Francesco Ricci
Free University of Bozen-Bolzano, Piazza Università 1, Bolzano, Italy
fricci@unibz.it

ABSTRACT
Collaborative Filtering (CF) exploits users’ recorded ratings to predict ratings for items not yet evaluated. In classical CF, each item is modelled by a set of users’ ratings that does not specify the contextual conditions under which the ratings were obtained (e.g., the time when the item was rated or the goal of the consumption). In some domains the context can heavily influence the rating values. Therefore, a single rating for each user and item combination may be insufficient for making accurate predictions. This paper introduces and analyzes a technique, item splitting, for dealing with context by generating new items. In this approach, the rating vectors of some items are split into two vectors containing the ratings collected in two alternative contextual conditions. Hence, each split generates two fictitious items that are used in the prediction algorithm instead of the original one. We evaluated this approach on real-world and semi-synthetic data sets using matrix-factorization and nearest-neighbor CF algorithms. We also compared our approach to the classical reduction-based context-aware CF approach. We show that item splitting can be beneficial and that its performance depends on the splitting criteria and on the influence of the contextual variables on the item ratings. Moreover, we show that item splitting can perform better than the reduction-based approach.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Information filtering

General Terms
Algorithms, Experimentation

1. INTRODUCTION
Collaborative Filtering (CF) recommendations are computed by leveraging historical log data of users’ online behavior [2].
CF assumes that a user’s recorded ratings for items can help in predicting the ratings of like-minded users. This assumption is valid only to some extent. In fact, a user’s general interests can be relatively stable, but the exact evaluation of an item can be influenced by many additional and varying factors. In certain domains, consuming the same item can lead to extremely different experiences when the context changes [1, 4]. For instance, in a tourism application, visiting a beach in summer is strikingly different from visiting the same beach in winter (e.g., during a conference meeting). However, most CF recommender systems would not distinguish between these two experiences, and thus provide poor recommendations in certain situations.

Context-aware recommender systems are a new area of research [1], and context-aware approaches can be classified into three groups: pre-filtering, post-filtering, and contextual modelling [3]. The reduction-based approach [1] extends the classical CF method by adding, to the standard dimensions of users and items, new dimensions representing contextual information. Here, recommendations are computed using only the ratings made in the same context as the target one. The authors use a hierarchical representation of context; therefore, the granularity of the context used is searched (optimized) among those granularities that improve the accuracy of the prediction. Similarly, in our approach we enrich the simple two-dimensional CF matrix with a model of the context comprising a set of features of the user, the item, or the evaluation. We adopt the definition of context introduced by Dey, where “Context is any information that can be used to characterize the situation of an entity” [7]. Here, the entity is the experience of an item, which can be influenced by contextual variables describing the state of the user and the item.

CARS-2009, October 25, 2009, New York, NY, USA. Copyright is held by the author/owner(s).
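The reduction-based pre-filtering described above and the item splitting summarized in the abstract can be contrasted on a toy data set. The following is a minimal Python sketch, assuming each rating is stored as a tuple (user, item, context, value) with a single contextual variable such as the season. `reduction_prefilter` keeps only the ratings made in the target context; `split_item` replaces one item with two fictitious items when its ratings differ across contexts. The function names, the `item|context` naming of the fictitious items, and the split test (a Welch two-sample t statistic compared against a hypothetical threshold `t_threshold=2.0`) are illustrative assumptions, not the paper’s implementation or its splitting criteria.

```python
import math
from statistics import mean, variance

# Hypothetical data layout: each rating is (user, item, context, value), where
# `context` is the value of one contextual variable (e.g. the season).


def reduction_prefilter(ratings, target_context):
    """Reduction-based pre-filtering: keep only ratings made in the target context."""
    return [r for r in ratings if r[2] == target_context]


def welch_t(a, b):
    """Welch two-sample t statistic; an assumed splitting test, not the paper's."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: any difference in means is treated as infinitely large.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def split_item(ratings, item, context_value, t_threshold=2.0):
    """Item splitting: if `item` is rated differently in `context_value` than in the
    other contexts, replace it with two fictitious items; otherwise return the input."""
    inside = [v for _, it, ctx, v in ratings if it == item and ctx == context_value]
    outside = [v for _, it, ctx, v in ratings if it == item and ctx != context_value]
    # statistics.variance needs at least two values in each group.
    if len(inside) < 2 or len(outside) < 2:
        return ratings
    if abs(welch_t(inside, outside)) < t_threshold:
        return ratings
    result = []
    for user, it, ctx, v in ratings:
        if it == item:
            # Rename the item according to the side of the split its rating falls on.
            it = f"{item}|{context_value}" if ctx == context_value else f"{item}|not {context_value}"
        result.append((user, it, ctx, v))
    return result


# Toy ratings, invented for illustration only.
ratings = [
    ("u1", "beach", "summer", 5), ("u2", "beach", "summer", 4),
    ("u3", "beach", "summer", 5), ("u1", "beach", "winter", 2),
    ("u2", "beach", "winter", 1), ("u4", "beach", "winter", 2),
    ("u3", "museum", "winter", 4),
]

if __name__ == "__main__":
    print(reduction_prefilter(ratings, "winter"))
    for row in split_item(ratings, "beach", "summer"):
        print(row)
```

On this toy data, the beach’s summer ratings (5, 4, 5) and winter ratings (2, 1, 2) differ enough that the beach is split into `beach|summer` and `beach|not summer`, while the museum, with a single rating, is left untouched. The design difference is visible in the output: pre-filtering for “winter” discards every summer rating, whereas splitting keeps all ratings in the matrix and only redistributes them across two items.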
In this paper we propose a new approach for using these contextual dimensions to pre-filter the ratings of the target item (the item whose rating prediction is sought). To be precise, the set of ratings for an item is not filtered but split into two subsets according to the value of a contextual variable, e.g., ratings collected in “winter” or in “summer” (the contextual variable is the season of the rating/evaluation). These two sets of ratings are then assigned to two new fictitious items (e.g., the beach in winter and the beach in summer). This split is performed only if there is statistical evidence that the item’s ratings differ under these two contextual conditions, i.e., that users evaluate the item differently.

This study also shows that standard neighborhood-based and matrix-factorization-based CF models cannot cope with rating data influenced by contextual conditions. In fact, we show that if the contextual condition does influence the item rat-