New user profile learning for extremely sparse data sets

Tomasz Hoffmann, Tadeusz Janasiewicz, and Andrzej Szwabe

Institute of Control and Information Engineering, Poznan University of Technology, pl. Marii Curie-Skladowskiej 5, 60-965 Poznan, Poland
{tomasz.hoffmann,tadeusz.janasiewicz,andrzej.szwabe}@put.poznan.pl
http://www.put.poznan.pl

Abstract. We propose a new method of online user profile learning for recommender systems that deals effectively with the extreme sparsity of behavioral data. The proposed method enhances the singular value rescaling method and uses a pair of vectors to represent both positive and neutral user preferences. A list of discarded elements is used in a simple implementation of negative relevance feedback. We experimentally show the negative impact of dimensionality reduction on the accuracy of recommendations based on extremely sparse data. We introduce a new method for evaluating recommendation quality that measures F1 iteratively during a simulated session. The combined use of singular value rescaling and a user profile representation based on two complementary vectors has been compared with well-known recommendation methods, showing the superiority of our method in the online user profile updating scenario.

Keywords: Recommender systems, user profile learning, collaborative data sparsity, vector space model, cold-start problem, relevance feedback

1 Introduction

The main purpose of many recommender systems is to recommend items to users in an interactive web environment [6], [7]. Behavioral data sparsity makes effective online interaction between users and a recommender system an especially challenging task [3]. To our knowledge, only a few algorithms for new user profile learning are oriented towards dealing with extremely sparse data sets.
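The abstract mentions measuring F1 iteratively during a simulated session. As a rough illustration only, the sketch below assumes one simple protocol that is not taken from this paper: the user's selected items are revealed one at a time, and after each step a top-N list is scored against the items the user has not selected yet. The names `f1_score`, `simulate_session`, `popular_recommender`, and the `POPULARITY` ranking are hypothetical, and the popularity-based recommender is a placeholder baseline, not the method proposed here.

```python
from typing import Callable, List, Sequence, Set


def f1_score(recommended: Sequence[int], relevant: Set[int]) -> float:
    """F1 of a recommendation list against a set of relevant (held-out) items."""
    hits = sum(1 for item in recommended if item in relevant)
    if hits == 0:
        return 0.0  # also covers empty lists, avoiding division by zero
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def simulate_session(
    session_items: Sequence[int],
    recommend: Callable[[List[int], int], List[int]],
    n: int,
) -> List[float]:
    """Assumed protocol: reveal the session one item at a time and, after each
    step, score a top-n list against the items not yet revealed."""
    scores = []
    for step in range(1, len(session_items)):
        known = list(session_items[:step])    # items the profile has seen so far
        hidden = set(session_items[step:])    # items the user selects later
        scores.append(f1_score(recommend(known, n), hidden))
    return scores


# Hypothetical placeholder recommender: item ids ranked by global popularity,
# skipping items the user has already selected.
POPULARITY = [3, 1, 4, 0, 2, 5]


def popular_recommender(known: List[int], n: int) -> List[int]:
    return [item for item in POPULARITY if item not in known][:n]


if __name__ == "__main__":
    # One F1 value per simulated step of the session.
    print(simulate_session([3, 1, 4], popular_recommender, n=2))
```

Recording one F1 value per step, rather than a single score at the end, is what lets such an evaluation show how quickly a new user's profile becomes useful as behavioral data accumulates.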
As shown in [2], data sparsity severely limits the effectiveness of methods based on dimensionality reduction [6]. In the classical vector space model, a user profile is represented by a vector that aggregates the vectors of all items selected by the user [1], [6]. In that case, no additional information about unselected items is used, i.e., only 'positive' preferences are stored. Such an approach to user profile modeling has a significant impact on recommendation accuracy. We assume that the purpose of personalized recommendation is to identify the top-N products that are most relevant to the user [8]. Following this assumption, in this paper we investigate a double vector representation of a user profile,