Extended Latent Class Models for Collaborative Recommendation

Kwok-Wai Cheung, Kwok-Ching Tsui and Jiming Liu

Abstract: With the advent of the WWW, providing just-in-time personalized product recommendations to customers has become possible. Collaborative recommender systems utilize the correlation between customer preference ratings to identify “like-minded” customers and predict their product preferences. One factor determining the success of a recommender system is its prediction accuracy, which in many cases is limited by a lack of adequate ratings (the sparsity problem). Recently, the use of the latent class model (LCM) has been proposed to alleviate this problem. In this paper, we first study how the LCM can be extended to handle customers and products outside the training set. In addition, we propose the use of a pair of LCMs (called the dual latent class model, DLCM), instead of a single LCM, to model customers’ likes and dislikes separately so as to enhance the prediction accuracy. Experimental results based on the EachMovie dataset show that DLCM outperforms both LCM and the conventional correlation-based method when the available ratings are sparse.

Index Terms: Collaborative filtering, recommender systems, personalization, latent class models

I. INTRODUCTION

Product recommendation is one of the most important business activities for attracting customers. With the advent of the World Wide Web, on-line companies can now recommend products to their customers on a one-to-one basis in real time and, more importantly, at a much lower cost. Different recommender systems have been proposed in the literature [1], [2], and related products and services have also been released in the market (e.g., Andromedia.com, Netperception.com). Based on the underlying technology, recommender systems can be broadly categorized as content-based or collaborative.
Content-based recommender systems match customer interest profiles (e.g., revealed by their highly rated products) with the product attributes (or features) when making recommendations. Different machine learning [3], [4] and information retrieval [5], [6] algorithms have been proposed for profile representation and ratings prediction. One successful application of the content-based approach is personalized Web page recommendation (e.g., Letizia [7]). For the approach to be effective, sufficiently rich and accurate product information as well as personal profiles should be available. In addition, the product attributes have to be carefully chosen for the product and profile representation. Bad choices of features result in recommender systems with either low discriminating power (the shallow-analysis problem) or bias in reflecting the customer interest (the over-specialization problem) [8].

Authors are with the Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong. E-mail: {william, tsuikc, jiming}@comp.hkbu.edu.hk

Collaborative recommender systems compute recommendations based on the similarity between customer preference ratings. As the approach does not rely on product contents, it does not suffer from the two problems of the content-based approach and thus has been widely used for recommending products whose descriptions are either lacking or too specific to be useful. Many different techniques have been proposed for collaborative recommendation, including the original correlation-based methods [9], [10], latent semantic indexing (LSI) [11], [12], Bayesian learning [13], [14], etc. Successful application domains include recommendation of Usenet articles [9], music [10], etc.
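To illustrate how similarity between preference ratings can drive a prediction, the following is a minimal sketch of a correlation-weighted neighbor prediction, written in the spirit of the correlation-based methods mentioned above. It is not taken from [9] or [10]: the function names `pearson` and `predict`, the fallback to the customer's own mean, and the toy ratings are all assumptions made for illustration.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the products both customers have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # assumption: treat too little overlap as "no evidence"
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    cov = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    var_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    var_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant ratings carry no correlation information
    return cov / sqrt(var_a * var_b)


def predict(ratings, customer, product):
    """Customer's mean rating plus a correlation-weighted average of the
    other customers' deviations from their own mean ratings."""
    own = ratings[customer]
    own_mean = sum(own.values()) / len(own)
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == customer or product not in other_ratings:
            continue
        weight = pearson(own, other_ratings)
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        num += weight * (other_ratings[product] - other_mean)
        den += abs(weight)
    # Assumption: fall back to the customer's own mean without neighbors.
    return own_mean if den == 0 else own_mean + num / den


# Hypothetical toy data: customer -> {product: rating}.
ratings = {
    "alice": {"m1": 5, "m2": 1, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 1},
}
print(predict(ratings, "alice", "m4"))
```

In this toy example, "bob" agrees with "alice" and "carol" disagrees with her, so both push the predicted rating for "m4" above her own mean. The prediction depends entirely on co-rated products, which is why it degrades when few ratings overlap.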
For collaborative recommendation to be accurate, a sufficiently large number of customers willing to provide preference ratings for the products is required, and the product coverage of their ratings should overlap significantly. However, this may not be the case in reality, either because such a large customer pool is lacking or because new products are encountered (the sparsity problem). Applying simple clustering or statistical cluster models to the preference ratings has been shown to improve the local density of the ratings and is considered a promising remedy for the sparsity problem [15], [16].

In this paper, we first describe a statistical cluster model, the latent class model (LCM), originally proposed by Hofmann et al. for collaborative filtering [15], and study how a properly trained LCM can also be used to handle customers and products outside the training set for recommendation. We also argue that the LCM is limited in terms of correctly modeling like and dislike ratings, and propose a dual latent class model (DLCM) which is trained using two sets of data converted from the original ratings, one with ratings for liked items and another with those for disliked ones. This modification allows the groupings of customers with similar likes and dislikes to be captured and thus improves the overall predictive power of the model. Experiments based on the EachMovie dataset were conducted for performance evaluation. It was found that DLCM outperforms LCM and a conventional correlation-based method when the ratings are sparse.

II. COLLABORATIVE RECOMMENDER SYSTEMS

The concept of collaborative recommendation (also called the word-of-mouth approach) was first used in Goldberg et al.’s e-mail filtering system [17]. The idea was then quickly pursued for product recommendation. In this section, we elaborate further on the sparsity problem and briefly survey some existing methods proposed in the literature for alleviating it.
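As a rough illustration of what sparsity means in practice, the sketch below computes two simple quantities on a toy rating set: the fraction of customer-product pairs that carry a rating, and the number of co-rated products for each pair of customers. The helper names `density` and `overlaps` and the data are hypothetical and are not part of the original paper.

```python
from itertools import combinations


def density(ratings, n_products):
    """Fraction of customer-product cells that hold a rating."""
    observed = sum(len(r) for r in ratings.values())
    return observed / (len(ratings) * n_products)


def overlaps(ratings):
    """Number of co-rated products for every pair of customers."""
    return {
        (a, b): len(set(ratings[a]) & set(ratings[b]))
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical toy data: three customers, five products in the catalogue.
ratings = {
    "u1": {"p1": 4, "p2": 2},
    "u2": {"p2": 5, "p3": 1},
    "u3": {"p4": 3},
}
print(density(ratings, n_products=5))
print(overlaps(ratings))
```

Here only one third of the cells are filled, and "u3" shares no rated product with anyone, so a purely similarity-based method has nothing from which to compute a prediction for that customer.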