Alleviate sparsity problem using hybrid model based on spectral co-clustering and tensor factorization

Mahdi Nasiri, Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran, Nasiri_m@iust.ac.ir
Zeinab Sharifi, Computer Science Department, Science and Research Branch, Islamic Azad University, Guilan, Iran, Sharifi.zeinab@gmail.com
Behrouz Minaei, Computer Engineering Department, Iran University of Science and Technology, Tehran, Iran, b_minaei@iust.ac.ir

Abstract: Most recommender systems suffer from a common challenge known as data sparsity. This problem has an important effect on the performance of collaborative filtering and causes overfitting. It becomes critical when new dimensions are added to the data and sparsity must be considered in more than two dimensions. In this paper, ‘time’ is considered as a third dimension. Data sparsity therefore increases as the number of dimensions of the data increases. To alleviate this problem, this research applies a framework that groups co-occurring users and items into the same cluster (block), adds time to each block, and then imputes appropriate values for the missing data based on the ratings of the similar users and items assigned to each block. Optimization-based preprocessing is performed on each cluster. After this step, the data points in each block are merged together and tensor factorization is used to model the relations between users, items and times. Our novel approach has two advantages: (a) it increases the speed of convergence and avoids overfitting the observed data, and (b) it reduces the sparsity problem and the error rate. The evaluation metrics demonstrate that our algorithm works well in practice.

Keywords: Data sparsity; Recommender system; Spectral co-clustering; Tensor factorization

I. INTRODUCTION

Recommender systems were introduced in 1990 to solve the problem of information overload on the web [1]. One of the most common methods used in recommender systems is the collaborative filtering (CF) approach [2].
The CF approach is suitable for exploring hidden factors in data [3]. The explosive growth of users and items on the Web has created major challenges for recommender systems, and sparsity of data is a major problem in these systems. In this article, we focus on the sparsity problem, which refers to the lack of sufficient data in the dataset. This problem reduces the number of predictions that can be made from the dataset [3]. Adding another dimension to the data increases the amount of sparsity. Consequently, sparse and high-dimensional data present special challenges and can lead to qualitatively poor predictions. We propose two solutions to overcome this problem.

First, we propose spectral co-clustering to partition rows and columns simultaneously. Modeling the co-occurrence of rows and columns is a fundamental issue in unsupervised learning that exploits the apparent duality between them [4]. This approach is used in text and document mining [5, 6], bioinformatics and gene expression analysis [7, 8], and many other practical applications. While there are now a number of co-clustering approaches, such as those based on spectral graph theory [9] and information theory [10], each with its own advantages, co-clustering in this paper is based on spectral graph partitioning. Spectral graph partitioning is an effective heuristic that was introduced in the early 1970s [10, 11] and popularized in 1990 [12].

The second solution is tensor factorization (TF). Tensor factorization is a model-based collaborative filtering approach. It uses the information in the recommender system and applies learning techniques to create a model, and then uses the model to predict the missing values in the tensor [13]. Most recommender systems consider relations among data in two dimensions. Suppose that we have three entities, such as user, item, and tag. Most recommendation algorithms study pairwise relations between two entities, namely user-item, user-tag, and item-tag [14].
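As a minimal illustration of what a pairwise view loses, and of how sparsity grows when a dimension is added, the following Python sketch stores a handful of invented (user, item, time, rating) records as a sparse third-order tensor and then collapses the time mode into a user-item matrix. The data, the variable names and the choice to average over time are assumptions made for this example only; they are not taken from the proposed method.

```python
# Hypothetical toy records: (user, item, time_slot, rating).
ratings = [
    (0, 0, 0, 5.0),
    (0, 0, 1, 3.0),  # same user and item, rated again at a later time
    (0, 1, 1, 3.0),
    (1, 0, 1, 4.0),
    (1, 2, 0, 2.0),
    (2, 1, 0, 1.0),
    (2, 2, 1, 4.0),
]
n_users, n_items, n_times = 3, 3, 2

# Sparse third-order tensor: only observed (user, item, time) cells are stored.
tensor = {(u, i, t): r for u, i, t, r in ratings}

# Pairwise user-item view: collapse the time mode by averaging the ratings
# that a user gave to the same item at different times.
grouped = {}
for (u, i, t), r in tensor.items():
    grouped.setdefault((u, i), []).append(r)
user_item = {pair: sum(rs) / len(rs) for pair, rs in grouped.items()}

# Fraction of observed cells in each representation.
matrix_density = len(user_item) / (n_users * n_items)
tensor_density = len(tensor) / (n_users * n_items * n_times)

print(f"user-item matrix density: {matrix_density:.3f}")
print(f"user-item-time tensor density: {tensor_density:.3f}")
```

With these toy values, 6 of the 9 user-item cells are observed but only 7 of the 18 tensor cells are, and the two ratings that user 0 gave item 0 at different times are merged into a single average, so the time information is no longer available in the pairwise view.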
Therefore, the ability to analyze relations among three entities of data is provided by tensors. Tensors are applied in a variety of applications and fields such as chemistry, signal processing and image processing [15]. Recommender systems based on TF often add a third dimension to two-dimensional data. In this research, ‘time’ is studied as the third dimension [16]. Higher order singular value decomposition (HOSVD) is one of the simple and efficient techniques in tensor factorization. This method was introduced in [17] and is a special type of Tucker factorization. The idea of this technique is to extend the singular value decomposition (SVD) of matrices [20] to data with more than two dimensions. In this article, we use an optimization-based HOSVD.

This paper focuses on blocking rows and columns into groups of similar users and items. Optimization-based preprocessing is then performed to fill the missing values in each group. Afterwards, we merge the blocks together and apply tensor factorization to predict the missing values in the data set.

The following is a brief outline of the paper. Section 2 presents the related work. The proposed algorithm is