PREPRINT SUBMITTED TO IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Multi-View Partitioning via Tensor Methods

Xinhai Liu, Shuiwang Ji, Wolfgang Glänzel, and Bart De Moor, Fellow, IEEE

Abstract—Clustering by integrating multi-view representations has become a crucial issue for knowledge discovery in heterogeneous environments. However, most prior approaches assume that the multiple representations share the same dimension, limiting their applicability to homogeneous environments. In this paper, we present a novel tensor-based framework for integrating heterogeneous multi-view data in the context of spectral clustering. Our framework includes two novel formulations: multi-view clustering based on the integration of the Frobenius-norm objective function (MC-FR-OI) and multi-view clustering based on matrix integration in the Frobenius-norm objective function (MC-FR-MI). We show that the solutions for both formulations can be computed by tensor decompositions. We evaluate our methods on synthetic data and two real-world data sets in comparison with baseline methods. Experimental results demonstrate that the proposed formulations are effective in integrating multi-view data in heterogeneous environments.

Index Terms—Multi-view clustering, tensor decomposition, spectral clustering, multi-linear singular value decomposition, higher-order orthogonal iteration

I. INTRODUCTION

In many real-world scenarios, each object can be described by multiple sets of features. For example, in scientific literature mining, both the textual content of articles and the citation links between them are often used in knowledge discovery processes [25]. In multiplex network analysis, we are given a set of networks that share the same set of nodes but possess network-specific links representing different types of relationships between nodes [29].
A particular instance of this scenario is the social network of university students, which may include symmetrized connections from (i) Facebook friendship, (ii) picture friendship, (iii) roommate relations, and (iv) student housing-group preference. These diverse individual activities result in multiple relationship networks among students. Such a learning scenario is called multi-view learning, since each feature set describes a view of the same set of underlying objects.

A simple approach to learning from such multi-view data is to learn from each view separately. However, such approaches fail to account for the complementary information encoded in different views. Multi-view clustering refers to the clustering of the same set of objects with multi-view features, either from various information sources or from different feature representations. Compared with clustering performed on single-view data, multi-view clustering is expected to yield robust and novel partition results by exploiting the complementary information in different views.

X. Liu is with the Credit Reference Center & Financial Research Institute, The People's Bank of China, Beijing, 100800, China. E-mail: xinhai.liu@yahoo.com. X. Liu is also with the College of Information Science and Engineering & ERCMAMT, Wuhan University of Science and Technology, 430081, Wuhan, China.
S. Ji is with the Department of Computer Science, Old Dominion University, Norfolk, VA, 23529-0162, USA.
W. Glänzel is with the Center for R&D Monitoring (ECOOM), Dept. MSI, Katholieke Universiteit Leuven, Leuven, B-3000, Belgium and the Hungarian Academy of Sciences, IRPS, Budapest, Hungary.
B. De Moor is with the Department of Electrical Engineering, ESAT-SCD and IBBT K.U.Leuven Future Health Department, Katholieke Universiteit Leuven, Leuven, B-3001, Belgium.

One of the recent developments in clustering is the spectral clustering technique, which has seen an explosive proliferation over the past several years [44].
Besides other factors such as easy implementation and efficiency, one of the key advantages of spectral clustering is that it is based on the relaxation of a global clustering criterion (i.e., normalized cuts). Spectral clustering has been widely employed in many real applications, from image segmentation to community detection. Although spectral clustering [28] works well on single-view data, it is not well suited for the clustering of multi-view data, since it is inherently based on matrix decompositions.

Recently, several multi-view clustering algorithms have been proposed [1], [3], [5], [25], [26], [37], [40], [47]. These multi-view clustering techniques have been shown to yield better performance than single-view techniques. However, prior methods have some limitations that prevent their wide applicability, as we will discuss in the related work. For instance, some techniques assume that the dimensions of the features in multiple views are the same, limiting their applicability to homogeneous settings. Other techniques concentrate only on the clustering of two-view data and may be hard to extend to more than two views [3]. In addition, an appropriate weighting scheme for the multiple views is lacking, although coordinating the information from them is a crucial step in obtaining good clustering results [37], [41]. A unified framework that can integrate various types of multi-view data is lacking to date [26], [40].

Tensors are higher-order generalizations of matrices. They have been successfully applied to several domains, such as chemometrics, signal processing, Web search, data mining, scientific computing and image recognition [10], [21], [22], [34], [38], [45]. Traditionally, tensor-based methods have been used to model multi-view data [21], and they are powerful tools for analyzing the latent patterns hidden in such data.
Tensor decompositions capture multi-linear structures in higher-order data sets, where the data have more than two modes. Tensor decompositions and multi-way analysis allow for extracting hidden (latent) components (cluster structure) and investigating the complex relationships among them.

In this paper, we propose a multi-view clustering framework based on tensor methods. Our formulations model the multi-view data as a tensor and seek a joint optimal latent subspace by tensor analysis. Our framework can leverage the inherent consistency among multi-view data and integrate their information seamlessly. Unlike other multi-view clustering strategies, which are usually devised for ad hoc applications, our method provides a general framework in which some limitations of prior methods are overcome systematically. In particular, our framework can be extended to various types of multi-view data.
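
To make the idea of modeling multi-view data as a tensor concrete, the following is a minimal Python sketch. It is not the MC-FR-OI or MC-FR-MI formulation itself. It assumes each view has already been turned into an N x N similarity matrix, stacks the normalized matrices into an N x N x V tensor, and takes the leading left singular vectors of the mode-1 unfolding as a joint subspace, in the spirit of a truncated multi-linear singular value decomposition. The function names (normalized_similarity, joint_subspace, kmeans, block_view) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def normalized_similarity(S):
    """Symmetric normalization D^{-1/2} S D^{-1/2}, as used in spectral clustering."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def joint_subspace(views, k):
    """Stack V similarity matrices into an N x N x V tensor and return the
    k leading left singular vectors of its mode-1 unfolding (assumed sketch)."""
    tensor = np.stack([normalized_similarity(S) for S in views], axis=2)
    n = tensor.shape[0]
    unfolding = tensor.reshape(n, -1)  # mode-1 unfolding, shape N x (N * V)
    U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
    return U[:, :k]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def block_view(rng, sizes, noise):
    """Toy view: strong within-cluster similarity plus weak random noise."""
    n = sum(sizes)
    S = noise * rng.random((n, n))
    start = 0
    for s in sizes:
        S[start:start + s, start:start + s] += 1.0
        start += s
    return (S + S.T) / 2

# Two hypothetical views of the same 8 objects, which form two clusters of 4.
rng = np.random.default_rng(0)
views = [block_view(rng, [4, 4], 0.1), block_view(rng, [4, 4], 0.2)]

U = joint_subspace(views, k=2)
U_rows = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
labels = kmeans(U_rows, k=2)
print(labels)
```

Because the left singular vectors of the unfolding do not depend on the order of its columns, the simple reshape is sufficient here. An iterative refinement such as higher-order orthogonal iteration would start from a subspace of this kind.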