Information Bottleneck Co-clustering

Pu Wang∗    Carlotta Domeniconi∗    Kathryn Blackmond Laskey†

∗ Dept. of Computer Science, George Mason University
† Dept. of Systems Engineering and Operations Research, George Mason University

Abstract

Co-clustering has emerged as an important approach for mining contingency data matrices. We present a novel approach to co-clustering based on the Information Bottleneck principle, called Information Bottleneck Co-clustering (IBCC), which supports both soft-partition and hard-partition co-clusterings, and leverages an annealing-style strategy to bypass local optima. Existing co-clustering methods require the user to define the numbers of row- and column-clusters separately. In practice, though, the numbers of row- and column-clusters may not be independent. To address this issue, we also present an agglomerative Information Bottleneck Co-clustering (aIBCC) approach, which automatically captures the relation between the numbers of clusters. The experimental results demonstrate the effectiveness and efficiency of our techniques.

Keywords: Co-clustering; Information Bottleneck

1 Introduction

Co-clustering [1] has emerged as an important approach for mining dyadic and relational data. Dyadic or relational observations are indexed by pairs of objects drawn from two index sets. For example, the index sets might be documents and words; the data might be incidence or counts of words in documents. Co-clustering allows documents and words to be grouped simultaneously and interdependently: documents are clustered based on the words they contain, and words are grouped based on the documents in which they appear.

Some researchers have proposed a hard-partition version of co-clustering, Information Theoretic Co-clustering [2]; others have proposed soft-partition versions [3, 4]. Co-clustering algorithms are usually iterative, and an initialization of the clusters is required. The selection of a good initialization is a critical issue, since a random initialization often leads to sub-optimal solutions.

In this work, we propose a co-clustering approach based on the Information Bottleneck principle (Information Bottleneck Co-clustering, or IBCC). IBCC supports both soft-partition and hard-partition co-clustering. Furthermore, it uses an annealing-style strategy, called the continuation method [5], inspired by [7], which enables it to find near globally optimal solutions. We also introduce an agglomerative Information Bottleneck Co-clustering (aIBCC) approach, which automatically captures the relation between the numbers of row- and column-clusters.

Information Bottleneck provides the foundation for a principled approach to co-clustering. As in [2], we view the data matrix as generated by a joint probability distribution between two random variables that are indexed by the rows and columns. Our approach finds row-clusters and column-clusters in an intertwined fashion, so that the resulting clusters compress the data as much as possible while preserving the relevant information. We make use of the formalism of Bayesian networks to specify the inter-dependencies between rows and columns of the data matrix, and thereby achieve the desired co-clustering.

While IBCC uses the same model as symmetric IB [8], our method also leverages the continuation method to achieve better solutions, and allows for both soft-partition and hard-partition clusterings. We also compare IBCC and Information Theoretic Co-clustering (ITCC) theoretically, and show that IBCC is in principle an extension of ITCC. Finally, we empirically analyze the relation between the numbers of row- and column-clusters.

2 Related Work

Information Bottleneck (IB) [6] is a powerful clustering approach that leverages mutual information to evaluate how much information regarding the original data is kept by the clusters. It achieves a trade-off between