Convex Clustering Shrinkage

K. Pelckmans, J. De Brabanter†, J.A.K. Suykens, and B. De Moor

K.U.Leuven ESAT-SCD/SISTA, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
†: Hogeschool KaHo Sint-Lieven (Associatie KULeuven), Departement Industrieel Ingenieur, 9000 Gent, Belgium
{kristiaan.pelckmans,johan.suykens}@esat.kuleuven.ac.be, http://www.esat.kuleuven.ac.be/sista/lssvmlab

Abstract. This paper proposes a convex optimization view of the task of clustering. For this purpose, a shrinkage term is introduced which results in sparseness amongst the differences between the centroids. Given a fixed trade-off between the clustering loss and this shrinkage term, a clustering is obtained by solving a convex optimization problem. Varying the trade-off constant yields a hierarchical clustering tree. An efficient algorithm for larger datasets is derived and the method is illustrated briefly.

1 Introduction

The term cluster analysis encompasses a number of different algorithms and methods (tree clustering, block clustering, k-means clustering and EM algorithms) for grouping objects of similar kind into respective categories. In other words, cluster analysis is an exploratory data analysis tool which aims at sorting objects into groups in such a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise. Clustering techniques have been applied to a wide variety of research problems. Hartigan [5] provides an excellent summary of the many published studies reporting the results of cluster analyses. There are many books on clustering, including [4], [3] and [9]. The classic k-means algorithm was popularized and refined by [5]. The EM algorithm for clustering is described in detail e.g. in [15]. A recent overview of the literature on the subject is given in [16]. Shrinkage techniques, for regression and discriminant analysis, have been studied extensively since the seminal works by [12] and [8].
With a term borrowed from approximation theory, these methods are also called regularization methods [14]. Ridge regression, a fundamental shrinkage construction, was introduced by [7], while the Least Absolute Shrinkage and Selection Operator (LASSO) was proposed by [13]. Its solution results from solving a quadratic programming problem, which can be accelerated by dedicated decomposition methods (such as SMO, [11]). In some cases, all solutions corresponding to any value of the regularization trade-off constant (the solution path) can be computed efficiently by exploiting the Karush-Kuhn-Tucker conditions for optimality, as in the case of Least Angle Regression (LARS) for computing the solution path of the LASSO estimator [2] and in the related SVMs [6]. The proposed method realizes the following objectives: (i) obtaining a clustering by solving a convex optimization problem; (ii) obtaining an implicit representation of clusters
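The idea sketched in the abstract — a squared clustering loss per data point plus an ℓ1 shrinkage term on the pairwise differences between centroids — can be illustrated with a minimal sketch. The snippet below is not the paper's algorithm: it is a plain subgradient-descent illustration on 1-D data, with the step size, iteration count and λ value chosen purely for exposition.

```python
import numpy as np

def convex_cluster(x, lam, steps=2000, lr=0.01):
    """Minimize (1/2) sum_i (c_i - x_i)^2 + lam * sum_{i<j} |c_i - c_j|
    by plain subgradient descent. x is a 1-D array of data points and
    one centroid c_i is attached to every point x_i."""
    c = x.astype(float).copy()
    for _ in range(steps):
        # (c_i - x_i) is the gradient of the clustering loss;
        # sum_{j != i} sign(c_i - c_j) is a subgradient of the shrinkage term
        g = (c - x) + lam * np.sign(c[:, None] - c[None, :]).sum(axis=1)
        c -= lr * g
    return c

# two well-separated groups of points on the real line
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

c0 = convex_cluster(x, lam=0.0)   # no shrinkage: each centroid sits on its point
c1 = convex_cluster(x, lam=0.05)  # shrinkage fuses centroids within each group
```

With λ = 0 every centroid coincides with its own data point; as λ grows, centroids of nearby points fuse while the two groups remain far apart, and tracking these fusions over a range of λ values traces out the hierarchical clustering tree mentioned in the abstract.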