A Minimum Volume Covering Approach with a Set of Ellipsoids

David Martínez-Rego, Enrique Castillo, Oscar Fontenla-Romero, and Amparo Alonso-Betanzos, Senior Member, IEEE

Abstract—A technique for fitting a minimum-volume set of covering ellipsoids is developed. Solutions to this problem have potential applications in one-class classification and clustering. Its main original features are: 1) it avoids the direct evaluation of determinants by using diagonalization properties of the matrices involved; 2) it identifies and removes outliers from the estimation process; 3) it avoids the binary variables that arise from the combinatorial character of the assignment problem by replacing them with continuous variables in the range [0, 1]; and 4) the problem can be solved by a bilevel algorithm whose first level determines the ellipsoids and whose second level reassigns the data points to ellipsoids and identifies outliers, using an algorithm that forces the Karush-Kuhn-Tucker conditions to be satisfied. Two theorems provide a rigorous basis for the proposed methods. Finally, a set of application examples from different fields illustrates the power of the method and its practical performance.

Index Terms—One-class classification, data clustering, bilevel algorithm, minimum volume covering ellipsoids

1 INTRODUCTION

The minimum volume covering ellipsoid (MVCE) problem has been studied since John [1] first discussed it in his work on optimality conditions. The problem consists of covering a set of points $\{x_1, x_2, \ldots, x_m\} \subset \mathbb{R}^n$ with an ellipsoid of minimum volume. It appears in various formulations, each with different properties [2], [3]. In its simplest formulation, an ellipsoid $E \subset \mathbb{R}^n$ is defined as

$$E = \{x \in \mathbb{R}^n \mid (x - a)^T M (x - a) \le 1\}, \tag{1}$$

where $a$ is the center of the ellipsoid and $M \in \mathbb{R}^{n \times n}$ determines its shape.
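Definition (1) translates directly into a membership test. The minimal Python sketch below, assuming $M$ is stored as a list of rows, evaluates the quadratic form $(x - a)^T M (x - a)$ and compares it with 1. The function names quadratic_form and in_ellipsoid and the round-off tolerance tol are hypothetical illustrations, not code from the paper.

```python
from typing import Sequence

# Hypothetical helpers illustrating Eq. (1); not code from the paper.


def quadratic_form(x: Sequence[float], a: Sequence[float],
                   M: Sequence[Sequence[float]]) -> float:
    """Compute (x - a)^T M (x - a) for point x, center a and shape matrix M."""
    d = [xi - ai for xi, ai in zip(x, a)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def in_ellipsoid(x: Sequence[float], a: Sequence[float],
                 M: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Membership test for E in Eq. (1); tol absorbs floating-point round-off."""
    return quadratic_form(x, a, M) <= 1.0 + tol


# Example data (hypothetical): M = diag(1/4, 1) centered at the origin is an
# axis-aligned ellipse with semi-axes 2 (along x) and 1 (along y).
M = [[0.25, 0.0], [0.0, 1.0]]
a = [0.0, 0.0]
print(in_ellipsoid([2.0, 0.0], a, M))  # boundary point -> True
print(in_ellipsoid([0.0, 1.5], a, M))  # outside -> False
```

A point lies on the boundary of $E$ exactly when the quadratic form equals 1; constraint (4) of the MVCE problem is this test applied to every data point $x_i$.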
Given this representation, the volume of $E$ is

$$\mathrm{vol}(E) = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)} \frac{1}{\sqrt{\det(M)}}, \tag{2}$$

where $\Gamma(\cdot)$ is the gamma function. Thus, the problem is formulated mathematically as

$$\min_{a, M} \; \det(M)^{-1} \tag{3}$$

$$\text{subject to} \quad (x_i - a)^T M (x_i - a) \le 1, \quad i = 1, \ldots, m, \tag{4}$$

$$M \succ 0, \tag{5}$$

where $M \succ 0$ restricts $M$ to be positive definite.

Given its applicability in areas such as statistics and data mining, several algorithms for solving this problem have been developed over the past decades. In [4], Barnes provided an algorithm based on matrix eigenvalue decomposition. Later, Khachiyan and Todd [8] were the first to use interior-point methods to develop an algorithm for this purpose. This seminal work is the root of more recent developments [6], [7]. From a theoretical point of view, several authors have obtained bounds on the complexity of the problem. Nesterov and Nemirovskii [9] obtained an upper bound of $O(m^{3.5} \log(mR/(r\epsilon)))$ operations for an $\epsilon$-optimal ellipsoid, where $m$ is the number of points, $R \in \mathbb{R}$ is the radius of a ball that covers the convex hull of the given set of points, and $r \in \mathbb{R}$ is the radius of a ball inscribed in that convex hull. More recently, Khachiyan [5] reduced this bound to $O(m^{3.5} \log(m/\epsilon))$ operations.

A classic and well-known result of John [1] states that the number of boundary points is not too large: the minimum volume covering ellipsoid in an $n$-dimensional space is determined by a subset of at most $n(n+3)/2$ points, that is, $O(n^2)$ points. This result motivated the design of active-set strategies for solving the problem, such as the one in [3], which try to make an intelligent guess of the active points $x_i$ at each iteration and discard presumably inactive points from time to time. Recent developments have shown that this problem can be formulated as a semidefinite programming (SDP) problem, which can now be solved efficiently with standard software [2], [10].
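To make (2)-(5) concrete, the pure-Python sketch below evaluates the volume formula (2) and approximately solves (3)-(5) with the simple first-order scheme commonly attributed to Khachiyan, which updates one dual weight per iteration on the lifted points $q_i = (x_i, 1)$. This is a swapped-in illustration: it is neither the interior-point approach of [8], [9] nor the method proposed in this paper, and unlike that method it evaluates a determinant directly. The helper names (_inverse, _det, ellipsoid_volume, mvce_khachiyan) and the parameters eps and max_iter are hypothetical. The assumed stopping rule is $\max_i q_i^T V^{-1} q_i \le (1+\epsilon)(n+1)$ with $V = \sum_i u_i q_i q_i^T$, after which $a = \sum_i u_i x_i$ and $M = \frac{1}{n}\left(\sum_i u_i x_i x_i^T - a a^T\right)^{-1}$.

```python
import math
from typing import List, Sequence, Tuple

# Sketch (assumption): Khachiyan-type first-order MVCE solver for (3)-(5).
# Not the paper's method; all helper names here are hypothetical.
Matrix = List[List[float]]


def _inverse(A: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _det(A: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(A)
    U = [[float(v) for v in row] for row in A]
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(U[r][col]))
        if U[piv][col] == 0.0:
            return 0.0
        if piv != col:
            U[col], U[piv] = U[piv], U[col]
            result = -result
        result *= U[col][col]
        for r in range(col + 1, n):
            f = U[r][col] / U[col][col]
            U[r] = [vr - f * vc for vr, vc in zip(U[r], U[col])]
    return result


def ellipsoid_volume(M: Matrix) -> float:
    """Volume of E from Eq. (2): pi^(n/2) / Gamma(n/2 + 1) / sqrt(det M)."""
    n = len(M)
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) / math.sqrt(_det(M))


def mvce_khachiyan(points: Sequence[Sequence[float]], eps: float = 1e-3,
                   max_iter: int = 100_000) -> Tuple[List[float], Matrix]:
    """Approximate (a, M) of problem (3)-(5) by Khachiyan-type weight updates."""
    m, n = len(points), len(points[0])
    Q = [[float(v) for v in p] + [1.0] for p in points]  # lifted q_i = (x_i, 1)
    u = [1.0 / m] * m                                     # dual weights, sum to 1
    for _ in range(max_iter):
        # V = sum_i u_i q_i q_i^T, an (n+1) x (n+1) matrix
        V = [[sum(u[i] * Q[i][r] * Q[i][c] for i in range(m))
              for c in range(n + 1)] for r in range(n + 1)]
        Vinv = _inverse(V)
        # g_i = q_i^T V^{-1} q_i; at the optimum max_i g_i = n + 1
        g = [sum(q[r] * Vinv[r][c] * q[c]
                 for r in range(n + 1) for c in range(n + 1)) for q in Q]
        j = max(range(m), key=lambda i: g[i])
        if g[j] <= (1.0 + eps) * (n + 1):
            break
        # Khachiyan's step size: move weight toward the most violated point j
        step = (g[j] - n - 1) / ((n + 1) * (g[j] - 1))
        u = [(1.0 - step) * ui for ui in u]
        u[j] += step
    a = [sum(u[i] * points[i][k] for i in range(m)) for k in range(n)]
    # Sigma = sum_i u_i x_i x_i^T - a a^T, and M = Sigma^{-1} / n
    S = [[sum(u[i] * points[i][r] * points[i][c] for i in range(m)) - a[r] * a[c]
          for c in range(n)] for r in range(n)]
    M = [[v / n for v in row] for row in _inverse(S)]
    return a, M


square = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
center, shape = mvce_khachiyan(square)
print(center, shape)             # [0.0, 0.0] [[0.5, 0.0], [0.0, 0.5]]
print(ellipsoid_volume(shape))   # 6.283185307179586 (= 2*pi)
```

For the four corners of the square $[-1, 1]^2$, symmetry makes the uniform weights optimal, so the sketch stops at once with $a = 0$ and $M = I/2$: the circle of radius $\sqrt{2}$, whose area is $2\pi$. In general each point satisfies $(x_i - a)^T M (x_i - a) = (q_i^T V^{-1} q_i - 1)/n$, so on termination the returned ellipsoid violates (4) by at most a factor of $1 + \epsilon(n+1)/n$.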
It can also be viewed as an instance of the more general problem of log-determinant maximization (minimization), for which several solution methods are available [11].

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 35, NO. 12, DECEMBER 2013 2997

D. Martínez-Rego, O. Fontenla-Romero, and A. Alonso-Betanzos are with the Department of Computer Science, Facultad de Informática, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain. E-mail: {dmartinez, ofontenla, ciamparo}@udc.es.

E. Castillo is with the Department of Applied Mathematics and Computational Sciences, Escuela Técnica Superior de Ingenieros de Caminos, University of Cantabria, Avenida de los Castros s/n, Santander, Cantabria 39005, Spain. E-mail: enrique.castillo@unican.es.

Manuscript received 1 July 2011; revised 26 Sept. 2012; accepted 23 Apr. 2013; published online 16 May 2013. Recommended for acceptance by A. Fitzgibbon. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-2011-07-0426. Digital Object Identifier no. 10.1109/TPAMI.2013.94.

0162-8828/13/$31.00 © 2013 IEEE. Published by the IEEE Computer Society.