Classification by Reflective Convex Hulls

Mineichi Kudo, Atsuyoshi Nakamura
Division of Computer Science, Graduate School of Information Sci. and Tech.
Hokkaido University, Sapporo 060-0814, JAPAN
E-mail: {mine,atsu}@main.ist.hokudai.ac.jp

Ichigaku Takigawa
Institute for Chemical Research, Bioinformatics Center
Kyoto University
E-mail: takigawa@kuicr.kyoto-u.ac.jp

Abstract

A set of convex bodies, each containing samples of a single class only, is used for classification. Each convex body is defined by a set of facets (hyper-planes) that separate the class from the other classes. This paper describes an algorithm that finds such a set of convex bodies efficiently and examines the performance of a classifier using them. The relationship to support vector machines is also discussed.

1. Introduction

The convex hull $\mathrm{conv}(S)$ of a finite set $S$ in $m$-dimensional Euclidean space is one of the central concepts in computational geometry. In pattern recognition, convex hulls that each cover all the training samples of one class allow us to measure the separability among classes. Indeed, the relationship between those convex hulls and support vector machines (SVMs) has been well studied [2, 5, 8]. A typical result of these studies is that the hyper-plane of an SVM is identical to the bisector hyper-plane between the closest points of the convex hulls of the two classes [8].

When we use convex hulls for classification, the following problems arise: 1) the convex hull of a finite set is hard to construct in high dimensions, 2) it is costly to calculate the distance between a point and the convex hull, and 3) in general, we need more than one convex hull to approximate a class region.

For 1), there is no efficient algorithm to find the convex hull explicitly in high dimensions. Indeed, the number of facets is often exponential in $m$. For 2), the problem of calculating the distance $D(x, \mathrm{conv}(S))$ for $x \notin \mathrm{conv}(S)$ is known to be NP-hard in the representation size of $\mathrm{conv}(S)$ [7].
For 3), we need more than one convex hull to exclude samples from the classes other than a target class. The authors have already proposed such an approach using quasi convex hulls with restricted angles [6].

In this paper, to cope with these three problems, we use several randomized techniques. We gain efficiency at the expense of losing exactness to some extent.

2. Convex Hulls and Support Functions

The simplest definition of the convex hull $\mathrm{conv}(S)$ of a given dataset $S$ is the intersection of all convex sets containing $S$. For a finite set $S$, $C = \mathrm{conv}(S)$ is a polyhedron with at most $|S|$ vertices. Such a polyhedron can be defined in several ways. By $\partial C$ we denote the boundary of $C$, and we divide it into $q$-faces according to their dimension $q$. For example, the 0-faces are the vertices of $C$ and the $(m-1)$-faces are the facets (hyper-planes). Let $V(C)$ be the set of vertices of $C$ and $F(C)$ be the set of facets of $C$.

The second definition is called the V-representation and is defined as

$$C = \Big\{\, y = \sum_{x \in V(C)} c_x x \;\Big|\; \sum_{x \in V(C)} c_x = 1,\; c_x \ge 0 \,\Big\}.$$

The third one is called the H-representation and is defined as

$$C = \{\, y \mid \langle w, y \rangle \le c,\; \forall (w, c) \in F(C) \,\},$$

where $\langle \cdot, \cdot \rangle$ is the inner product and a facet $(w, c)$ is specified by a normal vector $w$ ($\|w\| = 1$) and a constant $c \in \mathbb{R}$.

In this paper, as the fourth definition, we use support functions to express a convex hull $C$. A support function with a unit vector $w$ ($\|w\| = 1$) is given by

$$H(S, w) = \sup\{\, \langle x, w \rangle \mid x \in S \,\},$$

where $\sup$ denotes the supremum. With all possible directions $w$, we can specify $C$ as

$$C = \bigcap_{w : \|w\| = 1} \{\, x \mid \langle x, w \rangle \le H(S, w) \,\}.$$

Of course, it is sufficient to use the $w$ of $(w, c) \in F(C)$ instead of all possible $w$'s. To emphasize this role, we call a plane $h(S, w) = \{\, x \mid \langle x, w \rangle = H(S, w) \,\}$ a support
978-1-4244-2175-6/08/$25.00 ©2008 IEEE