Classification by Reflective Convex Hulls

Mineichi Kudo, Atsuyoshi Nakamura
Division of Computer Science, Graduate School of Information Sci. and Tech.
Hokkaido University, Sapporo 060-0814, JAPAN
E-mail: {mine,atsu}@main.ist.hokudai.ac.jp

Ichigaku Takigawa
Institute for Chemical Research, Bioinformatics Center
Kyoto University
E-mail: takigawa@kuicr.kyoto-u.ac.jp

Abstract

A set of convex bodies, each containing samples of a single class only, is used for classification. Each convex body is defined by facets (hyper-planes) that separate the class from the other classes. This paper describes an algorithm that finds such a set of convex bodies efficiently and examines the performance of a classifier built from them. The relationship to support vector machines is also discussed.

1. Introduction

The convex hull conv(S) of a finite set S in m-dimensional Euclidean space is one of the central concepts in computational geometry. In pattern recognition, the convex hulls that cover all the training samples of each class allow us to measure the separability among classes. Indeed, the relationship between those convex hulls and support vector machines (SVMs) has been well studied [2, 5, 8]. A typical observation in such studies is that the hyper-plane of an SVM is identical to the bisector hyper-plane between the closest points of the convex hulls of the two classes [8].

When we use convex hulls for classification, the following problems arise: 1) the convex hull of a finite set is hard to construct in high dimensions, 2) calculating the distance between a point and the convex hull is costly, and 3) in general, we need more than one convex hull to approximate a class region.

For 1), there is no efficient algorithm to find the convex hull explicitly in high dimensions. Indeed, the number of facets is often exponential in m. For 2), the problem of calculating the distance D(x, conv(S)) for x ∈ conv(S) is known to be NP-hard in the representation size of conv(S) [7].
For 3), we need more than one convex hull in order to exclude samples of classes other than the target class. The authors have already proposed such an approach using quasi convex hulls with restricted angles [6].

In this paper, to cope with these three problems, we use several randomized techniques. We gain efficiency at the expense of losing exactness to some extent.

2. Convex Hulls and Support Functions

The simplest definition of the convex hull conv(S) of a given dataset S is the intersection of all convex sets containing S. For a finite set S, C = conv(S) is a polyhedron with at most |S| vertices. Such a polyhedron can be defined in several ways. By ∂C we denote the boundary of C, which we divide into q-faces according to their dimension q. For example, 0-faces are the vertices of C and (m − 1)-faces are the facets (hyper-planes). Let V(C) be the set of vertices of C and F(C) be the set of facets of C. The second definition is called the V-representation and is given by

\[ C = \Big\{ y = \sum_{x \in V(C)} c_x x \;\Big|\; \sum_{x \in V(C)} c_x = 1,\ c_x \ge 0 \Big\}. \]

The third is called the H-representation and is given by

\[ C = \{ y \mid \langle w, y \rangle \le c,\ \forall (w, c) \in F(C) \}, \]

where ⟨·, ·⟩ is the inner product and a facet (w, c) is specified by a normal vector w (||w|| = 1) and a constant c ∈ R.

In this paper, as the fourth definition, we use support functions to express a convex hull C. The support function for a unit vector w (||w|| = 1) is given by

\[ H(S, w) = \sup \{ \langle x, w \rangle \mid x \in S \}, \]

where sup denotes the supremum. With all possible directions w, we can specify C as

\[ C = \bigcap_{w : \|w\| = 1} \{ x \mid \langle x, w \rangle \le H(S, w) \}. \]

Of course, it is sufficient to use the w of (w, c) ∈ F(C) instead of all possible w's. To emphasize this role, we call a plane h(S, w) = {x | ⟨x, w⟩ = H(S, w)} a support

978-1-4244-2175-6/08/$25.00 ©2008 IEEE