Border Samples Detection for Data Mining Applications Using Non Convex Hulls

Asdrúbal López Chau 1,3, Xiaoou Li 1, Wen Yu 2, Jair Cervantes 3, and Pedro Mejía-Álvarez 1

1 Computer Science Department, CINVESTAV-IPN, Mexico City, Mexico
achau@computacion.cs.cinvestav.mx, {lixo,pmalavrez}@cs.cinvestav.mx
2 Automatic Control Department, CINVESTAV-IPN, Mexico City, Mexico
yuw@ctrl.cinvestav.mx
3 Graduate and Research, Autonomous University of Mexico State, Texcoco, Mexico
chazarra17@gmail.com

Abstract. Border points are those instances located at the outer margin of dense clusters of samples. Their detection is important in many areas such as data mining, image processing, robotics, geographic information systems and pattern recognition. In this paper we propose a novel method to detect border samples. The proposed method makes use of a discretization and works on partitions of the set of points. The border samples are then detected by applying an algorithm similar to the one presented in reference [8] on the sides of convex hulls. We apply the novel algorithm to a classification task in data mining; experimental results show the effectiveness of our method.

Keywords: Data mining, border samples, convex hull, non-convex hull, support vector machines.

1 Introduction

The geometric notion of shape has no formal meaning associated with it [1]; intuitively, however, the shape of a set of points should be determined by the border or boundary samples of the set. Boundary points are very important for several applications such as robotics [2], computer vision [3], data mining and pattern recognition [4]. Topologically, the boundary of a set of points is the part of its closure that does not belong to its interior, and it defines the shape of the set [3].

The computation of the border samples that best represent the shape of a set of points has been investigated for a long time. One of the earliest approaches is to compute the convex hull (CH).
The CH of a set of points is the minimum convex set that contains all points of the set. A problem with the CH is that in many cases it cannot represent the shape of a set: for sets of points having interior "corners" or concavities, the CH omits the points that determine the border of those areas. An example of this can be seen in Fig. 1.

I. Batyrshin and G. Sidorov (Eds.): MICAI 2011, Part II, LNAI 7095, pp. 261–272, 2011. © Springer-Verlag Berlin Heidelberg 2011
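
The limitation of the CH on concave sets can be reproduced with a small sketch. The code below is not the method proposed in this paper; it uses Andrew's monotone chain, a standard convex hull algorithm, on a hypothetical L-shaped set of six points. The names cross, convex_hull and l_shape are illustrative assumptions.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, in counter-clockwise order.

    Collinear points lying on a hull edge are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower chain from left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper chain from right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Hypothetical L-shaped set: every point lies on the border of the shape,
# and (1, 1) is the interior "corner" of the L.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]

hull = convex_hull(l_shape)
missed = [p for p in l_shape if p not in hull]

print("hull vertices:", hull)
print("border points missed by the CH:", missed)
```

In this example all six points are border points of the L shape, yet the CH keeps only the five outer ones and leaves out the concave corner, which is the kind of omission a non-convex hull is meant to avoid.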