Journal of Uncertain Systems
Vol.3, No.4, pp.298-306, 2009
Online at: www.jus.org.uk

Implementation of the Extended Fuzzy C-Means Algorithm in Geographic Information Systems

Ferdinando Di Martino 1,2,∗, Salvatore Sessa 1

1 Università degli Studi di Napoli Federico II, Dipartimento di Costruzioni e Metodi Matematici in Architettura, Via Monteoliveto 3, 80134 Napoli, Italy
2 Università degli Studi di Salerno, Dipartimento di Matematica e Informatica, Via Ponte Don Melillo, 84084 Fisciano, Italy

Received 28 March 2009; Revised 23 June 2009

Abstract

Density cluster methods have high computational complexity and are used in spatial analysis to determine impact areas. We propose the extended fuzzy c-means (EFCM) algorithm as an alternative method because it has three advantages: robustness to noise and outliers, linear computational complexity, and automatic determination of the optimal number of clusters. We implement the EFCM algorithm inside a geographic information system (GIS) to determine buffer areas as hypersphere volume prototypes, which are circles in the case of two-dimensional pattern data. We have applied this algorithm to the spatial analysis of buffer areas called hotspots, containing fire point-events of the Santa Fe district (NM), downloaded from http://www.fs.fed.us/r3/gis/sfe_gis.shtml.

© 2009 World Academic Press, UK. All rights reserved.

Keywords: extended fuzzy c-means, fuzzy c-means, GIS, hotspot

1 Introduction

In spatial analysis a buffer area is an area within a specified distance around the features of a theme. This area is determined as a polygon by defining distance parameters that can be set as constants or as variables determined by feature attributes: for instance, circular buffer areas are obtained around a feature of the theme by using the radius of the circle as the distance parameter.
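As a minimal sketch of the circular buffer just described, the following Python fragment approximates the buffer around each point feature by a regular polygon, with the distance parameter given either as a constant or as the name of a feature attribute. The function names, the dictionary layout of a feature and the field names `x`, `y` and `radius_m` are assumptions made for illustration only; they do not come from any particular GIS package.

```python
import math

def circular_buffer(x, y, radius, n_vertices=32):
    """Approximate the circle of the given radius centered at (x, y)
    by a regular polygon, returned as a list of (x, y) vertices."""
    if radius <= 0:
        raise ValueError("the buffer distance must be positive")
    step = 2.0 * math.pi / n_vertices
    return [(x + radius * math.cos(k * step),
             y + radius * math.sin(k * step))
            for k in range(n_vertices)]

def buffer_features(features, distance):
    """Build one circular buffer per point feature.

    `distance` is either a number (constant buffer distance) or a
    string naming the attribute that stores each feature's distance.
    Features are plain dicts here (hypothetical layout).
    """
    polygons = []
    for feature in features:
        if isinstance(distance, str):
            radius = feature[distance]   # variable, attribute-driven
        else:
            radius = distance            # constant for all features
        polygons.append(circular_buffer(feature["x"], feature["y"], radius))
    return polygons

# Hypothetical point features in planar coordinates.
events = [
    {"x": 0.0, "y": 0.0, "radius_m": 2.0},
    {"x": 5.0, "y": 1.0, "radius_m": 0.5},
]
constant_buffers = buffer_features(events, 1.0)
variable_buffers = buffer_features(events, "radius_m")
```

The polygonal approximation reflects the fact that a GIS stores a buffer as a polygon; the number of vertices (32 here) is an arbitrary choice.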
Buffer areas are calculated in many fields of spatial analysis and they can delimit dangerous zones: for example, areas around the epicenter of an earthquake, areas of industrial pollution, and urban areas where local legislation forbids the construction of buildings.

The primitive buffering operations in a geographical information system (GIS) concern points, lines and polygons. In spatial analysis, an area having the dimensions of a continent can be considered, to a good approximation, as a plane, so Euclidean geometry is applied in the calculation of distances. For this reason the buffer area around a point on the map is a circle (“circular polygon” in spatial analysis terms) centered at that point. For instance, the epicenter of an earthquake or the location of a criminal event can be represented by a point. The radius of this circle is called the buffer distance, which the user sets either as a constant value for all the point data or as the value of a field in the point data table. Moreover the user has two options: to keep these circular buffer areas separate (cf. Fig.1) or to merge some of them, obtaining new polygonal areas (cf. Fig.2).

When the number of event-points is large, the classical density methods are not suitable for the determination of impact areas because of their high computational complexity. The use of cluster algorithms then seems more appropriate: it is well known that each cluster contains similar data, while the degree of association between data of different clusters is weak. Clustering algorithms (e.g., [8, 9, 11, 12, 13, 14]) are useful for the determination of buffer areas, called hotspots, in crime analysis, car crash analysis, disease diffusion analysis, etc. For instance, the National Institute of Justice in Washington DC (USA) has developed a statistical tool, CrimeSTAT [9], for the GIS analysis of crime incident locations.
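The choice between separate and merged circular buffers mentioned above can be illustrated with a short sketch. Under the planar Euclidean assumption, two circular buffers touch or overlap when the distance between their centers does not exceed the sum of their radii; buffers linked directly or through a chain of such overlaps would be merged into one polygonal area. The function name and the tuple layout `(x, y, radius)` are hypothetical, and the sketch only identifies which buffers belong together; it does not construct the merged polygon.

```python
import math

def merge_groups(circles):
    """Group circular buffers that overlap directly or through a chain.

    `circles` is a list of (x, y, radius) tuples in planar coordinates.
    Returns a sorted list of index lists, one per merged area.
    """
    parent = list(range(len(circles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise test: quadratic in the number of buffers.
    for i in range(len(circles)):
        xi, yi, ri = circles[i]
        for j in range(i + 1, len(circles)):
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) <= ri + rj:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(circles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two overlapping buffers and one isolated buffer (made-up coordinates).
buffers = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (10.0, 10.0, 1.0)]
print(merge_groups(buffers))
```

The pairwise overlap test is kept deliberately simple; a spatial index would be the usual way to avoid comparing every pair when the number of points is large.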
∗ Corresponding author. Email: fdimarti@unina.it (F. Di Martino).

We refer to [6] for an exhaustive list of clustering techniques that determine hotspots. Determining the shape of each hotspot requires a density estimation method ([8, 13]), whereas the fuzzy c-means (FCM) algorithm [3] uses point-like cluster prototypes. However, in many cases it is not necessary to