Data Mining and Knowledge Discovery 2, 169–194 (1998)
© 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications
JÖRG SANDER sander@informatik.uni-muenchen.de
MARTIN ESTER ester@informatik.uni-muenchen.de
HANS-PETER KRIEGEL kriegel@informatik.uni-muenchen.de
XIAOWEI XU xwxu@informatik.uni-muenchen.de
Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany
Editor: Usama Fayyad
Received February 21, 1997; Revised December 1997 and March 1998; Accepted April 1998
Abstract. The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we generalize this algorithm in two important directions. The generalized algorithm, called GDBSCAN, can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. In addition, four applications using 2D points (astronomy), 3D points (biology), 5D points (earth science) and 2D polygons (geography) are presented, demonstrating the applicability of GDBSCAN to real-world problems.
Keywords: clustering algorithms, spatial databases, efficiency, applications
1. Introduction
Spatial Database Systems (SDBS) (Gueting 1994) are database systems for the management of spatial data, i.e., point objects or spatially extended objects in a 2D or 3D space or in some high-dimensional vector space. While a lot of research has been conducted on knowledge discovery in relational databases in recent years, only a few methods for knowledge discovery in spatial databases have been proposed in the literature. Knowledge discovery is becoming more and more important in spatial databases, since increasingly large amounts of data obtained from satellite images, X-ray crystallography or other automatic equipment are stored in spatial databases.
Data mining is a step in the KDD process consisting of the application of data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data (Fayyad et al., 1996). Clustering, i.e., grouping the objects of a database into meaningful subclasses, is one of the major data mining methods (Matheus et al., 1993). Clustering algorithms have been studied for decades, but their application to large spatial databases introduces the following new requirements:
1. Minimal requirements of domain knowledge to determine the input parameters, because appropriate values are often not known in advance when dealing with large databases.