Efficiently Matching Proximity Relationships in Spatial Databases

Xuemin Lin 1, Xiaomei Zhou 1, and Chengfei Liu 2

1 School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
{lxue, xmei}@cse.unsw.edu.au
2 School of Computing Sciences, University of Technology, Sydney, NSW 2009, Australia
liu@socs.uts.edu.au

Abstract. Spatial data mining has recently emerged from a number of real applications, such as real-estate marketing, urban planning, weather forecasting, medical image analysis, and road traffic accident analysis. It demands efficient solutions to many new, expensive, and complicated problems. In this paper, we investigate a proximity matching problem among clusters and features. The investigation involves measuring the proximity relationship between clusters and features. We measure proximity in an average fashion to address a possibly nonuniform data distribution in a cluster. An efficient algorithm for solving the problem is proposed and evaluated. The algorithm applies a standard multi-step paradigm in combination with novel lower and upper proximity bounds. The algorithm is implemented in several different modes. Our experimental results not only compare these modes but also illustrate the efficiency of the algorithm.

Keywords: Spatial query processing and data mining.

1 Introduction

Spatial data mining aims to discover and understand non-trivial, implicit, and previously unknown knowledge in large spatial databases. It has a wide range of applications, such as demographic analysis, weather pattern analysis, urban planning, and transportation management.
While the processing of typical spatial queries (such as joins, nearest-neighbour search, KNN, and map overlays) has received a great deal of attention for years [2,3,4,28], spatial data mining, viewed as advanced spatial querying, demands efficient solutions for many newly proposed, expensive, complicated, and sometimes ad hoc spatial queries.

Inspired by successes in advanced spatial query processing techniques [2,3,4,11,12,28], relational data mining [1,26,30], machine learning [9,10,22], computational geometry [27], and statistical analysis [17,29], many research results and system prototypes in spatial data mining have been recently reported [2,5,6,13],

The work of this author is partially supported by a small ARC

R.H. Güting, D. Papadias, F. Lochovsky (Eds.): SSD'99, LNCS 1651, pp. 188–206, 1999. © Springer-Verlag Berlin Heidelberg 1999