GRAPH-BASED KNOWLEDGE REPRESENTATION FOR GIS DATA

Manuel Pech Palacio 1, David Sol 1, Jesús González 2
{sp205175, sol}@mail.udlap.mx, jagonzalez@inaoep.mx
1 Universidad de las Américas-Puebla
2 Instituto Nacional de Astrofísica Óptica y Electrónica
Puebla, México

Abstract

This paper proposes a graph representation for GIS data that covers both spatial and non-spatial data and also includes the spatial relations between spatial objects. Because graphs are a powerful and flexible knowledge representation, spatial and non-spatial data can be combined in a single structure; this is one of the strengths of the proposal. We hope to apply this knowledge representation to the data mining process with GIS data, including three types of spatial relations: topological, orientation and distance.

1. Introduction

In recent years, the human capacity to generate and collect data has expanded rapidly. The explosive growth in data and databases has created a need for techniques and tools that can transform the data into useful information and knowledge. In the beginning, the goal of these techniques and tools was to discover knowledge that could exist in relational data. Nowadays, with the growth of applications that deal with georeferenced data, the management and analysis of spatial data have increased considerably.

Spatial data has many characteristics that distinguish it from relational data. For example, it carries topological, distance, and direction information organized by multidimensional spatial index structures. Another difference is the query language used to access spatial data. The complexity of the spatial data types is another important feature.

Different approaches have been developed for knowledge discovery from spatial data; we briefly present some of them next.

Generalization [22][14]. Data and objects often contain detailed information at primitive concept levels.
It is often desirable to summarize a large set of data and present it at a high concept level. Generalization assumes the existence of background knowledge in the form of concept hierarchies. In the case of a spatial database, there can be two kinds of concept hierarchies: thematic and spatial. Lu et al. [22] extended attribute-oriented induction to spatial databases and presented two algorithms: spatial-data-dominant and non-spatial-data-dominant generalization.

Clustering [16][23][28][26] can be defined as the process of grouping physical or abstract objects into classes of similar objects. Spatial data clustering identifies clusters, or densely populated regions, according to some measurement in a large, multidimensional data set.

Spatial associations [19][11]. In many situations it is desirable to discover rules that associate one or more spatial objects with other spatial objects. Various kinds of spatial predicates can constitute a spatial association rule. Examples include topological relations such as intersects, overlap and disjoint; spatial orientations such as left_of and west_of; and distance information such as close_to or far_away.

Approximation and aggregation [17]. Clustering approaches try to answer the question of where the clusters in a spatial database are located. Another problem is to find out why the clusters are there. This question can be rephrased as asking for the characteristics of the clusters in terms of the objects that are close to them, which requires analyzing both the objects in each cluster and the objects near it.

Finally, there are three other methods for discovering knowledge in datasets. Mining an image database [12][11] can be viewed as another approach to spatial data mining. Classification learning [20] is the task of assigning an object to a class from a given set of classes based on the attribute values of the object.
Spatial trend detection [9] describes a regular change in one or more non-spatial attributes of an object that changes its position over time.

Proceedings of the Fourth Mexican International Conference on Computer Science (ENC’03) 0-7695-1915-6/03 $17.00 © 2003 IEEE

The remainder of this paper is organized as follows: Sections 2 and 3 present basic topics about spatial and