Building a global normalized ontology for integrating geographic data sources

Agustina Buccella a,*, Alejandra Cechich a, Domenico Gendarmi b, Filippo Lanubile b, Giovanni Semeraro b, Attilio Colagrossi c

a GIISCO Research Group, Departamento de Ciencias de la Computación, Universidad Nacional del Comahue, Buenos Aires 1400, Neuquen 8300, Argentina
b Dipartimento di Informatica, University of Bari, Via E. Orabona, 4, 70125 Bari, Italy
c ISPRA - Istituto Superiore per la Protezione e la Ricerca Ambientale, Via Curtatone, 3, 00185 Rome, Italy

Article history: Received 27 November 2009; received in revised form 14 December 2010; accepted 26 February 2011; available online 8 April 2011.

Keywords: Geographic information systems; Data integration; Ontology merging; Heterogeneous databases; Formal ontologies; ISO 19100

Abstract

The proliferation of geographic information systems has generated great interest in integrating them. However, an integration process is not as simple as joining several systems, because any effort at information sharing runs into the problem of semantic heterogeneity. Resolving it requires identifying and representing all the semantics useful for performing schema integration. In several research areas, including geographic information system integration, ontologies have been introduced to facilitate knowledge sharing among various agents. In particular, one aspect of ontology sharing is establishing mappings between ontology constructs. Further, some research suggests that we should also be able to combine ontologies so that the product of this combination is, at the very least, the intersection of the two given ontologies. However, few approaches build integrations upon standard and normalized information, which might improve the accuracy of mappings and, therefore, the commitment to and understandability of the integration.
In this work, we propose a novel system (called GeoMergeP) that integrates geographic sources by formalizing their information as normalized ontologies. Our integral merging process, which covers structural, syntactic and semantic aspects, assists users in finding the most suitable correspondences. The system has been empirically tested in the context of projects of the Italian Institute for Environmental Protection and Research (ISPRA, formerly APAT), providing a consistent and complete integration of their sources.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, geographic information has attracted increasing attention. The development of new technologies such as GPS (Global Positioning System) devices, new market needs, and the availability of free software and tools to retrieve, process, and store geographic information have produced an explosion of activity in this area. Together, these factors have encouraged the development of a large number of new geographic information systems that are available on the Web. A quick search for geographic information on the Web returns several links covering different parts of the world, possibly from more than one system. But what happens when someone needs information that is spread across more than one system? For example, obtaining information about rivers in some countries may require querying two or more different systems. This phenomenon, which grows every day, has generated new requirements. In particular, it is now very common to find people or organizations trying to make systems work together, so interoperability emerges as a key requirement that every new system must consider. However, achieving interoperability is not an easy task. While the real world is assumed to be unique, its representation depends on the intended purpose: every representation of reality is user-specific.
Thus, different applications that share an interest in the same real-world phenomena may perceive them differently and therefore require different representations. Differences may arise in every facet of a representation: how much information is kept, how it is described, how it is organized (in terms of data structures), how it is coded, which constraints, processes, and rules apply, how it is presented, what the associated spatial and temporal frameworks are, etc. Thus, data integration emerges as a new research challenge. It refers to merging or integrating two or more sources with overlapping information, and it involves a set of decisions that must be made to achieve real integration.

* Corresponding author. E-mail addresses: abuccel@uncoma.edu.ar (A. Buccella), acechich@uncoma.edu.ar (A. Cechich), gendarmi@di.uniba.it (D. Gendarmi), lanubile@di.uniba.it (F. Lanubile), semeraro@di.uniba.it (G. Semeraro), attilio.colagrossi@apat.it (A. Colagrossi).

Computers & Geosciences 37 (2011) 893–916. doi:10.1016/j.cageo.2011.02.022