Ontology-based Solution for Data Warehousing in Genetic Neurological Disease
Donia Awad, Hassan Tout, Vincent Courboulay, Arnaud Revel
Abstract- In the field of genetic disorders of the nervous system, a huge amount of information is available on the Internet. Extracting relevant information from these heterogeneous sources is a complex task usually delegated to a data warehouse. Heterogeneity can concern the structure or the semantics of the sources. While solutions exist for the first problem, the second remains a major obstacle. In this article, we propose an ontology-based solution to the problem of semantic heterogeneity of biological sources. Our objective is to facilitate the definition of a warehouse containing data from these sources by drawing on solutions from the CIG (Cooperative Information Gathering) domain. Our solution consists of a set of three models, “Topic”, “Semantics”, and “Cooperative answer”, represented theoretically by logical predicates. In this paper, we present an implementation of this ontology in Protégé using the OWL and SWRL languages.
Index terms- data warehouse, semantic heterogeneity, information gathering, ontology, cooperative answer
I. INTRODUCTION
Biological data sources are known for their heterogeneity in many respects, such as data structure and semantics. Semantic heterogeneity is considered the most important problem in biological database systems because it involves the content of information and its intended meaning [1]. This problem has become increasingly important in data warehousing, a field that encompasses architectures, algorithms and tools for bringing together selected data from multiple databases or other information sources into a single repository, called a data warehouse, suitable for direct querying or analysis. To manage this problem, the meaning of interchanged information has to be understood across the systems [2].
In this article, we propose a solution to the problem of semantic heterogeneity of data sources on genetic neurological diseases. We chose this domain because a huge number of data sources refer to genetic neurological diseases and because semantic heterogeneity abounds among their related terms. Furthermore, to our knowledge, no study has dealt with the problem of semantic heterogeneity in this domain.
Dounia AWAD was at the Doctoral School of Science and Technology, Lebanese University. She is now at La Rochelle University, L3I Laboratory, France (dounia.awad@gmail.com). Hassan Tout is at Lebanese University, Lebanon. Vincent Courboulay and Arnaud Revel are at L3I Laboratory, La Rochelle University.
Let us first present the main problem in data warehousing, i.e. the “semantic heterogeneity” of information sources, and the solutions proposed to solve it. As we will see, these solutions share one point: they use ontologies to deal with semantic heterogeneity. Second, we propose an ontology to deal with semantic heterogeneity in the context of warehousing the data available on genetic neurological diseases. Our ontology is made up of two models: 1- a semantic model that deals with the heterogeneity problem, and 2- a cooperative answer model that provides a cooperative answer in response to a data warehouse user’s question. Finally, we present an implementation of our ontology with Protégé using its associated languages, OWL and SWRL.
II. SEMANTIC HETEROGENEITY: PROBLEM AND SOLUTIONS
Semantic heterogeneity occurs when the same information is represented by different expressions in various sources (synonyms), or when one expression is used in various sources to represent different information (homonyms). To solve the semantic heterogeneity problem, the use of an ontology seems to be essential [3].
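The two cases in this definition can be made concrete with a minimal Python sketch. It is only an illustration of the definition, not part of the proposed ontology: the source names, the terms, the concept identifiers, and the helpers `find_synonyms` and `find_homonyms` are all hypothetical, and the sketch assumes each term has already been linked to a concept by hand.

```python
from collections import defaultdict

# Hypothetical (source, term, concept) records. The sources, terms and
# concept identifiers are illustrative assumptions, not data from the paper.
RECORDS = [
    ("source_a", "Huntington disease", "concept:huntington"),
    ("source_b", "Huntington chorea", "concept:huntington"),
    ("source_a", "HD", "concept:huntington"),
    ("source_c", "HD", "concept:hodgkin"),
]


def find_synonyms(records):
    """Return concepts that are expressed by more than one term."""
    terms_by_concept = defaultdict(set)
    for _source, term, concept in records:
        terms_by_concept[concept].add(term)
    return {
        concept: sorted(terms)
        for concept, terms in terms_by_concept.items()
        if len(terms) > 1
    }


def find_homonyms(records):
    """Return terms that are used for more than one concept."""
    concepts_by_term = defaultdict(set)
    for _source, term, concept in records:
        concepts_by_term[term].add(concept)
    return {
        term: sorted(concepts)
        for term, concepts in concepts_by_term.items()
        if len(concepts) > 1
    }


if __name__ == "__main__":
    print("Synonyms:", find_synonyms(RECORDS))
    print("Homonyms:", find_homonyms(RECORDS))
```

In this toy data, one concept is reached through three different expressions (synonyms), while the abbreviation "HD" points to two different concepts depending on the source (homonym). The sketch presupposes that the term-to-concept links already exist; supplying such links is the role an ontology plays.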
Here, we consider an ontology as a “formal explicit specification of a shared conceptualization, where conceptualization is a set of concepts, relations, objects and constraints which defines a semantic model of a subject of interest” [4]. In our domain, the data sources may be very heterogeneous and very large. Consequently, we need ontology architectures that enable semantic interoperability among information belonging to different sources. Historically, three ontology approaches have been possible [5]:
A. Single ontology approach: A global ontology provides a shared vocabulary for the specification of semantics. All the information sources are related to this global ontology, which can also be a combination of several specialized ontologies. This approach may solve integration problems in which all the information sources to be integrated provide approximately the same view of a domain. Although easy to implement, it is not suited to dynamic and autonomous sources: changes in one information source may affect the global ontology and its mappings to the other information sources.
Proceedings of the World Congress on Engineering 2012 Vol I WCE 2012, July 4 - 6, 2012, London, U.K. ISBN: 978-988-19251-3-8 ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online) WCE 2012