ONTOLOGY BASED DATA WAREHOUSING FOR IMPROVING TOURISTIC WEB SITES

Alberto Salguero, Francisco Araque, Cecilia Delgado
Department of Computer Languages and Systems - University of Granada
C/ Periodista Daniel Saucedo Aranda s/n, 18071, Granada (Andalucía), Spain

ABSTRACT

The World Wide Web (WWW) is continuously evolving and its information is dispersed. It is not always easy for users to find the information they are looking for. By means of a Data Warehouse approach we will store and integrate some of the relevant tourist information available on the WWW. A Firefox plug-in will use this information to augment the web pages the user visits while navigating. The Data Warehouse architecture has been designed using an ontology approach, so that the plug-in is able to perform some reasoning about which information is relevant to display.

KEYWORDS

Data Warehouse, tourism, ontology, e-business, World Wide Web.

1. INTRODUCTION

The number of Web sites that can be queried across the WWW is increasing. Such data sources typically support HTML form-based interfaces, and search engines query collections of suitably indexed data. One drawback of these data sources is that the information is not well structured and is usually volatile. Structured objects have to be extracted from HTML documents that also contain irrelevant data.

One of the main problems of using the Web as a data source is that all web sites are developed and managed independently. Every organization is responsible for defining the schema of its data as well as its representation. This implies the need for a software layer that integrates the data coming from all the web data sources.

The ability to integrate data from a wide range of data sources is an important field of research in data engineering. Data integration is a prominent theme in many areas and enables widely distributed, heterogeneous, dynamic collections of information sources to be accessed and handled.
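The extraction and integration problem described above can be illustrated with a minimal Python sketch. It is not the system described in this paper: the two page fragments, their CSS class names, the common schema (`name`, `price`) and the `extract_record` helper are all hypothetical and chosen only for illustration. The sketch assumes that each source marks the fields of interest with a class attribute, and uses a per-source mapping to bring independently designed pages into one schema.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the text of elements whose class attribute maps to a schema field."""

    def __init__(self, field_map):
        super().__init__()
        self.field_map = field_map  # source-specific class name -> common field name
        self.record = {}
        self._current = None        # common field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self._current = self.field_map.get(css_class)

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        # Text outside a mapped element is treated as irrelevant and ignored.
        if self._current and data.strip():
            self.record[self._current] = data.strip()


def extract_record(html, field_map):
    """Return a dict in the common schema extracted from one HTML page."""
    parser = FieldExtractor(field_map)
    parser.feed(html)
    return parser.record


# Two hypothetical sources that publish the same kind of data with different markup.
SITE_A = """
<div class="banner">Book now and save!</div>
<h1 class="hotel-name">Hotel Alhambra</h1>
<p>Prices from <span class="hotel-price">85 EUR</span> per night.</p>
"""
SITE_B = """
<td class="title">Hostal Albaicin</td>
<td class="rate">40 EUR</td>
<td class="ad">Special offers inside</td>
"""

# Per-source mappings into the common schema (assumed for this example).
MAP_A = {"hotel-name": "name", "hotel-price": "price"}
MAP_B = {"title": "name", "rate": "price"}

records = [extract_record(SITE_A, MAP_A), extract_record(SITE_B, MAP_B)]
```

Both records end up with the same field names even though the sources use different markup, which is the role the integration layer plays at a much larger scale.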
The Data Warehouse (DW) approach is usually selected in business environments as the best solution to store and integrate all the information coming from independent data sources. The DW architecture is designed to simplify and enhance the querying and analysis of its data.

Web information sources usually have their own information delivery schedules (Watanabe et al., 2001). Generally, enterprises and organizations develop systems that continuously poll the sources to enable (near) real-time capture and loading of changes. This approach is not efficient and can produce overload problems when many sources have to be queried. It is more efficient to poll the web sites only when needed. In order to address this problem, we propose a system which allows distributed information monitoring of web data sources on the WWW. The approach relies on monitoring information distributed across different resources and alerting the user (in our case, the DW refreshment process) when certain conditions regarding this information (temporal properties) are satisfied.

We are going to apply this DW approach to retrieve and integrate relevant touristic data from the Web. Tourism is a prominent area in electronic commerce. However, the growth of the on-line tourism market has not been as fast as previously expected (Davidson & Yu, 2005). As pointed out by Lexhagen (2005), tourism businesses should try to develop more value-added services. The goal is to build up strong customer relationships and loyalty, which may lead to repeat purchases. Some examples of ICT value-added services that a tourism enterprise can offer are automatic categorization of user travel preferences in order to match them up with travel options (Galindo et al., 2002), search engine interface metaphors for trip planning (Xiang & Fesenmaier, 2005) and semantic brokering systems (Antoniou et al.,

ISBN: 978-972-8924-66-9 © 2008 IADIS 120