ONTOLOGY BASED DATA WAREHOUSING FOR
IMPROVING TOURISTIC WEB SITES
Alberto Salguero, Francisco Araque, Cecilia Delgado
Department of Computer Languages and Systems - University of Granada
C/ Periodista Daniel Saucedo Aranda s/n, 18071, Granada (Andalucía), Spain
ABSTRACT
The World Wide Web (WWW) is continuously evolving and its information is dispersed. It is not always easy for users
to find the information they are looking for. By means of a Data Warehouse approach, we store and integrate some of the
interesting tourist information available on the WWW. A Firefox plug-in then uses this information to enrich the web
pages the user is browsing. The Data Warehouse architecture has been designed using an ontology approach, so the
plug-in is able to perform some reasoning about which relevant information to display.
KEYWORDS
Data Warehouse, tourism, ontology, e-business, World Wide Web.
1. INTRODUCTION
The number of Web sites that can be queried over the WWW keeps increasing. Such data sources
typically offer HTML form-based interfaces, and search engines query collections of suitably indexed
data. One drawback of these data sources is that their information is poorly structured and usually volatile.
Structured objects have to be extracted from HTML documents that also contain irrelevant data.
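To illustrate what extracting structured objects from such pages involves, here is a minimal wrapper sketch in Python using only the standard `html.parser` module. It is not the extraction component of the system described in this paper. It assumes a hypothetical page layout in which the relevant data sits in a `<table class="offers">` with one hotel name and one price per row (and no nested tables); the class `OfferExtractor`, the function `extract_offers` and the sample markup are illustrative names only. Everything outside that table (adverts, navigation, footers) is ignored.

```python
from html.parser import HTMLParser


class OfferExtractor(HTMLParser):
    """Wrapper that keeps only the rows of <table class="offers"> (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.in_offers = False  # inside the table of interest?
        self.in_cell = False    # inside one of its <td> cells?
        self.row = []           # text of the cells in the current row
        self.offers = []        # extracted structured objects

    def handle_starttag(self, tag, attrs):
        if tag == "table" and ("class", "offers") in attrs:
            self.in_offers = True
        elif self.in_offers and tag == "tr":
            self.row = []
        elif self.in_offers and tag == "td":
            self.in_cell = True
            self.row.append("")

    def handle_endtag(self, tag):
        if not self.in_offers:
            return  # irrelevant markup outside the offers table
        if tag == "table":
            self.in_offers = False
        elif tag == "td":
            self.in_cell = False
        elif tag == "tr" and len(self.row) == 2:  # header rows use <th>, so they are skipped
            name, price = (cell.strip() for cell in self.row)
            self.offers.append({"hotel": name, "price_eur": float(price)})

    def handle_data(self, data):
        if self.in_cell:
            self.row[-1] += data


def extract_offers(html: str) -> list:
    """Return the structured (hotel, price) objects found in a page."""
    parser = OfferExtractor()
    parser.feed(html)
    parser.close()
    return parser.offers


# Hypothetical page: only the offers table carries data worth storing.
SAMPLE_PAGE = """<html><body>
<div class="ad">Book now and save!</div>
<table class="offers">
  <tr><th>Hotel</th><th>Price</th></tr>
  <tr><td>Hotel Alhambra</td><td>85.00</td></tr>
  <tr><td>Hostal Granada</td><td>62.50</td></tr>
</table>
<footer>Contact us</footer>
</body></html>"""

print(extract_offers(SAMPLE_PAGE))
# prints [{'hotel': 'Hotel Alhambra', 'price_eur': 85.0}, {'hotel': 'Hostal Granada', 'price_eur': 62.5}]
```

A hand-written wrapper like this is brittle: any change to the site's markup breaks it, which is one practical consequence of the volatility of web sources noted above.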
One of the main problems of using the Web as a data source is that web sites are developed and managed
independently. Every organization is responsible for defining the schema of its data as well as its
representation. This implies the need for a software layer that integrates the data coming from all the
web data sources. The ability to integrate data from a wide range of data sources is an important field of
research in data engineering. Data integration is a prominent theme in many areas and enables widely
distributed, heterogeneous, dynamic collections of information sources to be accessed and handled.
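One common way to picture such an integration layer is as a set of per-source adapters that map each site's own schema and representation onto a single common schema. The Python sketch below is only an illustration under assumed data: the two sources, their field names (`nombre`, `precio_cent`, `rate`, and so on), the adapter functions and the `Accommodation` target schema are all hypothetical and are not the schema used by the authors.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List


@dataclass(frozen=True)
class Accommodation:
    """Common (integrated) schema; hypothetical, for illustration only."""
    name: str
    city: str
    price_eur: float
    available_from: date


def from_source_a(rec: dict) -> Accommodation:
    # Hypothetical source A: Spanish field names, price in euro cents, ISO dates.
    return Accommodation(
        name=rec["nombre"],
        city=rec["ciudad"],
        price_eur=rec["precio_cent"] / 100,
        available_from=date.fromisoformat(rec["desde"]),
    )


def from_source_b(rec: dict) -> Accommodation:
    # Hypothetical source B: English field names, price as text "EUR 62.50", dd/mm/yyyy dates.
    return Accommodation(
        name=rec["title"],
        city=rec["location"],
        price_eur=float(rec["rate"].split()[-1]),
        available_from=datetime.strptime(rec["from"], "%d/%m/%Y").date(),
    )


# One adapter per independently managed source.
ADAPTERS = {"source_a": from_source_a, "source_b": from_source_b}


def integrate(batches: Dict[str, List[dict]]) -> List[Accommodation]:
    """Map every record of every source onto the common schema."""
    return [ADAPTERS[src](rec) for src, recs in batches.items() for rec in recs]


rows = integrate({
    "source_a": [{"nombre": "Hotel Alhambra", "ciudad": "Granada",
                  "precio_cent": 8500, "desde": "2008-07-01"}],
    "source_b": [{"title": "Hostal Granada", "location": "Granada",
                  "rate": "EUR 62.50", "from": "15/07/2008"}],
})
for r in rows:
    print(r)
```

Keeping one adapter per source means that a schema change at one site only affects that site's adapter, not the rest of the integration layer.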
The Data Warehouse (DW) approach is usually selected in business environments as the best solution to
store and integrate the information coming from independent data sources. The DW architecture is
designed to simplify and enhance the querying and analysis of its data. Web information sources
usually have their own information delivery schedules (Watanabe et al., 2001). Generally, enterprises and
organizations develop systems that continuously poll the sources to enable (near) real-time change
capture and loading. This approach is not efficient and can cause overload problems when many sources
have to be queried. It is more efficient to poll a web site only when needed. To address this
problem, we propose a system that allows distributed information monitoring of web data sources on the
WWW. The approach relies on monitoring information distributed across different resources and alerting the
user (in our case, the DW refresh process) when certain conditions on this information (temporal properties)
are satisfied.
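The difference between continuous polling and polling only when needed can be made concrete with a small calculation. The Python sketch below assumes, as a simplification, that each source publishes new data on a fixed period known in advance; the `SourceSchedule` class, the `poll_plan` function and the two example sources are hypothetical and do not reproduce the authors' monitoring system, which is based on richer temporal properties of the sources.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class SourceSchedule:
    """Hypothetical model of a source's delivery schedule: strictly periodic updates."""
    name: str
    first_update: datetime  # a known publication instant
    period: timedelta       # a new version appears every `period`

    def next_update_after(self, t: datetime) -> datetime:
        """Earliest scheduled publication strictly after instant t."""
        if t < self.first_update:
            return self.first_update
        n = (t - self.first_update) // self.period + 1  # timedelta // timedelta -> int
        return self.first_update + n * self.period


def poll_plan(sources: List[SourceSchedule], start: datetime,
              end: datetime) -> List[Tuple[datetime, str]]:
    """Instants in (start, end] at which the DW refresh process needs to poll each source."""
    plan = []
    for s in sources:
        t = s.next_update_after(start)
        while t <= end:
            plan.append((t, s.name))
            t = s.next_update_after(t)
    return sorted(plan)


start = datetime(2008, 7, 1, 0, 0)
end = start + timedelta(hours=24)
sources = [
    SourceSchedule("hotel_offers", start, timedelta(hours=6)),
    SourceSchedule("flight_prices", start + timedelta(hours=1), timedelta(hours=12)),
]

plan = poll_plan(sources, start, end)
# Blind alternative: poll every source every 5 minutes, whether or not it changed.
blind_polls = len(sources) * int((end - start) / timedelta(minutes=5))
print(len(plan), "schedule-aware polls vs", blind_polls, "blind polls")
# prints: 6 schedule-aware polls vs 576 blind polls
```

The saving grows with the number of sources, which is the overload problem mentioned above. The cost of this design is that the schedule must be known (or learned), and under this model a source that publishes off-schedule is not picked up until its next expected update.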
We apply this DW approach to retrieve and integrate interesting tourist data from the
Web. Tourism is a prominent area in electronic commerce. However, the growth of the online tourism
market has not been as fast as previously expected (Davidson & Yu, 2005). As pointed out by Lexhagen
(2005), tourism businesses should try to develop more value-added services. The goal is to build strong
customer relationships and loyalty, which may lead to continued buying behavior. Some examples of
ICT value-added services that a tourism enterprise can offer are automatic categorization of user travel
preferences in order to match them up with travel options (Galindo et al., 2002), search engine interface
metaphors for trip planning (Xiang & Fesenmaier, 2005) and semantic brokering systems (Antoniou et al.,
ISBN: 978-972-8924-66-9 © 2008 IADIS