S. Auer, O. Diaz, and G.A. Papadopoulos (Eds.): ICWE 2011, LNCS 6757, pp. 1–12, 2011.
© Springer-Verlag Berlin Heidelberg 2011
The Anatomy of a Multi-domain Search Infrastructure
Stefano Ceri, Alessandro Bozzon, and Marco Brambilla
Dipartimento di Elettronica e Informazione,
Politecnico di Milano, P.zza Leonardo Da Vinci 32,
20133 Milan, Italy
{name.surname}@polimi.it
Abstract. Current search engines do not support queries that require a complex
combination of information. Problems such as “Which theatre offers an at least-three-stars
action movie in London close to a good Italian restaurant” can only be
solved by asking multiple queries, possibly to different search engines, and then
manually combining results, thereby performing “data integration in the brain.”
While searching the Web is the preferred method for accessing information in
everyday practice, users expect that search systems will soon be capable of mastering
complex queries. However, combining information requires a drastic
change of perspective: a new generation of search computing systems is needed,
capable of going beyond the capabilities of current search engines. In this paper
we show how search computing should be open to modular composition, like many
other kinds of software computation. We first motivate our work by describing
our vision, and then describe how the challenges of multi-domain search are addressed
by a prototype framework, whose internal “anatomy” is disclosed.
Keywords: Web information retrieval, multi-domain query, search computing,
software architecture, modular decomposition.
1 Introduction
Search is the preferred method to access information in today's computing systems.
The Web, accessed through search engines, is universally recognized as the source for
answering users’ information needs. However, offering a link to a Web page does not
cover all information needs. Even simple problems, such as “Which theatre offers an
at least-three-stars action movie in London close to a good Italian restaurant”, can
only be solved by searching the Web multiple times, e.g. by extracting a list of the
recent action movies filtered by ranking, then looking for movie theatres, then looking
for Italian restaurants close to them. While search engines hint at useful information,
the user's brain remains the fundamental platform for information integration.
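To make the manual integration steps concrete, the following is a minimal sketch of the example query as a join over three independent result lists. It is an illustration only, not the framework described in this paper: the record types, field names, sample data, planar coordinates, and the `multi_domain_search` function are all hypothetical assumptions introduced here.

```python
from dataclasses import dataclass
from math import hypot


# Hypothetical record types, one per search domain (not from the paper).
@dataclass(frozen=True)
class Movie:
    title: str
    genre: str
    stars: int


@dataclass(frozen=True)
class Theatre:
    name: str
    city: str
    showing: str  # title of the movie on show
    x: float      # assumed planar coordinates, arbitrary units
    y: float


@dataclass(frozen=True)
class Restaurant:
    name: str
    cuisine: str
    rating: float
    x: float
    y: float


def multi_domain_search(movies, theatres, restaurants, *, city, genre,
                        min_stars, cuisine, min_rating, max_distance):
    """Combine three single-domain result lists into one ranked answer."""
    # Step 1: movies of the requested genre, filtered by rating.
    good_titles = {m.title for m in movies
                   if m.genre == genre and m.stars >= min_stars}
    # Step 2: restaurants of the requested cuisine with a good rating.
    good_restaurants = [r for r in restaurants
                        if r.cuisine == cuisine and r.rating >= min_rating]
    # Step 3: join theatres with movies (by title) and restaurants (by distance).
    results = []
    for t in theatres:
        if t.city != city or t.showing not in good_titles:
            continue
        for r in good_restaurants:
            distance = hypot(t.x - r.x, t.y - r.y)
            if distance <= max_distance:
                results.append((t.name, t.showing, r.name, distance))
    # Rank combinations by proximity, closest first.
    results.sort(key=lambda row: row[3])
    return results


# Invented sample data, for illustration only.
MOVIES = [Movie("Fast Cut", "action", 4),
          Movie("Slow Tea", "drama", 5),
          Movie("Weak Punch", "action", 2)]
THEATRES = [Theatre("Odeon", "London", "Fast Cut", 0.0, 0.0),
            Theatre("Ritz", "London", "Weak Punch", 0.0, 1.0),
            Theatre("Rex", "Paris", "Fast Cut", 0.0, 0.0)]
RESTAURANTS = [Restaurant("Da Mario", "Italian", 4.5, 0.3, 0.4),
               Restaurant("Il Lontano", "Italian", 4.8, 5.0, 5.0),
               Restaurant("Sushi Go", "Japanese", 4.9, 0.1, 0.1)]

results = multi_domain_search(
    MOVIES, THEATRES, RESTAURANTS,
    city="London", genre="action", min_stars=3,
    cuisine="Italian", min_rating=4.0, max_distance=1.0)
```

Each of the three filters corresponds to one query a user would issue separately today; the nested loop is the part currently performed "in the brain".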
An important trend is the availability of new, specialized data sources – the
so-called “long tail” of the hidden Web of data. Such carefully collected and curated
data sources can be much more valuable than information currently available in Web
pages; however, many sources remain hidden or isolated, for lack of software
technologies that bring them to the surface and make them usable in the search
context. We believe that in the future a new class of search computing systems will
support the publishing and integration of data sources; the user will be able to select