S. Auer, O. Diaz, and G.A. Papadopoulos (Eds.): ICWE 2011, LNCS 6757, pp. 1–12, 2011. © Springer-Verlag Berlin Heidelberg 2011

The Anatomy of a Multi-domain Search Infrastructure

Stefano Ceri, Alessandro Bozzon, and Marco Brambilla

Dipartimento di Elettronica e Informazione, Politecnico di Milano, P.zza Leonardo Da Vinci 32, 20133 Milan, Italy
{name.surname}@polimi.it

Abstract. Current search engines do not support queries that require a complex combination of information. Problems such as “Which theatre offers an at least-three-stars action movie in London close to a good Italian restaurant” can only be solved by asking multiple queries, possibly to different search engines, and then manually combining results, thereby performing “data integration in the brain.” While searching the Web is the preferred method for accessing information in everyday practice, users expect that search systems will soon be capable of mastering complex queries. However, combining information requires a drastic change of perspective: a new generation of search computing systems is needed, capable of going beyond the capabilities of current search engines. In this paper we show how search computing should be open to modular composition, like many other kinds of software computation. We first motivate our work by describing our vision, and then describe how the challenges of multi-domain search are addressed by a prototype framework, whose internal “anatomy” is disclosed.

Keywords: Web information retrieval, multi-domain query, search computing, software architecture, modular decomposition.

1 Introduction

Search is the preferred method to access information in today's computing systems. The Web, accessed through search engines, is universally recognized as the source for answering users’ information needs. However, offering a link to a Web page does not cover all information needs.
Even simple problems, such as “Which theatre offers an at least-three-stars action movie in London close to a good Italian restaurant”, can only be solved by searching the Web multiple times, e.g. by extracting a list of recent action movies filtered by rating, then looking for movie theatres showing them, then looking for Italian restaurants close to those theatres. While search engines hint at useful information, the user's brain remains the fundamental platform for information integration.

An important trend is the availability of new, specialized data sources, the so-called “long tail” of the hidden Web of data. Such carefully collected and curated data sources can be much more valuable than information currently available in Web pages; however, many sources remain hidden or insulated, for lack of software technologies for bringing them to the surface and making them usable in the search context. We believe that in the future a new class of search computing systems will support the publishing and integration of data sources; the user will be able to select