Navigation-Driven Evaluation of Virtual Mediated Views

Bertram Ludäscher, Yannis Papakonstantinou, and Pavel Velikhov
ludaesch@sdsc.edu, {yannis,pvelikho}@cs.ucsd.edu

Abstract. The MIX mediator system incorporates a novel framework for navigation-driven evaluation of virtual mediated views. Its architecture allows the on-demand computation of views and query results as the user navigates them. The evaluation scheme minimizes superfluous source access through the use of lazy mediators that translate incoming client navigations on virtual XML views into navigations on lower-level mediators or wrapped sources. The proposed demand-driven approach is essential for handling up-to-date mediated views of large Web sources or query results. The non-materialization of the query answer is transparent to the client application, since clients can navigate the query answer using a subset of the standard DOM API for XML documents. We elaborate on query evaluation in such a framework and show how algebraic plans can be implemented as trees of lazy mediators. Finally, we present a new buffering technique that can mediate between the fine granularity of DOM navigations and the coarse granularity of real-world sources. This drastically reduces communication overhead and also simplifies wrapper development. An implementation of the system is available on the Web.

1 Introduction and Overview

Mediated views integrate information from heterogeneous sources. There are two main paradigms for evaluating queries against integrated views. In the warehousing approach, data is collected and integrated in a materialized view prior to the execution of user queries against the view. However, when the user is interested in the most recent data available or in very large views, a virtual, demand-driven approach has to be employed. Such requirements are encountered most notably when integrating Web sources.
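The difference between the two paradigms can be illustrated with a minimal sketch. It is not the MIX implementation: `CountingSource`, `warehoused_view`, and `virtual_view` are hypothetical names introduced here, and Python generators stand in for the lazy, demand-driven evaluation described above. The counter only makes visible how many source accesses each strategy triggers.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class CountingSource:
    """Hypothetical wrapped source that counts how many items it has served."""

    def __init__(self, items: Iterable[str]) -> None:
        self._items = list(items)
        self.fetched = 0  # number of simulated source accesses

    def scan(self) -> Iterator[str]:
        for item in self._items:
            self.fetched += 1  # each item handed out costs one access
            yield item


def warehoused_view(sources: List[CountingSource]) -> List[str]:
    """Warehousing: collect everything from every source before any query runs."""
    return [item for source in sources for item in source.scan()]


def virtual_view(sources: List[CountingSource]) -> Iterator[str]:
    """Virtual view: pull an item from a source only when the client asks for it."""
    for source in sources:
        yield from source.scan()


if __name__ == "__main__":
    # Demand-driven: the client looks at three items, so three accesses happen.
    a = CountingSource(f"a{i}" for i in range(1000))
    b = CountingSource(f"b{i}" for i in range(1000))
    first_three = list(islice(virtual_view([a, b]), 3))
    print(first_three, a.fetched, b.fetched)  # ['a0', 'a1', 'a2'] 3 0

    # Warehousing: both sources are read completely up front.
    c = CountingSource(f"c{i}" for i in range(1000))
    d = CountingSource(f"d{i}" for i in range(1000))
    warehoused_view([c, d])
    print(c.fetched + d.fetched)  # 2000
```

In this sketch the second source is never contacted as long as the client stays within the first three items, which is the kind of superfluous source access a demand-driven scheme aims to avoid.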
For example, consider a mediator that creates an integrated view, called allbooks, of data on books available from amazon.com and barnesandnoble.com. A warehousing approach is not viable: First, one cannot obtain the complete dataset of the booksellers. Second, the data will have to reflect the ever-changing availability of books. In contrast, in a demand-driven approach, the user query is composed with the view definition of allbooks and corresponding subqueries against the sources are evaluated only then (i.e., at query evaluation time and not a priori).

C. Zaniolo et al. (Eds.): EDBT 2000, LNCS 1777, pp. 150–165, 2000. © Springer-Verlag Berlin Heidelberg 2000

Current mediator systems, even those based on the virtual approach, compute and return the results of the user query completely. Thus, although they do not materialize the integrated view, they materialize the result of the user query. This