Query Execution Algorithm in Web Environment with Limited Availability of Statistics

Juliusz Jezierski and Tadeusz Morzy

Poznan University of Technology
Piotrowo 3a, 60-965 Poznan, Poland
{jjezierski, tmorzy}@cs.put.poznan.pl

Abstract. Traditional static cost-based query optimization uses data statistics to estimate the costs of candidate query execution plans for a given query. Unfortunately, this approach cannot be applied directly to the Web environment, because statistics are limited or unavailable and access to data sources is subject to unpredictable delays. To cope with missing or limited statistics, we propose a novel competitive query execution strategy. The basic idea is to start several equivalent query execution plans simultaneously and to measure their progress dynamically. Processing of the most promising plan continues, whereas processing of the remaining plans is stopped. We also present the results of a performance evaluation of the proposed strategy.

1 Introduction

There is increasing interest in query optimization and execution strategies for the Web environment that can cope with two specific properties of this environment: the lack or limited availability of data statistics, and unpredictable delays in access to data sources. Typically, in the Web environment, query processing parameters may change significantly over time, or they may simply not be available to query engines. Web sites that disseminate data in the form of files, dynamically generated documents, and data streams usually do not allow access to internal data statistics. The second specific property of the Web environment is the phenomenon of unexpected delays in access to external data sources. Such delays may significantly increase system response time. They arise from the variable load on network devices caused by varying user activity, and also from breakdowns.
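The competitive idea stated in the abstract (start several equivalent plans, measure their progress, keep the most promising one and stop the rest) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: plans are modelled as generators in which each step is one unit of work, progress is measured as the number of result rows produced during a fixed trial phase, and the names `make_plan`, `run_competition`, and `trial_steps` are hypothetical, introduced only for this illustration.

```python
from typing import Dict, Generator, List, Optional, Tuple

Row = Tuple[int, ...]
# Assumption: a plan is a generator; each next() is one unit of work that
# yields either a result row or None (work done, no output yet).
Plan = Generator[Optional[Row], None, None]


def make_plan(rows: List[Row], work_per_row: int) -> Plan:
    """Hypothetical plan that needs `work_per_row` steps to emit each row."""
    for row in rows:
        for _ in range(work_per_row - 1):
            yield None
        yield row


def run_competition(plans: Dict[str, Plan], trial_steps: int) -> Tuple[str, List[Row]]:
    """Run all plans in lock-step for a trial phase, then keep only one."""
    produced: Dict[str, List[Row]] = {name: [] for name in plans}
    finished: Optional[str] = None

    # Trial phase: advance every plan by one step per round.
    for _ in range(trial_steps):
        for name, plan in plans.items():
            try:
                row = next(plan)
            except StopIteration:
                finished = name  # this plan has already completed
                break
            if row is not None:
                produced[name].append(row)
        if finished is not None:
            break

    # Assumed progress metric: rows produced so far (ties go to the first plan).
    if finished is not None:
        winner = finished
    else:
        winner = max(produced, key=lambda name: len(produced[name]))

    # Stop the losing plans.
    for name, plan in plans.items():
        if name != winner:
            plan.close()

    # Let the winner run to completion.
    if finished is None:
        produced[winner].extend(r for r in plans[winner] if r is not None)
    return winner, produced[winner]
```

For example, with `rows = [(1,), (2,), (3,), (4,)]`, calling `run_competition({"slow": make_plan(rows, 3), "fast": make_plan(rows, 1)}, trial_steps=3)` selects the plan named "fast" and returns all four rows. A real engine would run plans concurrently against remote sources and would need a progress measure that does not depend on prior statistics; the lock-step loop here only keeps the sketch deterministic.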
Because of these two properties, traditional static optimization and execution techniques cannot be directly applied to the Web environment. In this paper, we present a novel competition strategy for query execution in the Web environment that removes or reduces the limitations of previous solutions (e.g. [1,2,3,4]). Our approach consists in the simultaneous execution of a set of alternative query execution plans for a given query. The system monitors the execution of these plans: the most attractive plans are promoted, while execution of the most expensive plans is canceled. The final query result is delivered to the user by the

M. Bubak et al. (Eds.): ICCS 2004, LNCS 3036, pp. 532–536, 2004. © Springer-Verlag Berlin Heidelberg 2004