Modeling the Performance of E-Commerce Sites

Jonathan C. Hardwick¹, Efstathios Papaefstathiou¹, and David Guimbellot²
¹ Microsoft Research Limited, Cambridge, UK
² Microsoft Corporation, Redmond, WA, USA

Indy is a new performance modeling framework for the creation of tools for many different classes of performance problems, including capacity planning, bottleneck analysis, etc. Users can plug in their own workload and hardware models while exploiting core shared services such as resource tracking and evaluation engines. We used Indy to create EMOD, a performance analysis tool for database-backed web sites. We validate EMOD using the predicted and observed performance of SVT, a sample e-commerce site.

INTRODUCTION

As the software industry moves to supplying services over the internet, the problem of predicting and modeling the performance of these services becomes even more acute. Instead of running on a single computer, services rely on a distributed collection of servers. These can range from a simple two-tier e-commerce site to a geographically distributed collection of portal services. Performance, reliability, management cost, and scalability are all critical to the success of these services [SPE00]. However, their distributed nature makes it difficult to predict or understand the ramifications of changes to the system. Monitoring can reveal the impact of a change, but only after the fact. Furthermore, because of the 24/7 nature of these services, it is important for operations managers and developers to be able to anticipate the performance implications of internal changes (e.g. system topology, software modification) and external factors (e.g. load spikes). What is needed is a range of performance modeling tools that allow software developers, planning staff, and operations managers to ask a wide variety of what-if questions before they apply changes to the service itself.
Currently, there are a limited number of modeling tools that can be used for this purpose. This lack of general-purpose tools can be attributed to the complex nature of the modeling process. A modeling tool has to choose:

• A basic modeling technique, e.g. simulation, statistical, or analytical.
• A role for the tool, e.g. capacity planning or performance debugging.
• A level of abstraction, which can range from treating servers as black boxes to modeling individual lines of code.
• A target audience, e.g. system administrators or performance analysts, which will affect output methods.

Existing tools tend to provide a single solution in this multi-dimensional problem space. That is, they choose one combination of the above range of parameters to produce a specialized solution. As a result, there is little sharing of expertise or code between tools, and the tools themselves are not widely adopted.

The rest of this paper is organized as follows. First, we propose the idea of modeling infrastructures as a general solution to the problems outlined above, and describe Indy, a particular infrastructure that we have developed. Next, we describe how we used Indy to create EMOD, a tool for modeling the performance of e-commerce sites. Then we show how EMOD can be used to model a particular site, and validate its predictions. Finally, we discuss further extensions to the Indy infrastructure and possibilities for future work.

MODELING INFRASTRUCTURES

As a solution to this problem, we propose the use of a modeling infrastructure [PAP00]. This is not a single tool, because, as we have seen above, a single tool cannot handle all of the possible modeling requirements. Rather, it is a general-purpose toolkit that can be used to create any number of specialized tools for individual modeling purposes.
Furthermore, it is not limited to a fixed set of components – users can contribute new components to extend the toolkit’s capabilities and the range of tools that can be produced using it. We now differentiate between tool developers and end users. A tool developer interacts directly with the modeling infrastructure, choosing from the sets of