Business-Oriented Capacity Planning of IT Infrastructure to Handle Load Surges

Filipe Marques, Jacques Sauvé, Antão Moura
Universidade Federal de Campina Grande, Brazil
Email: {filipetm, jacques, antao}@dsc.ufcg.edu.br

Abstract— This work proposes a business-oriented approach to designing IT infrastructure in an e-commerce context subject to load surges. The main difference between the proposed approach and conventional ones is that it includes the negative business impact (loss) incurred due to IT infrastructure failures and performance degradation. The approach minimizes the sum of infrastructure cost and business losses, rather than considering infrastructure cost alone. A complete example scenario shows the value of the method.

I. INTRODUCTION

The goal of this work is to present and formalize a new business-oriented IT infrastructure capacity planning approach that considers load surges. Input load surges are frequent, especially during the period preceding special dates, such as the end-of-year buying season, or during planned sales promotions. Failing to consider such input variations when designing IT infrastructure may leave response time requirements unsatisfied during the load surges, causing business loss as customers defect because of high response times. Alternatively, it is possible to over-design the infrastructure to meet requirements during the highest expected load surge. This alternative obviously leads to higher-cost infrastructure that will be underutilized under normal load. Few infrastructure design approaches consider expected load surges, reference [9] being an exception.

In the remainder of this paper, section II describes the capacity planning approach from a conventional cost perspective. The capacity planning problem from a business perspective is formalized in section III, while section IV applies the approach to an example scenario.
Finally, section V summarizes our approach, offers conclusions and discusses next steps.

II. CAPACITY PLANNING USING A COST PERSPECTIVE

In this section we formalize the capacity planning problem as it has been conventionally treated using a cost-oriented approach, but including load surges in the model. The analytical model adopted here extends the model in [1] to handle expected load surges. An expected surge in load means a large change in load occurring either during a traditional high sales period – e.g. end-of-year season, Mother's Day – or during planned sales promotions.

A. IT Infrastructure Abstraction

To make the model easier to understand, the case of a single IT service $S$ is considered here. Extending the model to multiple services is straightforward.

Fig. 1. Load States

Service $S$ relies upon the set $RC$ of IT resource classes. Database and web servers are examples of resource classes. A given resource class $RC_j$ is made up of a cluster of $n_j$ identical IT resources. Of this total, $m_j$ resources are in load-balanced mode in order to deal with incoming load and offer acceptable response time, and $n_j - m_j$ resources are spares running in standby mode to offer better service availability.

Additionally, service $S$ is subject to a set $G$ of load surges: $\gamma^w$ is the average load applied during the $w$th load state, and its duration is $r^w$ units of time. In the remaining sections, the superscript $w$ will be used to refer to the $w$th load state; the value $w = 0$ refers to normal load, while values of $w$ between $1$ and $|G|$ refer to the $w$th load surge.

Consider Figure 1: the load applied to service $S$ can be either in a normal state or in a surge state. The load is assumed to switch from the normal state to the surge state whenever there is an expected event, such as a sale or a traditional commemorative date like the end-of-year season, that greatly increases the sales rate.
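As a minimal sketch, assuming Python dataclasses as the representation, the resource-class and load-state abstraction above could be encoded as follows. The names `ResourceClass`, `LoadState`, `time_weighted_load` and `peak_load`, the requirement that at least one resource be active, the units and all numeric values are hypothetical illustrations, not part of the authors' model. The two helper functions compute simple summary quantities and are not formulas from this paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ResourceClass:
    """Resource class RC_j: a cluster of n_j identical IT resources (hypothetical encoding)."""
    name: str
    n: int  # n_j: total resources in the cluster
    m: int  # m_j: resources active in load-balanced mode

    def __post_init__(self) -> None:
        # Assumption: at least one active resource, and m_j cannot exceed n_j.
        if not 1 <= self.m <= self.n:
            raise ValueError(f"{self.name}: need 1 <= m <= n, got m={self.m}, n={self.n}")

    @property
    def spares(self) -> int:
        # n_j - m_j resources run in standby mode for availability.
        return self.n - self.m


@dataclass(frozen=True)
class LoadState:
    """Load state w: average load gamma^w applied for a duration r^w."""
    gamma: float  # average load (unit assumed, e.g. requests per second)
    r: float      # duration (unit assumed, e.g. days)


def time_weighted_load(states: Sequence[LoadState]) -> float:
    """Illustrative summary: average load over all states, weighted by duration r^w."""
    total_time = sum(s.r for s in states)
    return sum(s.gamma * s.r for s in states) / total_time


def peak_load(states: Sequence[LoadState]) -> float:
    """Illustrative summary: highest average load gamma^w over all states."""
    return max(s.gamma for s in states)


# Hypothetical values: index 0 is the normal state (w = 0), index 1 a single surge (w = 1).
states = [LoadState(gamma=50.0, r=335.0), LoadState(gamma=200.0, r=30.0)]
web = ResourceClass("web", n=4, m=3)

print(web.spares)                            # 1
print(peak_load(states))                     # 200.0
print(round(time_weighted_load(states), 2))  # 62.33
```

In this hypothetical example the peak load is a little over three times the time-weighted average, which illustrates why sizing every resource class for the highest surge leaves capacity idle under normal load.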
Analogously, the load is assumed to change back to the normal state as soon as the special occasion ends. Note that, to make the mathematical treatment easier, we assume that transients occurring when the average load changes are negligible. This is a reasonable assumption, since the duration of such a transient period is probably negligible compared to the time period $PP$ over which the IT infrastructure is planned, typically 1 year.

Furthermore, a given IT resource $R_j \in RC_j$ is made up of a set $P_j = \{P_{j,1}, \ldots, P_{j,k}, \ldots\}$ of IT components. Operating system software, server hardware and application server middleware are examples of IT components that can be part of a resource. In the model, if one or more of the IT