Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems

Thomas Heinze 1, Zbigniew Jerzak 1, Gregor Hackenbroich 1, Christof Fetzer 2
1 SAP AG, Chemnitzer Str. 48, 01187 Dresden, Germany
2 TU Dresden, Systems Engineering Group, Noethnitzer Str. 46, 01187 Dresden, Germany
{firstname.lastname}@sap.com, christof.fetzer@tu-dresden.de

ABSTRACT

Elastic scaling allows a data stream processing system to react to a dynamically changing query or event workload by automatically scaling in or out. Thereby, both unpredictable load peaks and underload situations can be handled. However, each scaling decision comes with a latency penalty due to the required operator movements. Therefore, in practice an elastic system may improve the system utilization, but it is not able to provide latency guarantees defined by a service level agreement (SLA). In this paper we introduce an elastic scaling system which optimizes the utilization under latency constraints defined by an SLA. Specifically, we present a model which estimates the latency spike created by a set of operator movements. We use this model to build a latency-aware elastic operator placement algorithm which minimizes the number of latency violations. We show that our solution is able to reduce the 90th percentile of the end-to-end latency by up to 30% and the number of latency violations by 50%. The system utilization achieved by our approach is comparable to that of a scaling strategy which does not use latency as an optimization target.

1. INTRODUCTION

A classical distributed data stream processing system [1, 21] uses a fixed number of processing nodes, chosen to meet the expected maximal workload. However, because peak loads occur only from time to time, the system is on average mostly underutilized. Ideally, the system should automatically acquire new processing nodes or release existing nodes to match the workload.
Such systems are called elastic [15]. Elastic scaling systems are especially beneficial for the end user when running in a public cloud environment with a pay-per-use model [3].

Various authors have studied the problem of designing elastic scaling data stream processing systems. They focused on underlying problems such as efficient operator state management [9], the coordination of large processing clusters for an elastic data stream processing system [11], or the dynamic selection of the optimal level of operator parallelism [10].

An important challenge for such systems is to fulfill SLA constraints on the end-to-end latency [20]. Unexpected latency spikes violating these constraints can be caused either by a system overload or by frequent operator movements between different hosts. In an elastic scaling data stream processing system, overload situations can be avoided by online load balancing [11, 10, 9]. A movement of operators between hosts creates a latency spike, because processing needs to be paused until the operator has been initialized successfully on the target host [11, 9, 19]. Elastically scaling a data stream processing system involves frequent movements of operators to maximize the system utilization. As a consequence, an unpredictable number of latency violations typically occurs.

In this paper we present an extension to the elastic scaling system FUGU [12], which tries to minimize the number of latency violations while at the same time maximizing the utilization. We introduce a model to estimate the movement cost, in terms of end-to-end latency, for a set of operator movements. The presented model is generic: it can be used to estimate the latency spikes of any elastic scaling data stream processing system. We classify operator movements into two categories: (1) mandatory and (2) optional movements. All operator movements that avoid an overload of the system are mandatory scaling decisions.
The release of a host due to underload is an optional scaling decision. All optional scaling decisions can be postponed or canceled in case the estimated latency spike would be too high. Thereby, unnecessary violations of the latency constraints can be avoided.

In our prototype we demonstrate a decision kernel which prevents optional scaling decisions that would result in large latency spikes. We show that the end-to-end latency spikes can be decreased by up to 30% and the number of latency violations by up to 50%. The average utilization is comparable to that of a scaling strategy which does not use latency as an optimization target. The system can be configured using different SLA policies and consequently exhibits different scaling behavior. To the best of our knowledge, no other elastic scaling data stream processing system supports different SLA configurations.

In the following, we first motivate our work by studying the characteristics of an elastic scaling data stream processing engine. As the second step, we introduce the system architecture of FUGU and present the implementation of the online operator movement solution, which is based on the algorithm presented by FLUX [19]. Building on this technique, we present our model for the operator movement costs and its integration into a latency-aware elastic scaling algorithm, which automatically decides to scale in or out. Finally, we evaluate the system with a real-world use case using data taken from the Frankfurt stock exchange.

2. MOTIVATING EXAMPLE

The problem studied in this paper is illustrated by the example given in Figure 1. It shows the elastic scaling of a data stream processing engine with a varying query load and a varying data rate for one of the evaluation scenarios (see Section 7). The elastic data stream processing engine dynamically allocates new hosts depending on the current query and event load. The number of allocated hosts varies between one and five during the peak load.
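The mandatory/optional classification and the decision kernel described above can be illustrated with a short sketch. This is not FUGU's actual implementation: the names (`Movement`, `estimate_spike`, `decide`), the linear state-size cost model, and the latency budget are all illustrative assumptions standing in for the model introduced later in the paper.

```python
# Illustrative sketch (not FUGU's API): mandatory movements are always
# applied, while optional ones are postponed if the estimated latency
# spike would exceed the SLA latency budget.
from dataclasses import dataclass


@dataclass
class Movement:
    operator: str
    state_size_mb: float
    mandatory: bool  # True: needed to avoid overload; False: optional scale-in


def estimate_spike(m: Movement,
                   pause_ms_per_mb: float = 2.0,
                   base_ms: float = 50.0) -> float:
    """Toy cost model: the pause grows with the operator state to transfer."""
    return base_ms + pause_ms_per_mb * m.state_size_mb


def decide(movements, latency_budget_ms):
    """Split movements into those applied now and those postponed."""
    apply_now, postponed = [], []
    for m in movements:
        if m.mandatory or estimate_spike(m) <= latency_budget_ms:
            apply_now.append(m)
        else:
            postponed.append(m)
    return apply_now, postponed


moves = [
    Movement("join-1", 400.0, True),    # overload on its host: mandatory
    Movement("agg-2", 900.0, False),    # scale-in candidate, large state
    Movement("filter-3", 10.0, False),  # scale-in candidate, small state
]
applied, postponed = decide(moves, latency_budget_ms=200.0)
print([m.operator for m in applied])    # ['join-1', 'filter-3']
print([m.operator for m in postponed])  # ['agg-2']
```

Under these assumed parameters, the mandatory movement of `join-1` is always applied, the cheap optional movement of `filter-3` fits within the budget, and the large-state movement of `agg-2` is postponed, which is exactly the behavior the decision kernel is meant to enforce.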
The elastic data stream processing engine improves the system utilization by optimizing the number of hosts used in the system