Performance Evaluation of Cloud Systems: A Behavioural Approach

Dimitrios Kallergis, John Tsantilis, Christos Douligeris
Dept. of Informatics, University of Piraeus
80, Karaoli Dimitriou Street, Piraeus, 18534, Greece
{D.Kallergis, mpsp13114, cdoulig}@unipi.gr

Abstract—The concept of Cloud Engineering (CE) as a superset of performance engineering emerges in a wide range of industrial architectural approaches and implementation flavours. To manage cloud-based systems effectively, it is crucial to monitor, meter and then evaluate their structural behaviour and performance. In the context of this work, a considerable number of experimental measurements are conducted on a distributed virtualised infrastructure and analysed statistically. The paper aims to evaluate cloud systems using multiple open-source benchmarking tools. It also addresses and discusses operational deviations of a commercial cloud system with respect to the agreed service level objectives (SLOs).

Index Terms—cloud engineering; performance; benchmarking; statistical analysis

I. INTRODUCTION

The cloud computing paradigm was originally considered an optimisation-centric view of software engineering [1]. The Cloud Engineering (CE) concept therefore consists mainly of disciplines and values inherited from the software engineering of the late 1970s. From the Grid era to the Cloud era, numerous issues regarding hardware consolidation, network-traffic shaping and system performance pose a simple challenge: under what terms can a service that relies on a distributed architecture be trustworthy, and thus cost-effective, for its producers and consumers?

The service level agreement (SLA), which is part of the contract between the cloud service operator or provider and the cloud customer, can form the basis for any attempt to answer this question. An SLA consists of commonly agreed, measurable features: the service level objectives (SLOs). Several efforts to standardise SLAs are currently under way, including on-going work at international organisations [2] as well as in European Commission Working Groups [3] that define rules and codes of conduct within the business-to-customer (B2C) relationship. In particular, the European Commission has recently issued a guideline [4] which focuses on the set of SLOs that relate to cloud services.

Service level objectives are quantitative magnitudes that must be provided by the cloud service provider; they set the boundaries and margins of error that apply to the behavioural structure of the cloud service itself. We therefore extend our primary question as follows: is a deviation from the agreement attributable to an infraction by the service provider, or is it inherent to the structural elements of the service-oriented architecture and, in particular, to the operational principles of the cloud infrastructure? The latter highlights the motivation for assessing performance in operation not only in computational terms but in financial terms as well [5]. The monitoring and benchmarking of service-oriented infrastructures (SOIs) defines a research field in which the independence of running applications, the responsiveness of services and their availability are the decisive factors behind the on-demand, scalable and elastic nature of a cloud system [6].

In this paper, we conduct a considerable number of experimental measurements on a cloud-based system. Our aim is to identify various performance patterns that relate to specific virtual infrastructure specifications.
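As a minimal illustration of the kind of SLO conformance check that motivates this question, consider the following Python sketch. The metric, threshold and sample values are hypothetical; the sketch only shows how repeated measurements of a single metric can be tested against an agreed bound within a stated margin of error.

# Minimal sketch: testing repeated benchmark samples against an SLO bound.
# The metric name, threshold and sample values are hypothetical and serve
# only to illustrate the statistical treatment applied later in the paper.
import math
import statistics

SLO_LATENCY_MS = 200.0          # hypothetical agreed upper bound (SLO)
samples = [172.4, 181.0, 169.8, 190.2, 175.5,
           183.1, 201.7, 178.9, 186.3, 174.0]  # repeated measurements (ms)

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
# 95% confidence interval for the mean (normal approximation)
half_width = 1.96 * stdev / math.sqrt(len(samples))

print(f"mean = {mean:.1f} ms, 95% CI = +/- {half_width:.1f} ms")
if mean + half_width <= SLO_LATENCY_MS:
    print("SLO met within the stated margin of error")
else:
    print("possible agreement aberration: provider infraction or architectural effect")

Under this reading, a violation is flagged only when the confidence interval of the measured mean extends beyond the agreed bound, which helps separate statistical noise from a genuine agreement aberration.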
Our objective is to benchmark specific virtual hardware components and, consequently, to illustrate the system's behavioural structure. To achieve this goal, we use various open-source benchmarking tools to collect the measured data, engaging thirty-five research teams in an iterative process that yielded several thousand individual results. Moreover, we analyse these data using standard statistical methods and discuss the system's response in accordance with the initial service agreement between the counterparts.

This paper is organised as follows: Section II presents the related work, Section III describes the benchmarking methodology followed, and Section IV depicts the virtual infrastructure apparatus. Section V illustrates the results and discusses the measured data in terms of their statistical behaviour. Section VI concludes the paper.

II. RELATED WORK

Testing-as-a-Service is a new model that covers performance testing of services, enterprise resource planning (ERP) software testing, and the monitoring and evaluation of cloud-based applications. S. Carter and S. Isaacson [7] introduce a technique for evaluating clouds by combining attributes and usage metrics from different service environments located in various geographical locations. Several studies [8] [9] [10]