© Springer International Publishing Switzerland 2015
A. Bifet et al. (Eds.): ECML PKDD 2015, Part III, LNAI 9286, pp. 53–67, 2015.
DOI: 10.1007/978-3-319-23461-8_4

Early Detection of Fraud Storms in the Cloud

Hani Neuvirth 1, Yehuda Finkelstein 1, Amit Hilbuch 1, Shai Nahum 1, Daniel Alon 1, and Elad Yom-Tov 2

1 Azure Cyber-Security Group, Microsoft, Herzelia, Israel
{haneuvir,t-yehudf,amithi,snahum,dalon}@microsoft.com
2 Microsoft Research, Herzelia, Israel
eladyt@microsoft.com

Abstract. Cloud computing resources are sometimes hijacked for fraudulent use. While some fraudulent use manifests as small-scale resource consumption, a more serious type of fraud is the fraud storm: an event of large-scale fraudulent use. These events begin when fraudulent users discover new vulnerabilities in the sign-up process, which they then exploit en masse. The ability to detect these storms early is a critical component of any cloud-based public computing system. In this work we analyze telemetry data from Microsoft Azure to detect fraud storms and raise early alerts on sudden increases in fraudulent use. Using machine learning to identify such anomalous events involves two inherent challenges: the scarcity of these events and, at the same time, the high frequency of anomalous events in cloud systems. We compare the performance of a supervised approach to that of an unsupervised, multivariate anomaly detection framework. We further evaluate system performance under practical considerations: robustness in the presence of missing values, and minimization of the model's data collection period. This paper describes the system, as well as the underlying machine learning algorithms applied. A beta version of the system is deployed and used to continuously control fraud levels in Azure.
1 Introduction

Adoption of the public cloud as an agile model for consuming computational resources is continuously increasing. The high scalability of these services offers many opportunities, as well as new challenges. Examples include failure detection [1, 2], resource optimization [3–5], and security [6, 7]. However, a challenge common to all of them is the efficient and effective analysis of the large quantities of data that such cloud platforms continuously accumulate.

A significant portion of the data collected in the cloud takes the form of time series, e.g., signals from the monitoring of continuous resource use. Therefore, machine learning algorithms for time series analysis and forecasting are commonly applied. Numerous algorithms have been developed for this purpose over the years [8]. The most established ones are auto-regressive models, integrated models and moving average