Demonstrating a Versatile Model for VoD Buzz Workload in a Large Scale Distributed Network

Jean-Baptiste Delavoix, Shubhabrata Roy, Thomas Begin and Paulo Gonçalves
Inria, ENS Lyon, UCB Lyon 1, CNRS, UMR 5668
46 Allée d'Italie, 69364 Lyon Cedex 07, France
Email: jean-baptiste.delavoix@grenoble-inp.org, {shubhabrata.roy, thomas.begin, paulo.goncalves}@ens-lyon.fr

Abstract—In previous works, we proposed a stochastic model able to reproduce buzz dynamics in a Video on Demand (VoD) workload. We also derived an estimation procedure to calibrate all the model's parameters and evaluated the performance of our estimator on synthetic time series. We showed how this procedure can be applied to fit real workload traces. In this work we demonstrate the model on Grid'5000, with the aim of conducting real-life experiments. Grid'5000 is a highly reconfigurable, controllable and monitorable experimental platform for conducting experiments on large scale parallel and distributed systems. Our results show that the implemented model matches the theoretical model in terms of the mean value and the steady state distribution. We believe this demonstration, by emulating a real-world VoD system, can provide data that can serve as an input to frame efficient resource management policies.

Index Terms—Workload Generator, Video on Demand, Distributed Network, Grid'5000

I. Introduction and Motivation

With the recent trend toward data-intensive applications, providers must handle the challenge of managing resources so as to adapt to volatile workloads. There is thus a need for realistic workload generators to evaluate candidate policies prior to full production deployment. In our work we consider a Video on Demand (VoD) system as a relevant use case of a data-intensive application where bandwidth usage varies rapidly over time. In [1] we proposed, in a constructive manner, a workload generator for a VoD system, based on a stochastic model, that reproduces workload and traffic volatility.
We also developed methods to empirically identify and calibrate the parameters and to assess the goodness of fit of the model against a real VoD workload [2]. In this work we implement our model in a large scale distributed network with a controlled environment to emulate a real-world system. This work constitutes an indispensable step towards our ultimate objective of leveraging the statistical properties of the model and the data collected from the experiments to frame resource management policies.

A VoD service delivers video content to consumers on request. According to Internet usage trends, users are becoming increasingly involved in VoD, and this enthusiasm is likely to grow. According to 2010 statistics, a popular VoD provider like Netflix accounts for around 30 percent of the peak downstream traffic in North America and is the "largest source of Internet traffic overall" [3]. Since VoD has stringent streaming rate requirements, each VoD provider needs to reserve a sufficient amount of server outgoing bandwidth to sustain continuous media delivery. Note that we do not consider IP multicast here. However, resource allocation often fails to provide adequate resources during "buzz" periods, when a video becomes popular very quickly, leading to a flood of user requests on the VoD servers. Figure 1 shows a typical pattern of a real VoD server workload trace from [4], picturing the buzz dynamics.

Fig. 1. Real workload time series corresponding to a VoD server demand from [4] (x-axis: time in hours; y-axis: number of current viewers).

Following epidemic models, we categorize VoD users into three different classes (states). Class S refers to the people who have not watched a video (susceptible viewers). Class I refers to the people who are currently watching the video and can spread the information about it. I is the workload on the system, but it can also refer to the total bandwidth requested at that moment.
Class R refers to the past viewers. Posing $(I(t) = i, R(t) = r)$ as the current state, Figure 2 depicts the model and the transitions between the states. Here $\beta > 0$ is the rate of information dissemination per unit of time, $l > 0$ fixes the rate of spontaneous viewers, and $\gamma^{-1}$ is the mean watch time of a video. $\mu^{-1}$ denotes the mean active period after which a user stops propagating information. We resort to a hidden Markov model to reproduce a buzz-like event, by considering that $\beta$ can assume two values depending on its state: $\beta = \beta_1$ in the normal state and $\beta = \beta_2 \gg \beta_1$ in the buzz regime. Transition