Box-Cox transformation as an alternative method for modeling video-on-demand popularity

María Teresa González Aparicio, R. García, Xabiel Garcia Pañeda, D. Melendi, S. Cabrero
Computer Science Department, University of Oviedo, Gijón, Asturias, Spain
{maytega, garciaroberto, xabiel, melendi, cabrerosergio}@uniovi.es

Abstract— This paper studies the popularity of news videos published by three Spanish local on-line newspapers. The statistical distribution underlying this popularity is unknown. Indeed, many papers in the literature have modeled popularity with different distributions, such as Mandelbrot, Stretched and Zipf-like. In this paper, the Box-Cox transformation is proposed as a unified approach that covers all of these distributions. Its main advantage is its non-parametric nature, so that model selection may be avoided.

Keywords: Box-Cox, Mandelbrot, Stretched, Video-on-demand, Zipf-like.

I. INTRODUCTION

Streaming media is increasingly present on the Internet, especially on web sites dedicated to news, sports, entertainment and education, and even in the business world for marketing purposes. As a result, system designers have to cope with the particular demands of streaming media content, such as greater computing power, higher bandwidth and storage requirements, and its long-lived nature, in order to provide good Web services [8]. Many technologies have emerged to manage this type of content and to reduce its impact on the different resources, among them multicast/unicast delivery, encoding formats and complex cache replacement policies, some of which are steadily being improved. However, more multimedia workloads have to be analyzed to reach a sound understanding of user access.
In [5][10] an analysis of the video-on-demand service “La Nueva España”, one of the services analyzed in this paper, was presented. Those studies highlight that content type, subject, content update policy and even content success make popularity a very difficult parameter to model. A Zipf-like distribution was applied in stable periods of time and an average θ was calculated. However, when the conditions of the service change due to the arrival of new content, a new value of θ is needed. An algorithm was defined, but a popularity pattern was not established.

Indeed, modeling user access is not an easy task, because so many variables are involved. Accordingly, it may be better to set some of these variables aside and start with a simple service. For instance, the number of different content types offered to the user could be reduced and focused on a specific topic and area.

In this paper, we analyze session logs from three news video-on-demand streaming services, namely "La Opinión A Coruña" (www.laopinioncoruna.es), "Faro de Vigo" (www.farodevigo.es) and "La Nueva España" (www.lne.com). Each of them belongs to a different area of Spain. As a result, we believe that our study provides relevant results for the design of news video-on-demand services. Specifically, it focuses on the popularity distribution.

The rest of the paper is organized as follows. Section II reviews previous work. Section III presents a case study of three on-line news video-on-demand services from Spain. Section IV analyzes popularity in the three services. Finally, Section V presents conclusions and future work.

II. RELATED WORK

The video access pattern has been analyzed in a wide range of media services (Web, file sharing, media broadcast, video-on-demand streaming). One of the first distributions applied to model the access pattern was Zipf-like.
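As a rough illustration of the Zipf-like model, the sketch below assumes the common form in which the access probability of the object ranked i is proportional to 1/i^θ. The exact parameterization used by the cited works is not stated here, so this form is an assumption. The function names `zipf_like_pmf` and `estimate_theta` are hypothetical, and the log-log least-squares fit is only one simple way of estimating θ, not necessarily the method used in the literature reviewed in this section.

```python
import math


def zipf_like_pmf(n_items, theta):
    """Access probabilities p_i = C / i**theta for ranks i = 1..n_items.

    Assumed parameterization: larger theta means accesses are more
    concentrated on the most popular objects.
    """
    weights = [1.0 / (rank ** theta) for rank in range(1, n_items + 1)]
    total = sum(weights)  # 1/C, the normalizing constant
    return [w / total for w in weights]


def estimate_theta(counts):
    """Estimate theta from access counts by regressing log(count) on log(rank).

    Objects are ranked by decreasing count; zero counts are dropped
    because their logarithm is undefined.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("at least two objects with non-zero counts are needed")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of the log-log line is -theta.
    return -sxy / sxx
```

For example, `estimate_theta([p * 1e6 for p in zipf_like_pmf(100, 0.47)])` recovers a value close to 0.47, since idealized Zipf-like counts lie on a straight line in log-log scale. Real access counts are noisy, and a fit of this kind is usually only meaningful over stable periods.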
In [4] a one-week workload was analyzed, consisting of streaming-media sessions from 4,786 clients to 866 servers on the Internet, in which 23,738 different streaming-media objects were accessed. Of these, 78% were accessed only once, 1% were accessed ten or more times, and the 12 most popular objects were accessed more than 100 times each. The popularity distribution was modeled as Zipf-like with θ equal to 0.47. The conclusion was that accesses to streaming-media objects were less concentrated on the popular objects. Moreover, in [3] the behavior of the video access pattern was studied at different time scales (one month, six months and more than one year). When the period was below seven months a Zipf-like approximation was possible with θ between 1.4 and 1.6, but not for longer periods. In [7] sixteen workloads were analyzed, covering different delivery methods (streaming, pseudo streaming, overlay multicast, P2P, etc.), media file sizes, durations (from 5 days to more than two years) and types of content. The video access pattern was fitted with a Stretched Exponential distribution despite extraneous traffic, the introduction of new content and recommendations [13], or “fetch-at-most-once” behavior [2].

IEEE Globecom 2010 Workshop on Ubiquitous Computing and Networks 978-1-4244-8864-3/10/$26.00 ©2010 IEEE 1798