978-1-4244-7493-6/10/$26.00 ©2010 IEEE ICME 2010

STATISTICAL ANALYSIS AND MODELING OF HIGH DEFINITION VIDEO TRACES

Abdel Karim Al Tamimi, Raj Jain and Chakchai So-In
Dept. of Computer Science and Engineering, Washington University in St. Louis, USA
Email: {aa7, jain, cs5}@cse.wustl.edu

ABSTRACT

High definition video streams account for a growing share of typical users’ daily Internet usage. This is an expected result of the current boom in online standard and high definition (HD) video streaming services such as YouTube and Hulu. Because of these video streams’ unique statistical characteristics and their high bandwidth requirements, they pose a continuing challenge for both network scheduling and resource allocation. In this paper we provide a statistical analysis of over 50 high definition video traces that represent a wide variety of high definition video traffic workloads. We performed both factor and cluster analysis on our collection of video traces to support a better understanding of video stream workload characteristics and their impact on network traffic. Additionally, we compare and evaluate different modeling approaches for high definition video traces.

Keywords—Workload Characterization, Factor Analysis, Video Clustering, Multimedia Communications, Communication Networks.

1. INTRODUCTION

Web-based video streaming sites open the door to distributing digital video content to millions of people. Websites like YouTube [1] are now considered to be among the sites that Internet users access most often each day. Such websites now account for 27 percent of Internet traffic, up from 13 percent within one year [3]. This surge can be explained by the latest surveys, which show that the percentage of U.S.
Internet users watching streaming videos has increased from 81% to 84.4%, and the average time spent per month increased from 8.3 to 10.8 hours in just three months [4,5]. Additionally, several websites, e.g. Hulu [2] and Netflix [6], have started to offer access to TV shows and selected movies. This has increased the reliance of everyday Internet users on such websites and raised their expectations of service level and delivery quality.

All these reasons inspire network researchers to put more emphasis on handling such demanding traffic. Resource allocation and bandwidth control depend on the ability to predict and manage the demand of video streams. Therefore, analyzing such challenging traffic, and possibly modeling it, is essential to allow better quality of service (QoS) support.

Modeling video streams is a challenging task because of the high variability of video frame sizes. This variability has become more pronounced with the introduction of the high definition codec MPEG4-Part10, also known as advanced video codec (AVC) and H.264. The AVC codec provides better performance and compression rate (i.e. lower mean frame sizes) than its predecessors. Yet at the same time, it results in higher variability in frame sizes [7].

Several previous contributions have aimed to achieve a better understanding of the relationship between the behavior of video traces and their impact on resource allocation. In [8], the authors presented a statistical and factor analysis study of 20 MPEG1 encoded video traces and the impact of such traffic on ATM networks. Similar approaches were presented in [9] with emphasis on the frame size distribution of video traces. The author in [10] performed a statistical analysis on four MPEG4-AVC encoded video traces, with attention to the effects of quantization on several statistical measurements and to the correlation between video frames.
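As an illustration only, the kind of per-trace measurements discussed in these studies (mean frame size, frame size variability, and correlation between frames) can be sketched in a few lines of Python. The sketch assumes that a trace is a plain list of frame sizes in bytes. The function name trace_summary, the coefficient of variation as the variability measure, and the lag-based sample autocorrelation are illustrative assumptions, not the procedures used in the cited works.

```python
from statistics import mean, pstdev


def trace_summary(frame_sizes, lag=1):
    """Summarize a video trace given as a list of frame sizes (bytes).

    Hypothetical helper for illustration: returns the mean frame size,
    the coefficient of variation (std / mean), and the sample
    autocorrelation of frame sizes at the given lag.
    """
    n = len(frame_sizes)
    if n <= lag:
        raise ValueError("trace must contain more frames than the lag")

    mu = mean(frame_sizes)
    sigma = pstdev(frame_sizes)  # population standard deviation
    if sigma == 0:
        raise ValueError("constant trace: autocorrelation is undefined")

    # Coefficient of variation: variability relative to the mean size.
    cov = sigma / mu

    # Biased sample autocorrelation at `lag` (normalized by n * variance).
    num = sum(
        (frame_sizes[i] - mu) * (frame_sizes[i + lag] - mu)
        for i in range(n - lag)
    )
    acf = num / (n * sigma ** 2)

    return {"mean": mu, "cov": cov, "acf": acf}
```

Under these assumptions, a codec that lowers the mean frame size while leaving the spread unchanged produces a higher coefficient of variation, which is one simple way to express the trade-off noted above for AVC.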
In [11], the authors used a Gamma density function to fit one MPEG4-AVC movie encoded at different quantization levels. In [7], the authors focused on showing the capabilities of the AVC standard versus its predecessor, MPEG4-Part2.

In this paper, we present our work on analyzing and modeling over 50 HD video traces that we selected from the YouTube HD section. Through this contribution, we aim to investigate the main statistical characteristics that define an HD video trace. This identification process is important for two main reasons. First, it helps in clustering video traces according to a chosen statistical criterion, which in turn helps in choosing the correct traffic workload and supports other data mining tasks. Second, it helps define the main statistical attributes of video traces that should be considered to achieve a valid statistical model. In our analysis, we also investigate the applicability of several video models in our pursuit of a general and simple model that does not require significant statistical knowledge.

The rest of this paper is organized as follows. In the next section, we discuss the methodology of selecting and encoding our collection of HD video traces. Section 3 illustrates the steps taken to perform both factor and cluster analysis on the video traces. Section 4 compares different