978-1-4244-7493-6/10/$26.00 ©2010 IEEE ICME 2010
STATISTICAL ANALYSIS AND MODELING OF HIGH DEFINITION VIDEO TRACES
Abdel Karim Al Tamimi, Raj Jain and Chakchai So-In
Dept. of Computer Science and Engineering, Washington University in St. Louis, USA
Email: {aa7, jain, cs5}@cse.wustl.edu
ABSTRACT
High definition video streams account for a growing share of
everyday Internet usage among typical users. This is an
expected result of the current boom in the online standard
and high definition (HD) video streaming services such as
YouTube and Hulu. Because of these video streams’ unique
statistical characteristics and their high bandwidth
requirements, they pose a continuing challenge for both
network scheduling and resource allocation. In this paper
we provide a statistical analysis of over 50 high definition
video traces that represent a wide variety of high definition
video traffic workloads. We performed both factor and
cluster analysis
on our collection of video traces to support a better
understanding of video stream workload characteristics and
their impact on network traffic. Additionally, we compare
and evaluate different modeling approaches for high
definition video traces.
Keywords—Workload Characterization, Factor
Analysis, Video Clustering, Multimedia Communications,
Communication Networks.
1. INTRODUCTION
Web-based video streaming websites open the door to
promising opportunities to distribute digital video content
to millions of people. Websites like YouTube [1] are now
among the websites that Internet users access most
frequently each day. Such websites now account for 27
percent of the Internet traffic, rising from 13 percent in one
year [3]. Recent surveys help explain this surge in traffic
share: they show that the percentage of U.S. Internet users
watching streaming videos
has increased from 81% to 84.4%, and the average time
spent per month increased from 8.3 to 10.8 hours/month in
just three months [4,5]. Additionally, several websites have
started to offer access to TV shows and selected movies, e.g.,
Hulu [2] and Netflix [6], which has increased the reliance of
daily Internet users on such websites and raised their
expectations for service level and delivery quality.
All these reasons motivate network researchers to put more
emphasis on handling such demanding traffic.
Resource allocation and bandwidth control depend on the
ability to predict and manage the demand of video
streams. Therefore, analyzing such challenging traffic, and
possibly modeling it, is essential to allow better
quality of service (QoS) support.
Modeling video streams is a challenging task because of the
high variability of video frame sizes. This variability has
become more pronounced with the introduction of the high
definition codec MPEG4-Part10, also known as Advanced
Video Coding (AVC) and H.264. AVC provides better
performance and compression (i.e., lower mean frame sizes)
than its predecessors. Yet at the same time, it results in
higher variability in frame sizes [7].
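The trade-off described above (lower mean frame size, higher variability) can be made concrete with a few summary statistics. The following is a minimal sketch, assuming a trace is available as a plain list of frame sizes in bytes. The function name `frame_size_stats` and both sample traces are hypothetical and made up for illustration; they are not taken from the traces analyzed in this paper.

```python
from statistics import mean, pstdev


def frame_size_stats(frame_sizes):
    """Return (mean, coefficient of variation, peak-to-mean ratio).

    frame_sizes: non-empty sequence of frame sizes in bytes.
    """
    if not frame_sizes:
        raise ValueError("frame_sizes must not be empty")
    mu = mean(frame_sizes)
    # Coefficient of variation: standard deviation relative to the mean.
    cov = pstdev(frame_sizes) / mu
    # Peak-to-mean ratio: a simple indicator of burstiness.
    peak_to_mean = max(frame_sizes) / mu
    return mu, cov, peak_to_mean


# Hypothetical traces (bytes per frame), invented for illustration only.
older_codec = [9000, 11000, 10000, 12000, 8000, 10000]
newer_codec = [2000, 14000, 1500, 3000, 1000, 2500]

for name, trace in (("older", older_codec), ("newer", newer_codec)):
    mu, cov, ptm = frame_size_stats(trace)
    print(f"{name}: mean={mu:.0f} B, CoV={cov:.2f}, peak/mean={ptm:.2f}")
```

In this invented example the second trace has the smaller mean but the larger coefficient of variation, which is the pattern the paragraph attributes to AVC relative to its predecessors.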
Several previous contributions aimed to achieve a better
understanding of the relationship between the behavior of
video traces and their impact on resource
allocation. In [8], the authors presented a statistical and
factor analysis study of 20 MPEG1 encoded video traces
and the impact of such traffic on ATM networks. Similar
approaches were presented in [9] with emphasis on video
trace frame size distribution. The author in [10] performed a
statistical analysis on four MPEG4-AVC encoded video
traces with attention to the quantization effects on several
statistical quantitative measurements and the correlation
between video frames. In [11], the authors fitted one
MPEG4-AVC movie, encoded at different quantization
levels, using the Gamma density function. In [7], the
authors focused on showing the capabilities of the AVC
standard versus its predecessor, viz.,
MPEG4-Part2.
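As an illustration of what fitting frame sizes with a Gamma density can look like, the sketch below estimates the Gamma shape and scale by the method of moments. This is an assumption chosen for simplicity, not necessarily the fitting procedure used in [11]. The function names and the sample values are hypothetical.

```python
from math import exp, lgamma, log
from statistics import mean, pvariance


def gamma_moment_fit(frame_sizes):
    """Estimate Gamma (shape, scale) by matching the sample mean and variance."""
    mu = mean(frame_sizes)
    var = pvariance(frame_sizes, mu)
    if mu <= 0 or var <= 0:
        raise ValueError("need positive mean and non-zero variance")
    shape = mu * mu / var  # k = mean^2 / variance
    scale = var / mu       # theta = variance / mean
    return shape, scale


def gamma_pdf(x, shape, scale):
    """Gamma density at x, computed in log space for numerical stability."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * log(x) - x / scale
               - lgamma(shape) - shape * log(scale))
    return exp(log_pdf)


# Hypothetical frame sizes in bytes, invented for illustration only.
sizes = [2000, 4000, 6000]
k, theta = gamma_moment_fit(sizes)
print(f"shape={k:.2f}, scale={theta:.2f}, density at mean={gamma_pdf(4000, k, theta):.6f}")
```

By construction the fitted distribution reproduces the sample mean (shape times scale) and variance (shape times scale squared); a maximum-likelihood fit would generally give somewhat different parameters.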
In this paper, we present our analysis and modeling of over
50 HD video traces that we selected from the YouTube HD
section. Through this contribution, we aim to investigate
the main statistical characteristics that define an HD video
trace. This identification process is important for two main
reasons. First, it helps in clustering video traces according
to a given statistical criterion, which helps in choosing the
correct traffic workload and supports other data mining
tasks. Second, it helps define the main statistical attributes
of video traces that should be considered to achieve a valid
statistical model. In our analysis, we also investigate the
applicability of several video models in our pursuit of a
general and simple model that does not require significant
statistical knowledge.
The rest of this paper is organized as follows: in the next
section we discuss the methodology of selecting and
encoding our collection of HD video traces. Section 3
illustrates the steps taken to perform both factor and cluster
analysis on the video traces. Section 4 compares different