Detailed Comparative Analysis of VP8 and H.264

Yousef O. Sharrab and Nabil J. Sarhan
Electrical and Computer Engineering Department & Wayne State Media Research Lab
Wayne State University, Detroit, MI 48202
Email: {yousef.sharrab, nabil}@wayne.edu

Abstract: VP8 has recently been offered by Google as an open video compression format in an attempt to compete with the widely used H.264 video compression standard. This paper describes the major differences between VP8 and H.264 and provides detailed comparative evaluations through extensive experiments. We use 29 raw video sequences, offering a wide spectrum of resolutions and content characteristics, with the resolution ranging from 176x144 (QCIF) to 3840x2160 (2160p). To ensure a fair study, we use 3 coding presets in H.264, each with three types of tuning, and 7 presets in VP8. The presets cover a variety of achieved quality or complexity levels. The performance metrics include accuracy of bitrate handling, encoding speed, decoding speed, and perceptual video quality.

Keywords: Comparative Analysis, Decoding Speed, Encoding Speed, H.264, Perceptual Video Quality, Video Codecs, Video Compression, VP8.

I. INTRODUCTION

The H.264 video compression standard currently enjoys wide support from video websites, applications, and hardware platforms and devices, including TVs, smart phones, and digital cameras. In addition, it has numerous popular implementations, including JM, X264, and FFmpeg. The widespread use of H.264 has recently faced a serious challenge, with Google releasing VP8 as a royalty-free, open video compression format in an attempt to compete with H.264 and gradually replace its usage on YouTube [1]. The effectiveness of VP8 compared with H.264 will be one of the deciding factors that determine whether VP8 becomes the video compression format of choice in the future. Given the wide reach of video compression, it is imperative to rigorously evaluate the effectiveness of VP8 and compare it with H.264.

Unfortunately, little work has compared the effectiveness of H.264 and VP8 [2], [3], and that work is highly limited. A commercial report [2] has recently compared the performance of VP8 and H.264. Since most of the video sequences it used were previously compressed using other codecs, they cannot be used reliably to draw any conclusions because of the inevitable bias introduced in re-compression tests. The study in [3] compared VP8 and H.264 in terms of video quality only, using only one metric, only three CIF sequences (with 352x288 resolution), and basic encoding settings.

This work was supported in part by U.S. NSF grant CNS-0834537.

This paper describes the major differences between VP8 and H.264 in features and operation. It also provides detailed comparative evaluations through more than 1,300 experiments, designed to reflect real-life situations by carefully choosing the encoding parameters, the video test sequences, and the proper metrics.

We use 29 raw video sequences, offering a wide spectrum of resolutions and content characteristics, with the resolution ranging from 176x144 (QCIF) to 3840x2160 (2160p) and the content varying greatly in the level of detail and motion speeds. To ensure a fair study, we use 3 coding presets in H.264, each with three types of metric tuning, and 7 presets in VP8. These presets cover a variety of achieved quality or complexity levels. The bitrate for each sequence is varied over a wide range that is suitable for that sequence. For H.264, we use X264, which is the best implementation according to the results in [2].

The performance metrics include perceptual video quality, encoding speed, decoding speed, and accuracy of bitrate handling. For perceptual video quality, we use two metrics: Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity index (SSIM) [4]. The experiments with metric tuning in H.264 are “No Tuning”, “SSIM Tuning”, and “PSNR Tuning”.
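
As an illustration of the first quality metric, the following Python sketch computes PSNR from the mean squared error between a reference signal and its decoded version. It is a minimal example under stated assumptions: the samples are 8-bit values (peak of 255) supplied as flat lists, for instance the luma samples of one frame. The function name `psnr` and the input layout are hypothetical and are not taken from the tools used in the experiments.

```python
import math


def psnr(reference, distorted, peak=255):
    """Return the PSNR in dB between two equally sized sample sequences.

    Assumption: samples are plain numbers in the range [0, peak],
    e.g. the 8-bit luma samples of a single frame flattened into a list.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all sample positions.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded

    # PSNR = 10 * log10(peak^2 / MSE)
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value indicates that the decoded samples are closer to the reference. How per-frame values are aggregated over a whole sequence is a separate choice that this sketch does not address.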
Although we discuss all the results, the conclusions are based on “No Tuning”, as the other two options may be unfair to VP8.

The rest of the paper is organized as follows. Section II provides a preliminary analysis of H.264 and VP8. Subsequently, Section III discusses the performance evaluation methodology. Finally, Section IV presents and analyzes the main results.

II. PRELIMINARY ANALYSIS

Video data contains spatial and temporal redundancy. Therefore, similarities can be encoded by considering only the differences (residuals) within a frame or between frames. The first frame of a sequence or a random access point is typically intra-coded. Each block of pixels in an intra-frame is predicted using previously encoded neighboring blocks. For all remaining frames of a sequence or between random access points, inter-coding is usually used, employing block motion compensation to predict blocks from other previously decoded frames. The residuals of the intra- and inter-prediction are then transformed to the frequency domain using the Integer Discrete Cosine Transform (Integer DCT). Subsequently, the transform coefficients are quantized, thereby reducing the overall precision of the coefficients and possibly eliminating high-frequency coefficients. The quantized transform coefficients are entropy coded and transmitted together with any motion vectors.

In the YUV colorspace, each pixel is represented by three components: Y, U, and V. The Y component determines