978-3-9815370-0-0/DATE13/©2013 EDAA
Hardware-Software Collaborative Complexity Reduction Scheme
for the Emerging HEVC Intra Encoder
Muhammad Usman Karim Khan, Muhammad Shafique, Mateus Grellert, Jörg Henkel
Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), Germany
{muhammad.khan, muhammad.shafique, henkel}@kit.edu
Abstract—High Efficiency Video Coding (HEVC/H.265) is an
emerging standard for video compression that provides almost
double the compression efficiency at the cost of a major increase in
computational complexity compared to the current industry-standard
Advanced Video Coding (AVC/H.264). This work proposes a
collaborative hardware and software scheme for complexity
reduction in an HEVC Intra encoding system, with run-time
adaptivity. Our scheme leverages video content properties which
drive the complexity management layer (software) to generate a
highly probable coding configuration. Estimating the intra
prediction size and direction for each prediction unit reduces
the computational complexity. At the hardware
layer, specialized coprocessors with enhanced reusability are
employed as accelerators. Additionally, depending upon the video
properties, the software layer administers the energy
management of the hardware coprocessors. Experimental results
show a complexity reduction of up to 60% and an energy
reduction of up to 42%.
I. INTRODUCTION AND MOTIVATION
Digital video compression is a fundamental requisite of many
day-to-day applications, like video conferencing, security and
entertainment. Because video resolutions (from Full HD 1920×1080 to
Quad Full HD 4096×2048 and Ultra HD 7680×4320) and frame rates
(30 FPS to 60/120 FPS) keep increasing, the Joint Collaborative Team
on Video Coding (JCT-VC) has recently developed the next-generation
video coding standard, called High Efficiency Video Coding
(HEVC, also termed H.265) [1]. The goal of HEVC is to
increase the compression efficiency by 50% compared to
H.264. This coding efficiency is achieved by introducing
additional coding tools and is accompanied by a tremendous increase in
the computational complexity.
Unlike H.264’s concept of a Macroblock (MB, a 16×16
region of the video frame used as the primary compression unit),
HEVC implements a quad-tree coding structure (see Fig. 1),
called the Coding Tree Blocks (CTB). The concept of MBs is
replaced by the Largest Coding Unit (LCU), which can be
recursively divided into four Coding Units (CUs) of size 2N×2N. The
LCU is subdivided into every possible block partition size (CU
size), and the best combination of CU sizes is selected by
comparing the Rate-Distortion (RD) cost of each combination with
the others (a process termed RD Optimization (RDO)). A CU
can be further subdivided into Prediction Units (PUs) (of size
2N×2N or N×N) and Transform Units (TUs).
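The exhaustive LCU subdivision described above can be sketched as a recursion over the quad tree. This is only a minimal illustration of the baseline search, not the scheme proposed in this paper: the function name `best_partition`, the callback `rd_cost` and the toy cost in the usage example are hypothetical, N×N PUs and TUs are omitted, and a real encoder would derive the RD cost from prediction, transform and entropy coding.

```python
from typing import Callable, Tuple, Union

MIN_CU = 8  # minimum CU size, as stated in Fig. 1

# A decided tree is either a leaf CU size (int) or a tuple of four sub-trees.
Tree = Union[int, Tuple["Tree", "Tree", "Tree", "Tree"]]
CostFn = Callable[[int, int, int], float]  # (x, y, size) -> RD cost (hypothetical)


def best_partition(x: int, y: int, size: int, rd_cost: CostFn) -> Tuple[float, Tree]:
    """Return (cost, tree) for the CU whose top-left corner is (x, y)."""
    unsplit_cost = rd_cost(x, y, size)
    if size == MIN_CU:
        return unsplit_cost, size  # smallest CU: cannot be split further

    half = size // 2
    split_cost = 0.0
    children = []
    # Visit the four sub-CUs in raster order and sum their best costs.
    for dy in (0, half):
        for dx in (0, half):
            cost, tree = best_partition(x + dx, y + dy, half, rd_cost)
            split_cost += cost
            children.append(tree)

    # Keep the split only if it is strictly cheaper than coding the CU whole.
    if split_cost < unsplit_cost:
        return split_cost, tuple(children)
    return unsplit_cost, size


def toy_cost(x: int, y: int, size: int) -> float:
    """Toy cost: CUs larger than 8x8 at the frame origin are expensive."""
    penalty = size * size if (x == 0 and y == 0 and size > 8) else 0
    return 1.0 + penalty
```

With the toy cost, `best_partition(0, 0, 64, toy_cost)` splits only along the path toward the origin and returns `(10.0, (((8, 8, 8, 8), 16, 16, 16), 32, 32, 32))`. Note that every CU size is evaluated at every position before the decision is made, which is the source of the complexity discussed in this paper.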
Intra-only video encoders exploit redundancies in a video sequence
only in the spatial domain. These encoders are well suited to
low-latency applications, such as automotive systems, and to
high-quality archiving solutions that avoid motion artifacts. For
HEVC intra encoding, a PU is the basic entity for intra prediction,
and it is restricted to the many available angular directions and the
DC and planar modes [1].
The PU partition for a CU and the best prediction mode are
collectively called the coding configuration of the CU.
[Figure 1 (diagram): a 64×64 LCU is recursively split into four CUs (CU 0 to CU 3) of 32×32, then 16×16 and 8×8, ending in the final PU decomposition, where 4×4 PUs are N×N and all other PUs are 2N×2N. Constraints shown: 1) maximum CU size = LCU size; 2) minimum CU size = 8×8; 3) only an 8×8 CU can have four 4×4 PUs. Abbreviations: CTB, Coding Tree Blocks; LCU, Largest Coding Unit; CU, Coding Unit; PU, Prediction Unit.]
Fig. 1: One possible CU decomposition in HEVC, where a CU is
recursively divided into sub-CUs and PUs
Analysis and Problem: This enormous decision space, from which
the RD-optimal coding configuration is selected, is what enables the
increased compression efficiency. However, the iterative and recursive
nature of RDO incurs a significant complexity overhead, even for
intra-only encoders, because the RDO decision has to recursively
check each possible combination of PU size and intra mode. It is
noteworthy that the total number of mode combinations in HEVC is
~42.4× more than that in H.264.
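To give a rough feel for the size of this search space, the sketch below counts the (PU, mode) pairs an exhaustive search would visit in one LCU under the constraints of Fig. 1. The helper `exhaustive_candidates` is hypothetical, and the default of 35 intra modes per PU (33 angular plus DC and planar, as in the finalized HEVC standard) is an assumption that may differ from the draft evaluated here; the count is not meant to reproduce the ~42.4× figure.

```python
from typing import Tuple


def exhaustive_candidates(lcu: int = 64, min_cu: int = 8,
                          modes_per_pu: int = 35) -> Tuple[int, int]:
    """Count PUs and (PU, mode) pairs checked exhaustively in one LCU.

    modes_per_pu = 35 is an assumption (finalized HEVC), not a value
    taken from the paper.
    """
    pus = 0
    size = lcu
    # One 2N x 2N PU per CU at every CU size from the LCU down to min_cu.
    while size >= min_cu:
        pus += (lcu // size) ** 2
        size //= 2
    # N x N PUs are allowed only inside the smallest CU (8x8 -> four 4x4).
    pus += (lcu // (min_cu // 2)) ** 2
    return pus, pus * modes_per_pu
```

For a 64×64 LCU this gives 1 + 4 + 16 + 64 + 256 = 341 PUs, so `exhaustive_candidates()` returns `(341, 11935)` under the stated assumption.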
Our experiments in Fig. 2 show that the computational
complexity of the complete Intra-only HEVC is ~1.4× that of
Intra-only H.264, for a compression efficiency increase of around
35%. A similar analysis can be found in [6]. Note that encoding one
intra frame of a Full HD (1920×1080) video took approximately
83 seconds on an Intel Core-2-Duo processor with 4 GB RAM, which
illustrates a significant challenge for fast HEVC encoders.
Therefore, it is vital to develop complexity reduction algorithms to
realize real-world applications based on HEVC intra encoders.
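The gap between the measured encoding time and real-time operation can be made concrete with a short calculation. The helper name `required_speedup` is hypothetical; the 83 s per frame and the 30/60 FPS targets are the values quoted in this paper, and the result is only an order-of-magnitude indication for that specific test platform.

```python
def required_speedup(seconds_per_frame: float, target_fps: float) -> float:
    """Speedup needed so that one frame fits into the 1/target_fps budget."""
    frame_budget = 1.0 / target_fps          # seconds available per frame
    return seconds_per_frame / frame_budget  # equals seconds_per_frame * fps


speedup_30 = required_speedup(83.0, 30.0)
speedup_60 = required_speedup(83.0, 60.0)
```

At 83 s per frame, real-time encoding at 30 FPS would need a speedup of roughly 2490×, and about 4980× at 60 FPS.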
This coding complexity indicates that hardware solutions are
required in embedded video coding systems to fulfill the real-time
encoding demands of HEVC. However, a hardware-only HEVC
solution will have a long time-to-market due to the time-consuming
full-custom design cycle. A software-only HEVC encoder is fast to
develop and flexible, but its throughput is low. Recently, a number
of state-of-the-art HEVC intra encoders have been proposed, e.g. [7].
In [3], the authors proposed HEVC intra prediction hardware for
4×4 blocks only. The work in [4] presents a gradient-based fast intra
mode decision for a given PU size. In [5], the authors present a fast
partition size selection algorithm for inter frames that exploits
temporal correlations for frame compression. These methods try to
relieve pressure on the encoding modules by performing
sub-optimal encoding and using hardware-only solutions, thus
limiting the flexibility of the architectures and resulting in larger
energy, area and memory overhead.
Our Novel Contributions: To satisfy the real-time throughput
constraints of HEVC intra-encoding and to reduce energy