GM-RF: AN AV1 INTRA-FRAME FAST DECISION BASED ON RANDOM FOREST
Pablo Rosa, Daniel Palomino, Marcelo Porto, Luciano Agostini
Video Technology Research Group (ViTech), Graduate Program in Computing (PPGC)
Federal University of Pelotas – Pelotas, Brazil
{pablo.rosa, dpalomino, porto, agostini}@inf.ufpel.edu.br
ABSTRACT
This paper presents the Grouping of Modes based on Random
Forest (GM-RF), a fast decision algorithm for the AOMedia
Video 1 (AV1) intra-frame prediction applying machine
learning (ML). AV1 implements a wide variety of intra-frame
prediction tools, significantly increasing the required
computational effort. The GM-RF uses trained Random
Forest (RF) models to reduce the number of intra-frame
prediction modes evaluated for each encoded block.
Experimental results show that GM-RF achieves an
average time saving of 50.19%, with a BD-BR increase of 7.41%.
Compared with related works, GM-RF reaches time savings
5.6 to 10 times higher, at the cost of a higher BD-BR. To
the best of the authors’ knowledge, this is the first solution in
the literature using ML to reduce the AV1 intra-frame
prediction computational effort.
Index Terms— AV1, Intra-frame Prediction, Fast
Decision, Machine Learning, Random Forest
1. INTRODUCTION
The use of video coding standards such as High
Efficiency Video Coding (HEVC) [1], defined by
international standardization bodies such as the Moving Picture
Experts Group (ISO/IEC MPEG) and the Video Coding Experts Group
(ITU-T VCEG), involves a high cost due to the royalties
applied, burdening companies interested in developing
applications with these standards. Several initiatives have
been undertaken to develop an alternative, open and free
encoder for video delivery for a wide range of industry use
cases, such as video on demand, video conferencing,
streaming and video game streaming [2]. As several actors in
the industry had the same goals and needs in the field of video
coding, the Alliance for Open Media (AOMedia) was
founded in 2015. AOMedia is a technology industry
consortium formed by Google, Cisco, Mozilla, Apple,
Netflix, Facebook and many other leading technology
companies. AOMedia developed its encoder starting from
Google’s VP9, Cisco’s Thor, and Mozilla’s Daala encoders,
aiming at a new, royalty-free, and efficient encoder. The
specification of this encoder was released in March 2018 [3]
and it was named AOMedia Video 1 (AV1) [4].
AV1 follows the state-of-the-art flow of hybrid video
encoders, splitting the process into the stages of intra-frame
prediction, inter-frame prediction, transforms, quantization,
in-loop filters, and entropy coding. However, AV1
introduced many new techniques in each one of these steps
when compared to previous encoders.
In its initial implementation, AV1 demonstrated that,
under certain conditions, it was superior to HEVC in terms of
coding efficiency [5]-[6]. However, when encoding videos at
the same bitrate and resolution, AV1 required twice the
encoding time of HEVC [7].
Many novel techniques have been designed and
implemented in AV1, especially in the intra-frame prediction,
helping to achieve higher coding efficiency. AV1 supports a
large number of intra-frame prediction modes, and this
variety brings significant gains in coding efficiency.
However, it also increases the computational effort of this
stage, since for each predicted block the AV1 encoder must
evaluate all these prediction modes.
There are a few works in the literature presenting
efficient solutions targeting the AV1 intra-frame prediction,
such as [8]-[12]. The work in [8] focuses on improving the
coding efficiency of the AV1 intra-frame prediction. The works
in [9] and [10] focus on hardware-based solutions. The works
in [11] and [12] focus on reducing the computational effort of
the AV1 intra-frame prediction using heuristics. Thus, none of
these works explores machine learning based solutions. The
works in [13]-[15] proposed machine learning based
solutions for the intra-frame prediction, but targeting
Versatile Video Coding (VVC) [16] rather than AV1.
This paper presents the GM-RF, a fast decision algorithm
for the AV1 intra-frame prediction that applies Random Forest
(RF) Machine Learning (ML) models to reduce the encoding
time by skipping the evaluation of the less probable modes.
The main strategy of this solution is to separate the intra
modes into groups, so that the trained machine learning models
determine which group of modes is most likely to contain the
best mode to encode the current block. This reduces the number
of modes to be evaluated and, consequently, the encoding time
and the energy consumption of the intra-frame prediction step.
The experimental results show that high time savings were
achieved for all video sequences evaluated: on average, the
encoding time was reduced by 50.19% at the cost of a 7.41%
increase in BD-BR.
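
The grouping strategy described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two mode groups, the three block features, the hand-written decision stumps standing in for trained RF trees, and the toy rate-distortion costs are all assumptions made for illustration. Only the mode names follow AV1 conventions. The sketch shows a forest voting for one group and the mode search then being restricted to that group.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical grouping of a few AV1 intra modes (illustrative only;
# the grouping actually used by GM-RF is not given in this section).
MODE_GROUPS: Dict[str, List[str]] = {
    "directional": ["V_PRED", "H_PRED", "D45_PRED", "D135_PRED"],
    "smooth": ["DC_PRED", "SMOOTH_PRED", "PAETH_PRED"],
}

# A "tree" is any callable mapping block features to a group label.
Tree = Callable[[Sequence[float]], str]


def forest_predict(trees: Sequence[Tree], features: Sequence[float]) -> str:
    """Return the group label that receives the most votes from the trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


def select_mode(
    features: Sequence[float],
    trees: Sequence[Tree],
    rd_cost: Callable[[str], float],
) -> Tuple[str, int]:
    """Evaluate only the modes of the predicted group.

    Returns the lowest-cost mode in that group and the number of
    modes that were actually evaluated.
    """
    group = forest_predict(trees, features)
    candidates = MODE_GROUPS[group]
    best_mode = min(candidates, key=rd_cost)
    return best_mode, len(candidates)


# Toy stand-ins for trained trees: decision stumps over assumed features
# [horizontal gradient, vertical gradient, variance].
TOY_FOREST: List[Tree] = [
    lambda f: "directional" if f[0] > 10.0 else "smooth",
    lambda f: "directional" if f[1] > 10.0 else "smooth",
    lambda f: "directional" if f[2] > 50.0 else "smooth",
]

# Toy rate-distortion costs; a real encoder computes these per block.
TOY_RD_COST: Dict[str, float] = {
    "V_PRED": 5.0, "H_PRED": 2.0, "D45_PRED": 7.0, "D135_PRED": 6.0,
    "DC_PRED": 4.0, "SMOOTH_PRED": 4.5, "PAETH_PRED": 3.0,
}

if __name__ == "__main__":
    best, evaluated = select_mode(
        [20.0, 3.0, 80.0], TOY_FOREST, TOY_RD_COST.__getitem__
    )
    total = sum(len(modes) for modes in MODE_GROUPS.values())
    print(f"best={best}, evaluated {evaluated} of {total} modes")
```

In this sketch the saving comes from evaluating only the modes of the predicted group instead of all modes. If the forest votes for a group that does not contain the truly best mode, the encoder settles for a worse mode, which is one plausible source of the BD-BR increase reported above.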
2022 IEEE International Conference on Image Processing (ICIP) | 978-1-6654-9620-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICIP46576.2022.9897488