GM-RF: AN AV1 INTRA-FRAME FAST DECISION BASED ON RANDOM FOREST

Pablo Rosa, Daniel Palomino, Marcelo Porto, Luciano Agostini

Video Technology Research Group (ViTech), Graduate Program in Computing (PPGC)
Federal University of Pelotas
Pelotas, Brazil
{pablo.rosa, dpalomino, porto, agostini}@inf.ufpel.edu.br

ABSTRACT

This paper presents the Grouping of Modes based on Random Forest (GM-RF), a fast decision algorithm for AOMedia Video 1 (AV1) intra-frame prediction that applies machine learning (ML). AV1 implements a wide variety of intra-frame prediction tools, which significantly increases the required computational effort. GM-RF uses trained Random Forest (RF) models to reduce the number of intra-frame prediction modes evaluated for each encoded block. Experimental results show that GM-RF achieves an average time saving of 50.19%, with a BD-BR increase of 7.41%. Compared with related works, GM-RF reaches time savings 5.6 to 10 times higher, at the cost of a higher BD-BR. To the best of the authors’ knowledge, this is the first solution in the literature that uses ML to reduce the computational effort of AV1 intra-frame prediction.

Index Terms: AV1, Intra-frame Prediction, Fast Decision, Machine Learning, Random Forest

1. INTRODUCTION

The use of video coding standards such as High Efficiency Video Coding (HEVC) [1], defined by international standardization bodies such as the Moving Picture Experts Group (ISO/IEC MPEG) and the Video Coding Experts Group (ITU-T VCEG), involves a high cost due to the royalties applied, burdening companies interested in developing applications based on these standards. Several initiatives have been undertaken to develop an alternative, open, and free encoder for video delivery across a wide range of industry use cases, such as video on demand, video conferencing, streaming, and video game streaming [2]. As several actors in the industry had the same goals and needs in the field of video coding, the Alliance for Open Media (AOMedia) was founded in 2015.
AOMedia is a technology industry consortium formed by Google, Cisco, Mozilla, Apple, Netflix, Facebook, and many other leading technology companies. Starting from Google’s VP9, Cisco’s Thor, and Mozilla’s Daala encoders, AOMedia set out to develop a new, royalty-free, and efficient encoder. The specification of this encoder was released in March 2018 [3], and it was named AOMedia Video 1 (AV1) [4].

AV1 follows the state-of-the-art flow of hybrid video encoders, splitting the process into the stages of intra-frame prediction, inter-frame prediction, transforms, quantization, in-loop filters, and entropy coding. However, AV1 introduced many new techniques in each of these steps when compared to previous encoders. In its initial implementation, AV1 demonstrated that, under certain conditions, it was superior to HEVC in terms of coding efficiency [5]-[6]. However, when encoding videos at the same bitrate and resolution, AV1 took twice as long as HEVC [7].

Many novel techniques have been designed and implemented in AV1, especially for intra-frame prediction, helping to achieve higher coding efficiency. AV1 supports a large number of intra-frame prediction modes, and this variety brings a significant gain in coding efficiency. However, it also increases the computational effort of this stage, since the encoder must evaluate all of these prediction modes for each predicted block.

A few works in the literature present efficient solutions targeting AV1 intra-frame prediction [8]-[12]. The work in [8] focuses on improving the coding efficiency of AV1 intra-frame prediction. The works in [9] and [10] focus on hardware-based solutions. The works in [11] and [12] focus on reducing the computational effort of AV1 intra-frame prediction using heuristics. None of these works explores machine-learning-based solutions.
The works in [13]-[15] proposed machine-learning-based solutions for intra-frame prediction, but they target Versatile Video Coding (VVC) [16] rather than AV1.

This paper presents GM-RF, a fast decision algorithm for AV1 intra-frame prediction that applies Random Forest (RF) machine learning (ML) models to reduce the encoding time by skipping the evaluation of the less probable modes. The main strategy of this solution is to separate the intra modes into groups, so that the trained ML models determine which group of modes is most likely the best for encoding the current block. This reduces the number of modes to be evaluated and, consequently, the encoding time and energy consumption of the intra-frame prediction step. The experimental results show that high time savings were achieved for all video sequences evaluated: the encoding time was reduced by about 50% on average, at the cost of an average BD-BR increase of about 7%.

2022 IEEE International Conference on Image Processing (ICIP) | 978-1-6654-9620-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICIP46576.2022.9897488
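As an illustration only, the grouping strategy described in the Introduction can be sketched in Python as follows. The mode names follow libaom's naming for a subset of AV1 luma intra modes. Everything else is an assumption made for this sketch: the three groups, the gradient features, and the toy "forest" of hand-written threshold rules are hypothetical stand-ins, not the groups or the trained RF models used by GM-RF.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Block = List[List[int]]  # luma samples, row-major

# Subset of AV1 luma intra modes, named as in libaom.
AV1_INTRA_MODES = [
    "DC_PRED", "V_PRED", "H_PRED", "D45_PRED", "D135_PRED", "D113_PRED",
    "D157_PRED", "D203_PRED", "D67_PRED", "SMOOTH_PRED", "SMOOTH_V_PRED",
    "SMOOTH_H_PRED", "PAETH_PRED",
]

# HYPOTHETICAL grouping, chosen only to illustrate the idea.
MODE_GROUPS: Dict[str, List[str]] = {
    "non_directional": ["DC_PRED", "SMOOTH_PRED", "SMOOTH_V_PRED",
                        "SMOOTH_H_PRED", "PAETH_PRED"],
    "near_vertical": ["V_PRED", "D45_PRED", "D67_PRED", "D113_PRED"],
    "near_horizontal": ["H_PRED", "D135_PRED", "D157_PRED", "D203_PRED"],
}


def gradient_features(block: Block) -> Tuple[int, int]:
    """Return (change along rows, change along columns), summed over the block."""
    h = sum(abs(row[x + 1] - row[x]) for row in block for x in range(len(row) - 1))
    v = sum(abs(block[y + 1][x] - block[y][x])
            for y in range(len(block) - 1) for x in range(len(block[0])))
    return h, v


def make_tree(ratio: float, flat: int) -> Callable[[int, int], str]:
    """Build one hand-written rule; a trained decision tree would replace this."""
    def tree(h: int, v: int) -> str:
        if h + v <= flat:
            return "non_directional"
        if h > ratio * v:   # samples vary along rows: vertical structures
            return "near_vertical"
        if v > ratio * h:   # samples vary along columns: horizontal structures
            return "near_horizontal"
        return "non_directional"
    return tree


# Toy "forest": three rules with different thresholds, combined by majority vote.
TOY_FOREST = [make_tree(1.5, 8), make_tree(2.0, 16), make_tree(3.0, 32)]


def predict_group(block: Block) -> str:
    h, v = gradient_features(block)
    votes = Counter(tree(h, v) for tree in TOY_FOREST)
    return votes.most_common(1)[0][0]


def candidate_modes(block: Block) -> List[str]:
    """Modes the encoder would still evaluate; all other modes are skipped."""
    return MODE_GROUPS[predict_group(block)]


if __name__ == "__main__":
    vertical_stripes = [[0, 255, 0, 255] for _ in range(4)]
    modes = candidate_modes(vertical_stripes)
    print(f"evaluating {len(modes)} of {len(AV1_INTRA_MODES)} modes: {modes}")
```

In this sketch the saving comes from running the rate-distortion evaluation only on the modes of the predicted group. The cost is that the best mode is missed whenever the prediction is wrong, which is the source of the coding-efficiency loss that a fast decision of this kind trades for speed.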