COLOR PREDICTION IN IMAGE CODING USING STEERED MIXTURE-OF-EXPERTS

Ruben Verhack ⋆†, Simon Van De Keer ⋆, Glenn Van Wallendael ⋆, Thomas Sikora †, and Peter Lambert ⋆

⋆ Ghent University - iMinds - Data Science Lab, Ghent, Belgium
† Technische Universität Berlin - Communication Systems Lab, Berlin, Germany

ABSTRACT

We propose a novel approach for modeling and coding color in images and video. Locally, luminance is linearly correlated with chrominance, so color can be predicted from the luma value. Using the Steered Mixture-of-Experts (SMoE) approach, the image is viewed as a stochastic process over 5 random variables: the 2-D pixel location, 1 luminance value, and 2 chrominance values. We model this process as a continuous joint density function by fitting a K-modal 5-D Gaussian Mixture Model (GMM). The chroma values are then predicted as the expectation of the conditional density. To validate the approach, the technique was integrated within JPEG, showing PSNR gains in the lower bitrate regions. A deeper analysis of the tolerance of the activation function is given through recycling color models in video sequences, yielding a high quality reconstruction over a considerable range of frames.

Index Terms— Image coding, inter-channel prediction, color prediction, Gaussian Mixture Model, Mixture-of-Experts

1. INTRODUCTION

Over the last few decades, image and video coding have been a very active field of research. The abundance of images is ever increasing, given the popularity of social media platforms and on-demand services. The proportional growth in computing power has opened the path for exploring computationally heavier techniques aiding in compression [1]. In this work we design a novel approach using modern machine learning methods for color modeling in image and video coding. The main goal is to find a luma-to-chroma predictor that is able to reconstruct the color components efficiently.

Human vision is much less sensitive to local changes in color than to changes in luminance [2]. In order to treat luma and chroma differently, a transformation from RGB to the YCbCr color space is often used. This splits the signal into a luminance component Y and the chroma components Cb and Cr. Based on the previous observation, chroma components are often subsampled [3] and quantized more coarsely [4]. Although the YCbCr transform decorrelates the channels globally, in practice correlation between the luma channel and the chroma channels still exists locally.

The research activities described in this paper were funded by the Data Science Lab (Ghent University - iMinds), Communication Systems Lab (Technische Universität Berlin), Flanders Innovation & Entrepreneurship (VLAIO), the Fund for Scientific Research Flanders (FWO Flanders), and the European Union. The computational resources (STEVIN Supercomputer Infrastructure) and services used in this work were kindly provided by Ghent University, the Flemish Supercomputer Center (VSC), the Hercules Foundation, and the Flemish Government department EWI.

Research has shown that within small regions of an image, linear inter-channel correlation exists between the luminance and chrominance channels [5] [6]. This led to a proposal for an integration within HEVC [7], which was rejected because of non-negligible overhead and the loss of luminance-chrominance plane parallelism. Color modeling is also inherent to the field of colorization; our approach was inspired by the work of Cheng et al. [8].

All modern and well-established compression schemes are based on block-based transform coding and DPCM-like prediction methods. In this work we take a completely different approach, which was largely motivated by the recently introduced Steered Mixture-of-Experts Regression (SMoE) methodology [9][10], combined with ideas from image colorization [11].
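
The local linear luma-chroma relationship mentioned above can be illustrated with a minimal sketch. It converts RGB to full-range YCbCr using the JFIF (BT.601) coefficients and fits chroma ≈ a·luma + b by ordinary least squares over a single block. The function names `rgb_to_ycbcr` and `fit_linear_chroma`, the block size, and the synthetic data are assumptions made for illustration; this is not the predictor of [5], [6], or [7].

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Full-range JFIF (BT.601) RGB -> YCbCr for float values in [0, 255]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def fit_linear_chroma(luma, chroma):
    """Least-squares fit of chroma ~ a * luma + b over one block (hypothetical helper)."""
    design = np.column_stack([luma.ravel(), np.ones(luma.size)])
    (a, b), *_ = np.linalg.lstsq(design, chroma.ravel(), rcond=None)
    return a, b


# Synthetic 8x8 block (an assumption, not data from the paper).
rng = np.random.default_rng(0)
block_rgb = rng.uniform(0.0, 255.0, size=(8, 8, 3))
block = rgb_to_ycbcr(block_rgb)
slope, offset = fit_linear_chroma(block[..., 0], block[..., 1])
predicted_cb = slope * block[..., 0] + offset  # per-block linear Cb prediction
```

A block-wise fit like this needs its parameters signaled or re-derived per block, which relates to the overhead concern raised above.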

We present a color modeling scheme using Mixture-of-Experts to model a non-linear predictor $F(\mathbf{x}, Y) \rightarrow (C_b, C_r)$ over the whole image, where $Y$ is the luminance, $\mathbf{x}$ is the 2-dimensional pixel location, and $C_b$ and $C_r$ are the chrominance values. This approach moves away from the usual block-based techniques central to many modern compression schemes, e.g., JPEG and JPEG2000 for images and HEVC for video.

The underlying stochastic process of the amplitudes is modeled as a 5-D (2-D location and 3 color channels YUV) multi-modal Gaussian Mixture Model (GMM). In this way, a space-continuous internal representation of the image is obtained. The GMM models the joint probability density function, which contains all the necessary and sufficient statistics to perform the chroma regression. The decoder then performs the chroma estimation based on this model. Every Gaussian kernel is considered an expert, and all experts collaborate toward the chroma reconstruction given a location and a luma value. Given the softmax-weighted support of the experts, the model yields a continuous, smoothed piecewise regression function over the whole domain.
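
As a minimal sketch of the regression step described above, the following code computes the expected chroma given location and luma from the parameters of a 5-D GMM. It uses the standard Gaussian conditioning formulas and assumes the variable order (x1, x2, Y, Cb, Cr). The names `gaussian_logpdf` and `predict_chroma` and the toy two-component model are assumptions made for illustration; fitting the GMM (e.g., with EM) is not shown, and this is not the authors' implementation.

```python
import numpy as np


def gaussian_logpdf(points, mean, cov):
    """Log-density of a multivariate normal at each row of `points`."""
    diff = points - mean
    chol = np.linalg.cholesky(cov)
    # Solve chol @ z = diff^T; the squared norm of z is the Mahalanobis distance.
    z = np.linalg.solve(chol, diff.T)
    maha = np.sum(z * z, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (mean.shape[0] * np.log(2.0 * np.pi) + log_det + maha)


def predict_chroma(inputs, weights, means, covs, n_in=3):
    """Conditional expectation E[(Cb, Cr) | x1, x2, Y] under a GMM.

    inputs: (N, 3), weights: (K,), means: (K, 5), covs: (K, 5, 5).
    """
    inputs = np.atleast_2d(np.asarray(inputs, dtype=float))
    n_points, n_comp = inputs.shape[0], len(weights)
    log_gate = np.empty((n_points, n_comp))
    expert_out = np.empty((n_points, n_comp, means.shape[1] - n_in))
    for k in range(n_comp):
        mu_in, mu_out = means[k, :n_in], means[k, n_in:]
        cov_in = covs[k, :n_in, :n_in]
        cov_out_in = covs[k, n_in:, :n_in]
        # Gate: prior weight times marginal density of the input under expert k.
        log_gate[:, k] = np.log(weights[k]) + gaussian_logpdf(inputs, mu_in, cov_in)
        # Expert: linear regressor mu_out + cov_out_in @ inv(cov_in) @ (input - mu_in).
        slope = np.linalg.solve(cov_in, cov_out_in.T).T
        expert_out[:, k, :] = mu_out + (inputs - mu_in) @ slope.T
    # Normalize the gates in the log domain for numerical stability (softmax).
    log_gate -= log_gate.max(axis=1, keepdims=True)
    gate = np.exp(log_gate)
    gate /= gate.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkc->nc", gate, expert_out)


# Toy model with two well-separated experts (hypothetical parameters).
weights = np.array([0.5, 0.5])
means = np.array([[10.0, 10.0, 0.2, 0.4, 0.6],
                  [100.0, 100.0, 0.8, 0.6, 0.3]])
cov = np.diag([25.0, 25.0, 0.01, 0.01, 0.01])
cov[2, 3] = cov[3, 2] = 0.005  # luma-Cb covariance gives a local linear slope
covs = np.stack([cov, cov])
chroma = predict_chroma([[10.0, 10.0, 0.3]], weights, means, covs)
print(chroma)
```

Each expert contributes a linear luma-to-chroma regressor, and the normalized gates blend them smoothly, which is where the continuous, piecewise behavior of the overall predictor comes from.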