A SALIENCY-BASED RATE CONTROL FOR PEOPLE DETECTION IN VIDEO

Simone Milani, Riccardo Bernardini, Roberto Rinaldo *
DIEGM - University of Udine
Via Delle Scienze, 208 - 33100 Udine - Italy
e-mail: simone.milani@uniud.it, bernardini@uniud.it, rinaldo@uniud.it

ABSTRACT

Most latest-generation multimedia systems are equipped with increasingly effective object detection algorithms (e.g., intelligent video surveillance systems, augmented reality applications, and sharing platforms for multimedia data). Unfortunately, images and video are usually available in compressed formats, which makes object detection more difficult because of the additional distortion noise. In this paper we show that it is possible to mitigate this problem by introducing a rate allocation algorithm that preserves the details that matter to object identification algorithms. We propose a saliency map that identifies the elements that are crucial for detectors. Then, we map saliency values to the value of the quantization parameter to be used by the video coder. Experimental results on an HEVC coder show that the proposed rate control algorithm improves detection accuracy with respect to the standard strategy.

Index Terms: denoising, object detection, adaptive filtering, saliency map, HOG

1. INTRODUCTION

Nowadays, most video systems employ Artificial Intelligence (AI) solutions to achieve a more precise understanding of the acquired scene. Object detection strategies allow these systems to go beyond the mere appearance of pixels, connecting them with the reality that lies behind each image or video. As a matter of fact, algorithms for the recognition of objects and scenes play a crucial role in several applications, such as video surveillance systems, augmented reality applications, multimedia sharing platforms (where they are used to enhance classification and retrieval), and many more.
Unfortunately, most of the multimedia content processed by these software modules is available in compressed formats, which permit coding an image or a video with a limited number of bits at the price of additional distortion. Data compression is still necessary since transmission channels and storage facilities have limited capacities. The additional coding noise reduces the accuracy and the precision of object detection algorithms since it alters the features employed in the classification. Most of the proposed algorithms rely on the statistics of the orientation of edges (usually characterized by Histograms of Oriented Gradients, or HOGs) and on color histograms [1, 2]. Compression standards typically result in a low-pass transformation which spatially smooths the original image/video samples and modifies the color information. Moreover, at high compression rates, blocking and ringing artifacts introduce artificial edge information that is not related to the real scene recorded by the device.

* This work was partially supported by the POR FESR 2007–2013, Friuli Venezia Giulia Regional Project “Barcotica.”

Fig. 1. Performance of object detection with different coding noise levels. The adopted query model is persons. The compression ratios are (a) 93%, (b) 95%, (c) 98%, (d) 99%.

Fig. 1 shows the effects of compression noise on the object detection algorithm described in [1] for different compression ratios. The higher the compression level (i.e., the coarser the quantization applied to the signal), the lower the precision of the algorithm. More precisely, as the quality decreases, the percentage of correct hits decreases, since coding artifacts and distortion lead to false hits and misses. Moreover, the object localization becomes coarser, preventing a precise selection of the region where the object is present (see Fig. 1(b) and Fig. 1(d)).
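The claim that blocking introduces edge information unrelated to the scene can be illustrated with a minimal sketch. It is not the detector of [1] and does not run an actual video coder: as a stated assumption, coarse quantization is imitated by replacing every 8x8 block with its mean, and the function names `orientation_histogram` and `block_average` are hypothetical helpers introduced only for this example. A diagonal ramp, whose gradients all point at 45 degrees, is used as the test image.

```python
import numpy as np


def orientation_histogram(img, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg).

    This is a single global histogram, i.e. a simplified stand-in for the
    per-cell histograms used by HOG descriptors.
    """
    gy, gx = np.gradient(img.astype(float))       # derivatives along rows, columns
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist


def block_average(img, block=8):
    """Replace each block x block tile by its mean.

    Assumption: a crude imitation of very coarse quantization (only the
    block average survives); it is not an HEVC encoder.
    """
    h, w = img.shape
    h, w = h - h % block, w - w % block           # crop to a multiple of the block size
    tiles = img[:h, :w].reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)


# Diagonal ramp: every gradient points at 45 degrees.
y, x = np.mgrid[0:64, 0:64]
ramp = (x + y).astype(float)
coarse = block_average(ramp, block=8)

h_orig = orientation_histogram(ramp)
h_coarse = orientation_histogram(coarse)

print("original  :", np.round(h_orig, 3))
print("block-mean:", np.round(h_coarse, 3))
```

With 9 bins of 20 degrees each, the original ramp places all of its gradient energy in the bin that contains 45 degrees. After block averaging, the interior of each block is flat and the only gradients left sit on block boundaries, so most of the energy moves to the bins containing 0 and 90 degrees, orientations that are absent from the original signal.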
As a matter of fact, compression proves to be a delicate task that needs to take into account the effects of the choice of coding parameters on the final performance of the object detection algorithm.

Previous works address the problem of coping with noise in object detection by preprocessing the input signal (image or video) with denoising filters. A denoising method for biological macromolecule detection has been proposed in [3], where a set of rotation-equivariant nonlinear filters is employed to denoise contours and perform rapid object detection in microscopy images. The approach proposed in [4] adopts a noise reduction strategy for pavement images based on wavelet packets. In [5], the authors adopt an adaptive strategy that changes the low-pass behavior of the filter according to the characteristics of the image. However, most of the proposed solutions focus on acquisition noise, which depends on the capturing conditions and device characteristics, while little work can be found that targets compression. It is possi-