Content-based Dynamic Threshold Method for Real Time Keyframe Selecting

Pau Usach, Jorge Sastre, Member, IEEE, Valery Naranjo, Luis Vergara and J.M. López

Abstract—This paper presents a new content-based method for real-time keyframe selection in H.264 low-delay video coding. Based on dynamic thresholds, it aims to improve compression efficiency. It has been trained and tested with real sequences taken from movies and commercials. These sequences cover very different motion and complexity characteristics and a wide range of bitrates (from low to high), frame rates and formats, for a total of 70000 frames and more than 1500 candidate keyframes. The results show an average PSNR improvement of up to 2.32 dB, higher than that of recent methods in the literature, together with processing time gains, while preserving the real-time and low-delay constraints of real-time coders.

Index Terms—Content-based, H.264 video coding, keyframe selection, low delay, real time.

I. INTRODUCTION

Digital video coding is a challenging technology that has evolved exponentially in recent decades. Because video material is potentially infinitely complex and variable, coding optimization techniques are constantly evolving. In particular, real-time coding techniques that improve both objective and subjective coded video quality are a field of great interest. On the one hand, current digital video coding schemes exploit both temporal and spatial correlation in the video signal to improve coding efficiency; on the other hand, these codecs also take advantage of the characteristics of the human visual system to encode and represent the video material.
Finally, another way to improve the efficiency of these algorithms is to exploit the intrinsic properties of video sequences that derive from their content and from the way they are produced. Video content tends to be continuous, which favors the use of motion compensation to exploit temporal correlation, except in situations such as abrupt shot changes, occlusions and fast motion, where the correlation between consecutive frames is very low. To improve the overall quality of the encoded sequence, both the temporal correlation of the video content and its discontinuities between shots can be exploited.

P. Usach, J. Sastre and L. Vergara are with the Instituto de Telecomunicaciones y Aplicaciones Multimedia (iTEAM), Universidad Politécnica de Valencia, Spain. e-mail: [pauusmo, jorsasma, lvergara]@iteam.upv.es

V. Naranjo is with the Instituto Interuniversitario de Investigación en Bioingeniería y Tecnología Orientada al Ser Humano (i3bh), Universidad Politécnica de Valencia, Spain.

J.M. López is with Telefónica I+D, Madrid, Spain.

This work has been supported by the Universidad Politécnica de Valencia "Programas de apoyo a la investigación y desarrollo" PAID-06-06 and PAID-04-07.

Copyright (c) 2010 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

From the point of view of the video encoder, there are two main types of encoded frames: inter-frames (P and B), which use motion estimation and prediction; and intra-frames (I), which do not use these prediction tools and serve as synchronization, recovery or random access points. Traditional video encoders use a fixed scheme to introduce intra-frames periodically, leading to fixed IB...BPB...BP...PI patterns.
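The fixed scheme described above can be sketched in a few lines. This is a minimal Python illustration, assuming display order, a fixed intra period and a fixed number of B-frames between anchor frames; the function name `fixed_gop_pattern` and its parameters are hypothetical and not taken from any particular encoder.

```python
def fixed_gop_pattern(num_frames: int, intra_period: int, b_frames: int) -> str:
    """Return display-order frame types for a fixed periodic GOP (hypothetical helper).

    intra_period: distance between consecutive I-frames.
    b_frames: number of B-frames between consecutive anchor (I/P) frames.
    """
    if intra_period < 1 or b_frames < 0:
        raise ValueError("intra_period must be >= 1 and b_frames >= 0")
    types = []
    for n in range(num_frames):
        pos = n % intra_period            # position inside the current GOP
        if pos == 0:
            types.append("I")             # periodic intra refresh, regardless of content
        elif pos % (b_frames + 1) == 0:
            types.append("P")             # anchor frame every b_frames + 1 frames
        else:
            types.append("B")
    return "".join(types)


print(fixed_gop_pattern(11, 10, 2))  # IBBPBBPBBPI
```

Because the frame types depend only on the frame index, a shot change that happens to fall at, say, frame 5 of `fixed_gop_pattern(11, 10, 2)` would be coded as a B-frame, which is precisely the kind of placement a content-based decision tries to avoid.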
However, it is reasonable to expect that the best I-frames for encoding a sequence are the frames that break the continuity of the shots, i.e., frames with low temporal correlation inside a shot. Many methods have been proposed in the literature to tackle the problem of temporal video segmentation, and many of them are described in the reviews [1]-[7]. In addition, several keyframe decision algorithms have been published recently [8]-[14], each relying on different concepts and measures. Pye et al. [14] propose a method based on the color histogram difference between frames in a window of 32 frames; Lan [11] proposes a high-complexity video content analysis technique, while [12] tracks the frame-to-frame evolution of the mean square of the magnitude of the motion vectors in a quasi-single-pass solution. Finally, Sastre et al. [8] use the Sum of Absolute Differences (SAD) and other statistical properties to introduce a low-complexity shot change detection method with keyframe placement.

In this paper, we propose a new content-based keyframe decision technique that uses the temporal correlation among frames to select the best position for I-frames in real time. While most of the previously mentioned methods focus on scene analysis and offline video indexing applications, the solution proposed here aims to improve video coding efficiency and quality of experience in real-time, online environments. Therefore, our method takes advantage of the information generated by the video encoder to perform the keyframe decision during the encoding process, in real time. This approach brings together the research fields of temporal video segmentation, coding optimization and keyframe decision, and it is suitable for a wide spectrum of applications, such as direct video streaming, transcoding, video indexing and editing, and old film restoration.
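The keyframe decision problem discussed above can be illustrated with a minimal Python sketch. It is a hypothetical example, not the algorithm of [8] nor the method proposed in this paper: it measures temporal correlation with the frame-to-frame SAD computed on raw luma samples, and flags a frame as a keyframe when its SAD exceeds a dynamic threshold derived from recent statistics (mean plus k standard deviations over a sliding window). The function names, the window length, `k`, `min_margin` and the representation of frames as flat lists of samples are all assumptions. An encoder-integrated method would more plausibly reuse quantities the encoder already computes, such as motion-estimation SAD, which this sketch does not model.

```python
from collections import deque
from statistics import mean, pstdev


def sad(prev, cur):
    """Sum of Absolute Differences between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(prev, cur))


def select_keyframes(frames, window=5, k=3.0, min_margin=1.0):
    """Return the indices of frames flagged as keyframes (hypothetical rule).

    frames: sequence of frames, each a flat list of luma samples.
    A frame is flagged when its SAD against the previous frame exceeds a
    dynamic threshold: mean + k * population std of the last `window` SAD
    values, plus `min_margin` so that perfectly static content (std = 0)
    does not trigger on tiny fluctuations.
    """
    if not frames:
        return []
    keyframes = [0]                     # the first frame must be intra-coded
    history = deque(maxlen=window)      # recent SAD values within the current shot
    for i in range(1, len(frames)):
        d = sad(frames[i - 1], frames[i])
        if len(history) >= 2:           # need some statistics before deciding
            threshold = mean(history) + k * pstdev(history) + min_margin
            if d > threshold:
                keyframes.append(i)
                history.clear()         # restart statistics for the new shot
                continue
        history.append(d)
    return keyframes


# Two synthetic "shots" of 16-sample frames with slight flicker and a hard cut at frame 6.
shot_a = [[10 + (t % 2)] * 16 for t in range(6)]
shot_b = [[200 + (t % 2)] * 16 for t in range(6)]
print(select_keyframes(shot_a + shot_b))  # [0, 6]
```

Clearing the history after a detection avoids judging the new shot against the statistics of the previous one; the price is that the first couple of frames after a cut cannot themselves be flagged.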
With this novel approach, we show that proper keyframe selection can lead to high PSNR gains, which imply both objective and subjective quality improvements. Furthermore, our algorithm also achieves an overall processing time gain due to the improved keyframe selection. We have implemented our algorithm in an open source H.264 encoder [19], as it is the state-of-the-art technology