A Robust Video Identification Framework using Perceptual Image Hashing

Francisco Vega*, José Medina†, Daniel Mendoza*, Víctor Saquicela*, and Mauricio Espinoza*
*Computer Science Department, University of Cuenca, Ecuador
{francisco.vegaz, daniel.mendoza, victor.saquicela, mauricio.espinoza}@ucuenca.edu.ec
†Department of Electrical, Electronic Engineering and Telecommunications, University of Cuenca, Ecuador
jose.medina@ucuenca.edu.ec

Abstract—This paper proposes a general framework that identifies a video in real time using perceptual image hashing algorithms. To evaluate the versatility and performance of the framework, it was applied to a TV advertisement monitoring use case. Four Perceptual Image Hashing (PIH) algorithms were benchmarked to identify the best one for the use case. The benchmark analyzed differences in discriminability (D), robustness (R), processing time (Tp), and efficiency (E). A truth table was used to obtain discriminability and robustness, while processing time was measured directly. An efficiency metric based on processing time and identification capacity is proposed. In general terms, the DHASH and PHASH algorithms have a higher identification capacity than AHASH and WHASH when identifying a video from a single frame. Moreover, a progressive decrease in robustness as the Hamming distance increases is observed in all cases. However, in the specific case of TV monitoring, where speed is critical, processing time becomes the most discriminating parameter for selecting the algorithm. For this case, one particular PIH algorithm (Average Hash) stands out as the most efficient, reaching an accuracy of 100% and an average processing rate of 108 fps with a Hamming distance of 1. Overall, the proposed framework shows remarkable identification capability and provides an efficient search.
Furthermore, it presents the steps for selecting the best algorithm and its most suitable parameters according to the requirements of each particular case.

Index Terms—video identification, perceptual image hashing

I. INTRODUCTION

In recent years, multimedia content has proliferated enormously, mainly due to cost savings and the ease of use of new tools for generating, producing, storing, distributing, and delivering digital content. However, this growth has complicated the management of multimedia files, especially in tasks such as copyright infringement control, content search, and custom filtering. Among all types of multimedia objects, videos, and particularly their identification, remain one of the major research challenges. One reason is the number of alterations and variations these objects can undergo in properties such as resolution, format, or codec. Furthermore, banners or logos may be overlaid on the original content.

To deal with these problems, many studies have focused on the development and application of Perceptual Hashing techniques [1]–[4], which identify multimedia content by deriving short binary strings from robust features present in the multimedia data [1]. These techniques are also known as fingerprinting or content-based media identification. Applying them is much faster and more efficient than comparing the multimedia content directly, especially in the case of videos, since this type of multimedia object is composed of a sequence of images (frames) and, generally, an audio track.
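The idea of reducing a frame to a short binary string and comparing strings instead of pixels can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the function names are hypothetical, the frame is assumed to be already converted to grayscale and downscaled to a small grid, and the threshold of 1 only echoes the Hamming distance mentioned in the abstract.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean.

    Assumes `gray` is a small, already downscaled grayscale grid
    (real average-hash implementations typically resize to 8x8 first).
    """
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Number of bit positions in which the two hashes differ."""
    return bin(h1 ^ h2).count("1")


def same_content(h1: int, h2: int, max_distance: int = 1) -> bool:
    """Treat two hashes as the same content if they differ in few bits.

    `max_distance` is an illustrative threshold, not a recommended value.
    """
    return hamming_distance(h1, h2) <= max_distance


# Toy 4x4 "frame": dark left half, bright right half.
frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# Same frame with a uniform brightness change (a mild distortion).
brighter = [[min(255, p + 20) for p in row] for row in frame]
# A visually different frame: the horizontal mirror image.
other = [row[::-1] for row in frame]

h_frame = average_hash(frame)
h_brighter = average_hash(brighter)
h_other = average_hash(other)

print(hamming_distance(h_frame, h_brighter))  # small for the distorted copy
print(hamming_distance(h_frame, h_other))     # large for different content
```

Comparing two hashes costs one XOR and a bit count, which is why hash comparison is much cheaper than comparing frames pixel by pixel.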
Most authors agree that Perceptual Hashing based algorithms must ensure the following properties: robustness, to tolerate distortions in the content; fast extraction, to generate a hash representation from the multimedia content; discrimination, to avoid collisions between hash values; fast searching, to retrieve an element from a database; and efficiency, to identify the required item. With these criteria in mind, several algorithms have been developed that use a similarity measure to compare the hash representation of a multimedia object with the hash representations of multimedia objects previously extracted and stored in a database. The aim of these algorithms is to establish the “equality” of the compared objects.

In general, the identification of videos based on Perceptual Hashing techniques can be classified into two main groups:

• Based on audio fingerprinting [5]–[7]. This type of algorithm identifies a song, or audio in general, from a video by extracting particular characteristics (fingerprints) from a fragment of the audio track, regardless of whether the environment is noisy.
• Based on visual characteristics. These algorithms extract the particularities of the multimedia object from its visual characteristics. Several algorithms of this type have been proposed in the literature, using techniques of diverse nature to obtain the required representation. According to [8], these algorithms can be classified into three groups:
  1) Frame-by-frame video hashing [1], [3], [9]. This group individually applies algorithms based

978-1-5386-3057-0/17/$31.00 © 2017 IEEE