A Robust Video Identification Framework using
Perceptual Image Hashing
Francisco Vega*, José Medina†, Daniel Mendoza*, Víctor Saquicela*, and Mauricio Espinoza*
*Computer Science Department, University of Cuenca, Ecuador
{francisco.vegaz, daniel.mendoza, victor.saquicela, mauricio.espinoza}@ucuenca.edu.ec
†Department of Electrical, Electronic Engineering and Telecommunications, University of Cuenca, Ecuador
jose.medina@ucuenca.edu.ec
Abstract—This paper proposes a general framework that
identifies a video in real time using perceptual image
hashing algorithms. To evaluate the versatility and
performance of the framework, it was applied to a use case
on TV advertisement monitoring. Four Perceptual Image Hashing (PIH)
algorithms were benchmarked in order to
identify the best one for the use case. The benchmark focused on
analyzing differences in terms of discriminability (D), robustness
(R), processing time (Tp) and efficiency (E). A truth table was
used to obtain information about discriminability and robustness,
while processing time was measured directly. An efficiency
metric based on processing time and identification capacity is
proposed. In general terms, the DHASH and PHASH algorithms
have higher identification capacities than AHASH and WHASH
when identifying a video using only one frame. Moreover,
a progressive decrease in robustness as the Hamming distance
increases is observed in all cases. However, in the specific
case of TV monitoring, where speed is critical, the processing time
becomes the most discriminating parameter for the selection of
the algorithm. Thus, for this case, a particular type of PIH (Average
Hash) stands out as the most efficient among the evaluated
techniques, reaching an accuracy of 100% and an average
processing rate of 108 fps with a Hamming distance of 1.
Overall, the proposed framework shows remarkable identification
capability and performs an efficient search. Furthermore, it presents
the steps to select the best algorithm and its most adequate
parameters, according to the requirements of each particular
case.
Index Terms—video identification, perceptual image hashing
I. INTRODUCTION
In recent years, multimedia content has proliferated
enormously, mainly due to cost savings and the
ease of use of new tools for generating, producing, storing,
distributing, and delivering digital content. However, this has
complicated the management of multimedia files, especially
in tasks such as control of copyright infringements, content
search, or custom filtering. Among all types of multimedia
objects, video, and particularly its identification,
remains one of the major research challenges.
One reason is the number of alterations and variations that
these multimedia objects can undergo in their properties, such
as resolution, format or codec. Furthermore, banners or logos
can be overlaid on the original
content.
To deal with these problems, many studies have
focused on the development and application
of Perceptual Hashing techniques [1]–[4], which identify
multimedia content by deriving short binary
strings from robust characteristics present in the multimedia
data [1]. These techniques are also known as fingerprinting or
content-based media identification. Applying these
techniques is much faster and more efficient than performing
a direct comparison of the multimedia content, especially in
the case of videos, since this type of multimedia object is
composed of a sequence of images (frames) and generally an
audio track.
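As an illustration of how a frame can be reduced to a short binary string, the following is a minimal sketch of the Average Hash idea (one bit per pixel, set when the pixel is brighter than the mean). It assumes the frame has already been converted to grayscale and downscaled to a small grid; the function name `average_hash` and the sample data are hypothetical and are not taken from the framework's implementation.

```python
from typing import Sequence


def average_hash(gray: Sequence[Sequence[int]]) -> int:
    """Return an integer hash with one bit per pixel (row-major order).

    Assumption: `gray` is an already downscaled grayscale thumbnail
    (e.g. 8x8) with intensity values in the range 0..255.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # A bit is 1 when the pixel is brighter than the mean intensity.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


# Hypothetical 4x4 thumbnail: bright left half, dark right half.
frame = [[200, 200, 10, 10] for _ in range(4)]
# The same thumbnail after a uniform brightness shift.
brighter = [[min(255, p + 20) for p in row] for row in frame]

print(hex(average_hash(frame)))
print(average_hash(frame) == average_hash(brighter))
```

In this toy case the brightness shift moves the mean together with the pixels, so both thumbnails yield the same bit string, which is the kind of robustness to distortion that perceptual hashes aim for.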
Most authors agree that Perceptual Hashing based algorithms
must ensure the following properties: robustness to
withstand distortions in the content, fast extraction to generate a
hash representation from the multimedia content, discrimination
to avoid collisions between hash values, fast searching
to retrieve an element from a database, and efficiency to
identify the required item. Considering these criteria, several
algorithms have been developed that use a similarity
measure to compare the hash representation of a
multimedia object with the hash representations of multimedia
objects previously extracted and stored in a database. The
aim of these algorithms is to establish the “equality” of the
compared objects.
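The comparison step described above can be sketched as follows, assuming hashes are stored as integers and the Hamming distance is used as the similarity measure with a maximum accepted distance. The names `hamming_distance`, `identify`, `max_distance`, and the sample database are hypothetical, and the linear scan is only for illustration; it is not the search strategy of the proposed framework.

```python
from typing import Dict, Optional


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def identify(query: int, database: Dict[str, int],
             max_distance: int = 1) -> Optional[str]:
    """Return the id of the closest stored hash, or None if no stored
    hash lies within `max_distance` bits of the query."""
    best_id: Optional[str] = None
    best_dist = max_distance + 1
    for video_id, stored in database.items():
        d = hamming_distance(query, stored)
        if d < best_dist:
            best_id, best_dist = video_id, d
    return best_id


# Hypothetical 8-bit hashes for two stored videos.
database = {"ad_a": 0b11001100, "ad_b": 0b00110011}

print(identify(0b11001101, database))  # one bit away from "ad_a"
print(identify(0b11110000, database))  # four bits away from both entries
```

A small threshold favours discrimination (fewer false matches), while a larger one tolerates more distortion in the query frame; the appropriate value depends on the use case.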
In general, video identification based on
Perceptual Hashing techniques can be classified into two main
groups:
• Based on audio fingerprinting [5]–[7]. This type of
algorithm identifies a song, or audio in general,
from a video by extracting particular characteristics
(fingerprints) from a fragment of the audio track, whether
or not the environment is noisy.
• Based on visual characteristics. These algorithms
extract the particularities of the
multimedia object from its visual characteristics. Several
algorithms of this type have been proposed in the literature,
using techniques of diverse nature to obtain the
required representation. According to [8], these algorithms
can be classified into three groups:
1) Frame-by-frame video hashing [1], [3], [9]. This
group applies in an individual way algorithms based
978-1-5386-3057-0/17/$31.00 © 2017 IEEE