978-1-6654-2614-5/21/$31.00 ©2021 IEEE
Challenges Towards Hardware Acceleration of the
Deformable Shape Tracking Application
Nikos Petrellis
Dept. of Electrical and
Computer Engineering
University of Peloponnese
Patra, Greece
npetrellis@uop.gr
Georgios Keramidas
School of Informatics
Aristotle University of
Thessaloniki
Thessaloniki, Greece
gkeramidas@csd.auth.gr
Panagiotis Christakos
Dept. of Electrical and
Computer Engineering
University of Peloponnese
Patra, Greece
p.christakos@esda-lab.gr
Nikolaos Voros
Dept. of Electrical and
Computer Engineering
University of Peloponnese
Patra, Greece
voros@esda-lab.gr
Stavros Zogas
Dept. of Electrical and
Computer Engineering
University of Peloponnese
Patra, Greece
s.zogas@esda-lab.gr
Christos Antonopoulos
Dept. of Electrical and
Computer Engineering
University of Peloponnese
Patra, Greece
ch.antonop@esda-lab.gr
Panagiotis Mousouliotis
Dept. of Electrical and
Computer Engineering
Aristotle University of
Thessaloniki
Thessaloniki, Greece
p.mousouliotis@esda-lab.gr
Abstract— In the context of this paper, a shape tracking
application based on landmark alignment is transformed to
support implementation in Field Programmable Gate Arrays
(FPGAs). This transformation poses several challenges, since
a) computationally intensive operations have to be replaced
by faster ones, b) specific loops have to be modified (e.g.,
unrolled) so that operations can be executed in parallel on
different hardware resources, c) multiple pre-trained models
have to be compared in terms of speed and accuracy, d) partial
loading of the pre-trained models has to be examined so that
their parameters fit in the Block Random Access Memories
(BRAMs) of the FPGA for faster access, and e) alternative
arithmetic representations have to be evaluated for higher
speed and reduced resource usage.
The C++ Deformable Shape Tracking (DEST)
implementation of face alignment, which is based on an Ensemble
of Regression Trees, is employed in our approach. The DEST
application uses Eigen library routines to implement algebraic
operations, which proved to be quite slow. The contribution
of this paper concerns the replacement of selected Eigen
calls in time-critical paths with fast C code that can be directly
used to synthesize reconfigurable hardware implementations.
The elimination of the computationally intensive Eigen calls has
already improved the speed of the face alignment application by
more than 240 times. In this paper, we examine how the modified
source code structure of the DEST application can be used to
address the challenges described above.
Keywords— Face Alignment, Deformable Shape Tracking,
Eigen, Acceleration, Hardware Implementation
I. INTRODUCTION
Face alignment is used in several applications (driver
drowsiness detection, recognition of human expression, etc.).
In regression-based methods, a series of mapping functions
iteratively updates the face shape. Our approach is based on
the 2D facial landmark detection algorithm presented by
Kazemi and Sullivan in [1] where an Ensemble of Regression
Trees (ERT) has been used to estimate the position of the
facial landmarks. Facial deformable model fitting is addressed
using cascaded regression in [2]. A Convolutional Neural
Network (CNN) is used for facial landmark detection in [3],
and post-processing is used to address the shape shaking that
appears in consecutive frames. Facial expression recognition
applications infer the emotional states in real-time and are
important in the intelligent interaction between computers and
humans. In [4], ERT is used to evaluate computer graphics
rendered datasets. ERT consists of 500 trees with depth equal
to five. Dornaika et al. [5] used a similar algorithm for
recognizing age in facial images. A 2D landmark detector also
based on ERT is presented in [6] for landmark localization
that balances accuracy and speed. In [7], 68 facial
landmarks are aligned in order to measure people's reactions
to advertisements.
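To make the regression-tree step concrete, the following is a minimal sketch in C of how one tree of depth five could be evaluated and its result added to the current shape estimate. It is an illustration only, not code taken from DEST or from [1]. The names SplitTree, tree_leaf and add_leaf_update, the heap-style node numbering, the use of a pixel intensity difference as the split feature, and the convention that a difference above the threshold selects the left child are all assumptions made for this sketch.

```c
#include <assert.h>

#define TREE_DEPTH 5
#define NUM_SPLITS ((1 << TREE_DEPTH) - 1)  /* 31 internal (split) nodes */
#define NUM_LEAVES (1 << TREE_DEPTH)        /* 32 leaves */

/* Hypothetical flat layout of one regression tree: each split node
 * compares the difference of two sampled pixel intensities against
 * a threshold. */
typedef struct {
    int   idx_a[NUM_SPLITS];
    int   idx_b[NUM_SPLITS];
    float threshold[NUM_SPLITS];
} SplitTree;

/* Walks the tree from the root (node 0) and returns the index of the
 * leaf that is reached, in the range [0, NUM_LEAVES). Children of
 * node n are stored at 2n+1 (left) and 2n+2 (right). */
int tree_leaf(const SplitTree *t, const float *intensity)
{
    assert(t != 0 && intensity != 0);
    int node = 0;
    for (int level = 0; level < TREE_DEPTH; ++level) {
        float diff = intensity[t->idx_a[node]] - intensity[t->idx_b[node]];
        /* Assumed convention: diff above the threshold goes left. */
        node = 2 * node + (diff > t->threshold[node] ? 1 : 2);
    }
    return node - NUM_SPLITS;
}

/* Adds the update vector stored for the selected leaf to the shape.
 * leaf_updates holds NUM_LEAVES consecutive vectors of shape_len floats. */
void add_leaf_update(float *shape, const float *leaf_updates,
                     int leaf, int shape_len)
{
    for (int i = 0; i < shape_len; ++i)
        shape[i] += leaf_updates[leaf * shape_len + i];
}
```

Because the tree depth is fixed, the traversal loop has a constant trip count, which is the kind of loop that can be unrolled when targeting hardware. An ensemble would repeat these two calls for every tree of a cascade stage.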
The DEST library [8] implements the algorithm presented
in [1] using C++ and the Eigen template library for the
description of matrix operations. The compact algebraic
description offered by Eigen comes at the cost of high latency.
Moreover, the Eigen/DEST C++ classes of the original code do not
allow the computationally intensive operations to be implemented
in FPGAs. Therefore, the DEST and Eigen libraries have been
ported to the Ubuntu Linux operating system, where compilers
compatible with the ones used in state-of-the-art hardware
design tools, such as Xilinx Vitis, are employed. Several
Eigen and C++ classes/datatypes have been replaced since
their hardware implementation is not feasible. The DEST face
tracking application has been profiled in order to identify the
computationally intensive operations that are candidates for
hardware implementation. More specifically, the predict
routine that aligns the landmarks in a frame, together with its
nested calls, has been flattened and converted to C. Their
arguments have been converted to floating-point or integer
pointers. The modified application is 240 times faster than the
original one and can be further accelerated if implemented in
hardware.
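As an illustration of this kind of conversion, the sketch below shows how an algebraic expression of the form y = A x + b, which a matrix library would express in a single statement, could be written as plain C loops over floating-point pointers. It is a hypothetical example and not one of the routines actually replaced in DEST. The function name affine_flat, its argument list and the row-major storage of A are assumptions of this sketch (Eigen itself stores matrices in column-major order by default, so a real port would have to match the layout of the original data).

```c
#include <assert.h>

/* Computes y = A * x + b using only pointers and loops.
 * A is a rows x cols matrix stored row by row (assumed layout),
 * x has cols elements, b and y have rows elements. */
void affine_flat(const float *A, const float *x, const float *b,
                 float *y, int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    for (int r = 0; r < rows; ++r) {
        float acc = b[r];                 /* start from the offset term */
        for (int c = 0; c < cols; ++c)
            acc += A[r * cols + c] * x[c];
        y[r] = acc;
    }
}
```

A function of this shape has no class or template dependencies, and its loops have explicit bounds, which is what makes loop-level transformations such as unrolling straightforward to express in a hardware design flow.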
This paper is structured as follows: the architecture of the
original and the modified DEST video tracking application is
described in Section II. The challenges posed for hardware
implementation are discussed in Section III.
II. ARCHITECTURE OF THE ORIGINAL AND MODIFIED DEST
APPLICATION
The face tracking DEST application retrieves frames from
a video stream and aligns LM landmarks in all or in specific
frames. To avoid aligning landmarks in irrelevant positions of
the frame, face detection takes place using the
detectSingleFace() routine of the OpenCV library. This
operation returns the coordinates of the face bounding
rectangle and is relatively slow compared to the landmark
This work has received funding from the European Union’s Horizon 2020
research and innovation programme under Grant Agreement No 871738 -
CPSoSaware: Cross-layer cognitive optimization tools & methods for the
lifecycle support of dependable CPSoS.