978-1-6654-2614-5/21/$31.00 ©2021 IEEE

Challenges Towards Hardware Acceleration of the Deformable Shape Tracking Application

Nikos Petrellis, Dept. of Electrical and Computer Engineering, University of Peloponnese, Patra, Greece, npetrellis@uop.gr
Georgios Keramidas, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece, gkeramidas@csd.auth.gr
Panagiotis Christakos, Dept. of Electrical and Computer Engineering, University of Peloponnese, Patra, Greece, p.christakos@esda-lab.gr
Nikolaos Voros, Dept. of Electrical and Computer Engineering, University of Peloponnese, Patra, Greece, voros@esda-lab.gr
Stavros Zogas, Dept. of Electrical and Computer Engineering, University of Peloponnese, Patra, Greece, s.zogas@esda-lab.gr
Christos Antonopoulos, Dept. of Electrical and Computer Engineering, University of Peloponnese, Patra, Greece, ch.antonop@esda-lab.gr
Panagiotis Mousouliotis, Dept. of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece, p.mousouliotis@esda-lab.gr

Abstract: In this paper, a shape tracking application based on landmark alignment is restructured to support its implementation on Field Programmable Gate Arrays (FPGAs). This poses several challenges: a) computationally intensive operations have to be replaced by faster ones, b) specific loops have to be modified (e.g., unrolled) so that operations can be executed in parallel on different hardware resources, c) multiple pre-trained models have to be compared in terms of speed and accuracy, d) partial loading of the pre-trained models has to be examined so that their parameters fit in the Block Random Access Memories (BRAMs) of the FPGA for faster access, and e) alternative arithmetic representations have to be evaluated for higher speed and reduced resource usage. Our approach employs the C++ Deformable Shape Tracking (DEST) implementation of face alignment, which is based on an Ensemble of Regression Trees.
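Challenge e) can be illustrated with a minimal sketch of one possible alternative representation, a Q16.16 fixed-point format, in which multiplications map to integer multipliers instead of floating-point units. The format width, the helper names (q_from_float, q_mul, q_dot) and the rounding policy are assumptions made for illustration; they do not describe the representation finally selected for DEST.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 format: 16 integer and 16 fractional bits in an int32_t.
   The representable range is roughly [-32768, 32768). */
typedef int32_t q16_16;
#define Q_FRAC_BITS 16
#define Q_ONE ((q16_16)1 << Q_FRAC_BITS)

/* Converts a float to Q16.16, rounding to nearest (|x| must stay below 32768). */
static q16_16 q_from_float(float x) {
    return (q16_16)(x * (float)Q_ONE + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Converts a Q16.16 value back to float. */
static float q_to_float(q16_16 x) {
    return (float)x / (float)Q_ONE;
}

/* Multiplies two Q16.16 values; the 64-bit intermediate avoids overflow.
   Assumes an arithmetic right shift for negative values, as on mainstream compilers. */
static q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) >> Q_FRAC_BITS);
}

/* Dot product of two Q16.16 vectors: products are accumulated at full
   precision in 64 bits and rescaled only once at the end. */
static q16_16 q_dot(const q16_16 *a, const q16_16 *b, int n) {
    assert(n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += (int64_t)a[i] * b[i];
    return (q16_16)(acc >> Q_FRAC_BITS);
}
```

Rescaling once per dot product, rather than after every multiplication, is a common choice because it limits the accumulated rounding error; whether the Q16.16 range is sufficient for a given model has to be checked against its actual parameter values.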
The DEST application uses Eigen library routines to implement algebraic operations, which proved to be quite slow. The contribution of this paper concerns the replacement of appropriate Eigen calls in time-critical paths with fast C code that can be used directly to synthesize reconfigurable hardware implementations. Eliminating the computationally intensive Eigen calls has already made the face alignment application more than 240 times faster. In this paper, we examine how the modified source code structure of the DEST application can be used to address the challenges described above.

Keywords: Face Alignment, Deformable Shape Tracking, Eigen, Acceleration, Hardware Implementation

I. INTRODUCTION

Face alignment is used in several applications (e.g., driver drowsiness detection and human expression recognition). In regression-based methods, a series of mapping functions iteratively updates the face shape. Our approach is based on the 2D facial landmark detection algorithm presented by Kazemi and Sullivan in [1], where an Ensemble of Regression Trees (ERT) is used to estimate the positions of the facial landmarks. Facial deformable model fitting is addressed using cascaded regression in [2]. In [3], a Convolutional Neural Network (CNN) is used for facial landmark detection, and post-processing is applied to address the shape shaking that appears across consecutive frames. Facial expression recognition applications infer emotional states in real time and are important for intelligent interaction between computers and humans. In [4], ERT is used to evaluate datasets rendered with computer graphics; the ensemble there consists of 500 trees of depth five. Dornaika et al. [5] used a similar algorithm to recognize age in facial images. A 2D landmark detector, also based on ERT, is presented in [6]; it balances landmark localization accuracy against speed. In [7], 68 facial landmarks are aligned to measure people's reactions to advertisements.
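The tree evaluation at the core of ERT can be sketched as follows, assuming the split test of [1] (the intensity difference of two sampled pixels compared against a threshold) and complete trees of depth five, as in [4]. The heap-ordered node array, the split_node fields and the function names are hypothetical, and shape-indexed pixel sampling as well as the outer cascade loop are omitted. Storing every tree as a fixed-size flat array is one possible layout that makes model parameters straightforward to place in BRAMs.

```c
#include <assert.h>

#define TREE_DEPTH 5                        /* tree depth reported for the ERT in [4] */
#define NUM_SPLITS ((1 << TREE_DEPTH) - 1)  /* 31 internal (split) nodes */
#define NUM_LEAVES (1 << TREE_DEPTH)        /* 32 leaves */

/* Hypothetical split node: compares the intensities of two sampled pixels. */
typedef struct {
    int idx1, idx2;     /* indices into the array of sampled pixel intensities */
    float threshold;
} split_node;

/* Traverses one complete binary tree stored in heap order (the children of
   node i are 2i+1 and 2i+2) and returns the leaf index (0..NUM_LEAVES-1). */
static int tree_leaf(const split_node *splits, const float *intensities) {
    int node = 0;
    for (int level = 0; level < TREE_DEPTH; ++level) {
        const split_node *s = &splits[node];
        float diff = intensities[s->idx1] - intensities[s->idx2];
        node = 2 * node + (diff > s->threshold ? 2 : 1);
    }
    return node - NUM_SPLITS;  /* leaves occupy heap positions NUM_SPLITS..2*NUM_SPLITS */
}

/* Adds the shape increment stored at the reached leaf to the current shape.
   Each leaf holds 2 * num_landmarks floats (x/y interleaved). */
static void tree_apply(const split_node *splits, const float *leaves,
                       const float *intensities, float *shape, int num_landmarks) {
    assert(num_landmarks > 0);
    const float *delta = &leaves[tree_leaf(splits, intensities) * 2 * num_landmarks];
    for (int i = 0; i < 2 * num_landmarks; ++i)
        shape[i] += delta[i];
}
```

In the full cascade of [1], each stage sums the increments of many such trees, and the pixels are re-sampled relative to the updated shape before the next stage. The fixed trip count of the traversal loop (TREE_DEPTH) is what makes it a natural candidate for unrolling.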
The DEST library [8] implements the algorithm of [1] in C++, using the Eigen template library to describe matrix operations. The compact algebraic description offered by Eigen comes at the cost of high latency. Moreover, the Eigen/DEST C++ classes of the original code do not allow the computationally intensive operations to be implemented in FPGAs. Therefore, the DEST and Eigen libraries have been ported to the Ubuntu Linux operating system, where compilers compatible with those used by state-of-the-art hardware design tools, such as Xilinx Vitis, are employed. Several Eigen and C++ classes and datatypes whose hardware implementation is not feasible have been replaced. The DEST face tracking application has been profiled in order to identify the computationally intensive operations that are candidates for hardware implementation. More specifically, the predict routine, which aligns the landmarks in a frame, and its nested calls have been flattened and converted to C, and their arguments have been converted to floating-point or integer pointers. The modified application is 240 times faster than the original one and can be accelerated further if implemented in hardware.

The rest of this paper is structured as follows: the architecture of the original and the modified DEST video tracking application is described in Section II, and the challenges posed by the hardware implementation are discussed in Section III.

II. ARCHITECTURE OF THE ORIGINAL AND MODIFIED DEST APPLICATION

The DEST face tracking application retrieves frames from a video stream and aligns LM landmarks in all frames or in selected ones. To avoid aligning landmarks at irrelevant positions of the frame, face detection is first performed using the detectSingleFace() routine, which relies on the OpenCV library.
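The flattening of Eigen expressions into C functions with plain float pointer arguments can be illustrated with a minimal sketch. Here an Eigen expression of the form `(R * S).colwise() + t`, where a 2x2 matrix R and a translation t are applied to a 2 x LM landmark matrix S, becomes an explicit loop. The function name, the interleaved x/y storage and the row-major layout of R are assumptions made for illustration; they are not taken from the modified DEST sources.

```c
#include <assert.h>

/* Hypothetical flattened replacement for an Eigen expression such as
   (R * S).colwise() + t, with S a 2 x num_landmarks matrix of landmarks.
   Landmarks are interleaved: shape[2*i] = x_i, shape[2*i+1] = y_i.
   rot is a row-major 2x2 matrix and trans a 2-element vector. */
static void transform_landmarks(const float *shape, int num_landmarks,
                                const float *rot, const float *trans,
                                float *out) {
    assert(num_landmarks > 0);
    for (int i = 0; i < num_landmarks; ++i) {
        /* Copy x and y first so that shape and out may be the same buffer. */
        float x = shape[2 * i];
        float y = shape[2 * i + 1];
        out[2 * i]     = rot[0] * x + rot[1] * y + trans[0];
        out[2 * i + 1] = rot[2] * x + rot[3] * y + trans[1];
    }
}
```

As a usage example, and assuming the shape is stored in coordinates normalized to a unit square, choosing R = diag(w, h) and t = (x0, y0) maps the landmarks into a face rectangle of width w and height h whose top-left corner is at (x0, y0). A fixed loop body over raw pointers of this kind is what HLS tools can pipeline or unroll, which the templated Eigen classes do not allow.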
This work has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 871738 - CPSoSaware: Cross-layer cognitive optimization tools & methods for the lifecycle support of dependable CPSoS.

This operation returns the coordinates of the face bounding rectangle and is relatively slow compared to the landmark