A High-Performance HOG Extractor on FPGA

Vinh Ngo, Department of Microelectronics and Electronics Systems, Spain, quangvinh.ngo@uab.cat
Arnau Casadevall, Department of Microelectronics and Electronics Systems, Spain, arnau.casadevall@uab.cat
Marc Codina, Department of Microelectronics and Electronics Systems, Spain, marc.codina@uab.cat
David Castells-Rufas, Department of Microelectronics and Electronics Systems, Spain, david.castells@uab.cat
Jordi Carrabina, Department of Microelectronics and Electronics Systems, Spain, jordi.carrabina@uab.cat

ABSTRACT
Pedestrian detection is one of the key problems in the emerging self-driving car industry, and the HOG algorithm has proven to provide good accuracy for pedestrian detection. Many research works have accelerated the HOG algorithm on FPGAs because of their low-power and high-throughput characteristics. In this paper, we present a high-performance HOG architecture for pedestrian detection on a low-cost FPGA platform. It achieves a maximum throughput of 526 FPS with 640x480 input images, which is 3.25 times faster than the state-of-the-art design. The accelerator is integrated with SVM-based prediction to realize a pedestrian detection system. The power consumption of the whole system is comparable with the best existing implementations.

KEYWORDS
Histogram of oriented gradients, HOG extractor, FPGA HOG accelerator

1 INTRODUCTION
Pedestrian detection is a safety-critical application in autonomous cars. There are two main approaches to implementing pedestrian detection systems. In the first, the detection algorithm operates on all input image pixels. This approach uses deep learning methods and requires costly computing platforms with not only many processing cores but also large memory bandwidth and capacity. In the second, only features extracted from the image are input to the detection algorithm. This approach, using HOG (Histogram of Oriented Gradients) [1], has been shown to achieve good detection accuracy [2].
While requiring less memory capacity, it is still a compute-intensive algorithm that needs a low-latency, high-throughput platform. An FPGA is therefore a suitable solution thanks to its parallel processing capability. More importantly, FPGAs potentially have better energy efficiency than alternative platforms such as CPUs and GPUs.

In this paper, we design and implement a HOG feature extractor on a low-cost FPGA device, targeting high throughput and low power consumption. This work builds on our previous work [3], with several improvements that help achieve a high-performance design. First, fixed-point rather than integer numbers are used to represent values, which increases feature accuracy at the cost of higher computational complexity. Second, a pipeline for normalizing cell features is added to take advantage of the hardware's capability for pipelined and parallel execution. The normalized HOG features are transferred to the HPS (Hard Processor System) for prediction. Third, instead of buffering input images before extraction, which costs memory, input pixels are processed directly from the sensor by a pipeline. Finally, we optimize the pipeline design to achieve the highest throughput.

The HOG extractor design can work at a maximum clock frequency of 162 MHz and provides a throughput of 526 FPS, the highest among state-of-the-art designs. The design is then integrated into a heterogeneous system with SVM-based prediction software. Its energy efficiency is comparable to that of the most efficient implementations.

The paper is organized as follows. Section 2 gives an overview of the original HOG algorithm. Section 3 discusses related work on FPGA implementations of real-time HOG extractors. Section 4 presents our architectural design in detail. Section 5 shows the experimental results and discussion. Finally, Section 6 presents the conclusions.
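
The accuracy benefit of the first improvement listed in the introduction, fixed-point instead of integer representation, can be illustrated with a minimal software sketch. The number of fractional bits (`FRAC_BITS = 4`), the helper names `to_fixed` and `from_fixed`, and the example gradient values are assumptions chosen for illustration only; they are not taken from the hardware design.

```python
import math

FRAC_BITS = 4  # hypothetical number of fractional bits, for illustration only


def to_fixed(value: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a non-negative real value to a fixed-point integer code."""
    return round(value * (1 << frac_bits))


def from_fixed(code: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a fixed-point code back to the real value it represents."""
    return code / (1 << frac_bits)


# Gradient magnitude of an example pixel with gx = 3, gy = 5 (assumed values).
magnitude = math.hypot(3, 5)

as_integer = round(magnitude)                 # integer representation
as_fixed = from_fixed(to_fixed(magnitude))    # fixed-point representation

int_error = abs(as_integer - magnitude)
fixed_error = abs(as_fixed - magnitude)
print(f"integer error: {int_error:.4f}, fixed-point error: {fixed_error:.4f}")
```

With round-to-nearest quantization, the fixed-point error is bounded by 2^-(FRAC_BITS + 1), whereas integer rounding can be off by up to 0.5. The price is a wider datapath, since every value carries FRAC_BITS additional bits through each arithmetic stage.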

2 HOG OVERVIEW
The HOG algorithm consists of two main steps: gradient computation and histogram generation.
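
The two steps can be sketched in software for a single cell. This is a simplified illustration under stated assumptions, not the hardware architecture of this paper: it assumes the commonly used parameters of 8x8-pixel cells, 9 unsigned orientation bins over [0, 180) degrees, and a centered-difference gradient, and it uses hard binning without interpolation between neighboring bins. The function names are hypothetical.

```python
import math

N_BINS = 9                    # assumed: 9 unsigned orientation bins
BIN_WIDTH = 180.0 / N_BINS    # 20 degrees per bin


def gradient(img, x, y):
    """Step 1: centered-difference gradient at (x, y), clamped at the borders."""
    h, w = len(img), len(img[0])
    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
    return gx, gy


def cell_histogram(img, x0, y0, cell=8):
    """Step 2: accumulate gradient magnitudes into orientation bins for one cell."""
    hist = [0.0] * N_BINS
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            gx, gy = gradient(img, x, y)
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: fold the angle into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard binning; the extra modulo guards against angle == 180.0.
            hist[int(angle // BIN_WIDTH) % N_BINS] += magnitude
    return hist


# A 10x10 horizontal ramp: intensity grows by 1 per pixel from left to right.
ramp = [[x for x in range(10)] for _ in range(10)]
hist = cell_histogram(ramp, 1, 1)
print(hist)  # all 64 pixels vote with magnitude 2 into bin 0, so hist[0] == 128.0
```

In the ramp example every pixel of the cell has a purely horizontal gradient, so all votes land in the first bin. A vertical ramp would instead put all votes into the bin covering 90 degrees.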