Available online at www.sciencedirect.com (ScienceDirect)
Procedia Computer Science 141 (2018) 104–111
1877-0509 © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and peer-review under responsibility of the scientific committee of EUSPN 2018.
doi: 10.1016/j.procs.2018.10.155

The 9th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2018)

A Hybrid GPU-FPGA-based Computing Platform for Machine Learning

Xu Liu a,*, Hibat Allah Ounifi a, Abdelouahed Gherbi a, Yves Lemieux b, Wubin Li b

a Department of Software and IT Engineering, École de Technologie Supérieure (ÉTS), Montréal, Canada
  {xu.liu.1, hibat-allah.ounifi.1}@ens.etsmtl.ca, abdelouahed.gherbi@etsmtl.ca
b Ericsson Research, Ericsson, Montréal, Canada
  {yves.lemieux, wubin.li}@ericsson.com

Abstract

We present a hybrid GPU-FPGA computing platform to tackle the high-density computing demands of machine learning. In our platform, the training part of a machine learning application is implemented on a GPU and the inference part on an FPGA. The platform also includes a model-transplantation part, which transfers the trained model from the training part to the inference part. To evaluate this design methodology, we selected LeNet-5 as our benchmark algorithm. During the training phase, the Titan Xp GPU was about 8.8x faster than the E-1620 CPU; in the inference phase, the Arria 10 FPGA was the fastest, 44.4x faster than the E-1620 CPU and 6341x faster than the Titan Xp GPU. Moreover, by adopting our design methodology, we improved our LeNet-5 model's accuracy from 99.05% to 99.13%, and we preserved that accuracy (99.13%) when transplanting the model from the GPU platform to the FPGA platform.

Keywords: GPU; FPGA; hybrid computing; convolutional neural network; machine learning; model transformation; heterogeneous platform

1. Introduction

The core computing tasks of machine learning are matrix operations, which can be decomposed into many simple computations. General-purpose processors such as CPUs, which have complex instruction sets and execute instructions sequentially, are good at small amounts of complex control logic but weak at high-density computation. Moreover, since increasing the amount of training data improves a model's accuracy, ever-growing training datasets overwhelm general-purpose processors.

* Xu Liu. Tel.: +0-000-000-0000; fax: +0-000-000-0000.
E-mail address: xu.liu.1@ens.etsmtl.ca
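The decomposition of matrix operations into many independent simple computations, which motivates offloading to massively parallel hardware such as GPUs and FPGAs, can be illustrated with a minimal sketch (not from the paper; the function name and NumPy usage are our own):

```python
import numpy as np

def matmul_decomposed(A, B):
    """Matrix product written as many independent multiply-accumulate jobs.

    Each output element C[i, j] is a dot product that depends on no other
    output element, so all (i, j) pairs could run in parallel -- exactly
    the workload shape that GPU cores and FPGA pipelines exploit.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(n):          # every (i, j) pair is an independent job
        for j in range(m):
            C[i, j] = sum(A[i, p] * B[p, j] for p in range(k))
    return C

# Sanity check against NumPy's built-in matrix product.
A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(6, dtype=float).reshape(3, 2)
assert np.allclose(matmul_decomposed(A, B), A @ B)
```

A CPU executes these n x m dot products one after another, while parallel hardware dispatches them concurrently, which is why the sequential execution model the introduction describes becomes the bottleneck as data volumes grow.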