Procedia Computer Science 141 (2018) 104–111
1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the scientific committee of EUSPN 2018.
10.1016/j.procs.2018.10.155
Keywords: GPU; FPGA; hybrid computing; convolutional neural network; machine learning; model transformation; heterogeneous platform;
1. Introduction
The core computing workloads of machine learning are matrix operations, which can be decomposed into many simple, independent computations. General-purpose processors such as CPUs, which have complex instruction sets and execute instructions sequentially, are good at handling small amounts of complex control logic but weak at high-density computing workloads. Moreover, since increasing the amount of training data improves a model's accuracy, ever-growing training sets increasingly overwhelm general-purpose processors.
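To make the decomposition concrete, the following minimal sketch (ours, not from the paper) shows that each output element of a matrix product is an independent multiply-accumulate job reading one row and one column; it is exactly this independence that lets GPUs and FPGAs, with many simple parallel units, outperform a sequential CPU on such workloads.

```python
def matmul(A, B):
    """Naive matrix product: each (i, j) output cell is an independent
    multiply-accumulate job that could run on its own parallel unit."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):           # each (i, j) cell depends only on
        for j in range(m):       # row i of A and column j of B,
            acc = 0.0            # never on any other cell of C
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```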
* Xu Liu. Tel.: +0-000-000-0000; fax: +0-000-000-0000.
E-mail address: xu.liu.1@ens.etsmtl.ca
The 9th International Conference on Emerging Ubiquitous Systems and Pervasive Networks
(EUSPN 2018)
A Hybrid GPU-FPGA-based Computing Platform for Machine Learning
Xu Liu^a,*, Hibat Allah Ounifi^a, Abdelouahed Gherbi^a, Yves Lemieux^b, Wubin Li^b
^a Department of Software and IT Engineering, École de Technologie Supérieure (ÉTS), Montréal, Canada
{xu.liu.1, hibat-allah.ounifi.1}@ens.etsmtl.ca, abdelouahed.gherbi@etsmtl.ca
^b Ericsson Research, Ericsson, Montréal, Canada
{yves.lemieux, wubin.li}@ericsson.com
Abstract
We present a hybrid GPU-FPGA computing platform to tackle the high-density computing workloads of machine learning. In our platform, the training part of a machine learning application is implemented on a GPU and the inferencing part on an FPGA. The platform also includes a model transplantation part, which transfers the trained model from the training part to the inferencing part. To evaluate this design methodology, we selected LeNet-5 as our benchmark algorithm. During the training phase, the GPU TitanXp was about 8.8x faster than the CPU E-1620; in the inferencing phase, the FPGA Arria-10 was the fastest, 44.4x faster than the CPU E-1620 and 6341x faster than the GPU TitanXp. Moreover, by adopting our design methodology, we improved our LeNet-5 model's accuracy from 99.05% to 99.13%, and preserved that accuracy (99.13%) when transplanting the model from the GPU platform to the FPGA platform.
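As a hypothetical illustration of what a model-transplantation step can involve (the function names and Q8.8 fixed-point format below are our assumptions, not the paper's actual tooling), trained floating-point weights from the GPU side are typically quantized to fixed-point integers before being loaded into an FPGA inferencing engine:

```python
def to_fixed_point(weights, frac_bits=8):
    """Quantize float weights to signed integers with frac_bits fractional
    bits (Q-format), the usual representation for FPGA arithmetic."""
    scale = 1 << frac_bits
    return [int(round(w * scale)) for w in weights]

def from_fixed_point(q_weights, frac_bits=8):
    """Recover approximate float weights, as the FPGA datapath would
    interpret the stored integers."""
    scale = 1 << frac_bits
    return [q / scale for q in q_weights]

trained = [0.5, -0.25, 0.125]      # stand-in for trained GPU weights
q = to_fixed_point(trained)        # integer weights for the FPGA
print(q, from_fixed_point(q))      # these values round-trip exactly
```

Preserving accuracy across such a transfer, as the paper reports (99.13% on both platforms), requires that the chosen fixed-point precision be wide enough that quantization error does not change the classifier's decisions.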