Accelerated GNN training with DGL and RAPIDS cuGraph in a
Fraud Detection Workflow
Brad Rees
NVIDIA
Wesley Chapel, FL, USA
brees@nvidia.com
Xiaoyun Wang
NVIDIA
Santa Clara, CA, USA
xiaoyunw@nvidia.com
Joe Eaton
NVIDIA
Austin, TX, USA
featon@nvidia.com
Onur Yilmaz
NVIDIA
Santa Clara, CA, USA
oyilmaz@nvidia.com
Rick Ratzel
NVIDIA
Austin, TX, USA
rratzel@nvidia.com
Dominique LaSalle
NVIDIA
Santa Clara, CA, USA
dlasalle@nvidia.com
ABSTRACT
Graph Neural Networks (GNNs) have gained the interest of industry, with Relational Graph Convolutional Networks (R-GCNs) showing promise for fraud detection. Taking existing workflows that leverage graph features to train a gradient-boosted decision tree (GBDT) and replacing the graph features with GNN-produced embeddings achieves an increase in accuracy. However, recent work has shown that the combination of graph attributes with GNN embeddings provides the biggest lift in accuracy.
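The combined-feature approach can be sketched as follows. This is a minimal NumPy illustration; the array names, dimensions, and random data are assumptions for demonstration, not values from the paper:

```python
import numpy as np

# Hypothetical shapes: 1000 accounts, 16-dim GNN embeddings,
# 4 hand-engineered graph features (e.g. degree, PageRank score).
rng = np.random.default_rng(0)
gnn_embeddings = rng.normal(size=(1000, 16))  # stand-in for R-GCN output
graph_features = rng.normal(size=(1000, 4))   # stand-in for graph attributes

# Concatenate both views into one feature matrix for the GBDT
# (e.g. XGBoost); using both together gives the biggest accuracy lift.
X = np.hstack([gnn_embeddings, graph_features])
assert X.shape == (1000, 20)
```

In a real workflow the embedding matrix would come from a trained R-GCN and the graph attributes from cuGraph analytics, with `X` passed to the GBDT trainer.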
Whether to use a GNN is only half of the picture. Data loading, data cleaning and preparation (ETL), and graph processing are critical first steps before graph features or GNN training can be performed. Moreover, the entire process is iterative, optimizing training and validation for shorter model delivery cycles. Quicker model updates are the key to staying ahead of evolving fraud techniques. McDonald and Deotte [1] published a blog post on the importance of being able to iterate quickly when finding a solution.
The RAPIDS [2] suite of open-source software libraries gives the data scientist the freedom to execute end-to-end analytics workflows on GPUs. The ETL and data-loading portion is handled by RAPIDS cuDF, which utilizes a familiar DataFrame API. The GBDT process is handled by RAPIDS cuML, which provides implementations of XGBoost and Random Forest. The graph analytics portion is handled by RAPIDS cuGraph. Recently, cuGraph announced integration with the Deep Graph Library (DGL) [3]. For GNN training, graph sampling can consume up to 80% of the training time. RAPIDS cuGraph sampling algorithms execute 10x to 100x faster than comparable CPU versions and scale to support massive graphs. Join us as we dive into GNNs for fraud detection and demonstrate how RAPIDS + DGL drastically reduces training time. We will cover everything from accelerating data loading and data preparation to accelerated GNN training with cuGraph + DGL.
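To make the sampling bottleneck concrete, below is a minimal CPU sketch of uniform neighbor sampling over a graph in CSR form, the per-batch step that cuGraph moves to the GPU. This is an illustration of the technique only, not the cuGraph or DGL API; the function name, fanout value, and toy graph are assumptions:

```python
import numpy as np

def uniform_neighbor_sample(indptr, indices, seeds, fanout, rng):
    """Sample up to `fanout` neighbors per seed node from a CSR graph,
    returning the sampled edges as (src, dst) arrays."""
    src, dst = [], []
    for s in seeds:
        nbrs = indices[indptr[s]:indptr[s + 1]]   # neighbor list of s
        if len(nbrs) > fanout:
            nbrs = rng.choice(nbrs, size=fanout, replace=False)
        src.extend([s] * len(nbrs))
        dst.extend(nbrs)
    return np.array(src), np.array(dst)

# Tiny 4-node graph in CSR form: 0->{1,2,3}, 1->{2}, 2->{3}, 3->{}
indptr = np.array([0, 3, 4, 5, 5])
indices = np.array([1, 2, 3, 2, 3])
rng = np.random.default_rng(0)
src, dst = uniform_neighbor_sample(indptr, indices, seeds=[0, 1],
                                   fanout=2, rng=rng)
assert len(src) == 3  # 2 edges sampled from node 0, 1 from node 1
```

During GNN training this loop runs for every mini-batch and every layer, which is why sampling can dominate training time; cuGraph replaces it with GPU kernels that the abstract reports run 10x to 100x faster than CPU versions.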
ACM Reference Format:
Brad Rees, Xiaoyun Wang, Joe Eaton, Onur Yilmaz, Rick Ratzel, and Dominique LaSalle. 2022. Accelerated GNN training with DGL and RAPIDS cuGraph in a Fraud Detection Workflow. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14–18, 2022, Washington, DC, USA. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3534678.3542603
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
KDD '22, August 14–18, 2022, Washington, DC, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9385-0/22/08.
https://doi.org/10.1145/3534678.3542603
Figure 1: Evolution of Detection Workflow
PRESENTER BIOS
Brad Rees - RAPIDS cuGraph Lead, NVIDIA
Brad Rees is a Senior Manager at NVIDIA and lead of the RAPIDS cuGraph team. Brad has been designing, implementing, and supporting a variety of advanced software and hardware systems within the defense and research communities for over 30 years. Brad specializes in complex analytic systems, primarily using graph analytic techniques for social and cyber network analysis. His technical interests are in HPC, machine learning, deep learning, and graphs. Brad has a Ph.D. in Computer Science from the Florida Institute of Technology.
Joe Eaton - Principal System Engineer for Graph
and Data Analytics, NVIDIA
Joe Eaton is the Principal System Engineer for Graph and Data Analytics at NVIDIA. He works on RAPIDS, dividing his time between cuML and cuGraph. His interests are general optimization and applications of sparse linear algebra to industrial-scale problems. Previously, he was the manager for the sparse linear algebra CUDA libraries cuSPARSE, cuSOLVER, and nvGRAPH, and managed AmgX, now an open-source package of GPU-accelerated sparse iterative solvers. Joe lives in Austin, Texas, and holds a Ph.D. in computational and