GraphStorm an Easy-to-use and Scalable Graph Neural Network Framework: From Beginners to Heroes

Jian Zhang, AWS AI, Santa Clara, USA, jamezhan@amazon.com
Da Zheng, AWS AI, Santa Clara, USA, dzzhen@amazon.com
Xiang Song, AWS AI, Santa Clara, USA, xiangsx@amazon.com
Theodore Vasiloudis, AWS AI, Seattle, USA, thvasilo@amazon.com
Israt Nisa, AWS AI, New York, USA, nisisrat@amazon.com
Jim Lu, AWS AI, Seattle, USA, luzj@amazon.com

ABSTRACT
Applying Graph Neural Networks (GNNs) to real-world problems is challenging for machine learning (ML) practitioners because of two major obstacles. The first is the high barrier to learning to program GNNs from scratch. The second is the engineering difficulty of scaling GNN models to large, industry-scale graphs. To address these challenges, GraphStorm, an open-source framework, provides an easy-to-use user interface and an end-to-end GNN training/inference pipeline that handles extremely large graphs in a distributed manner. This tutorial aims to give participants a comprehensive understanding of GraphStorm, including its design principles, target users, and use cases, through presentations. The hands-on sections will walk attendees through four practical GraphStorm use cases that can help them apply GNNs to real-world business problems.

KEYWORDS
Graph Neural Networks, Distributed Training, GraphStorm

ACM Reference Format:
Jian Zhang, Da Zheng, Xiang Song, Theodore Vasiloudis, Israt Nisa, and Jim Lu. 2023. GraphStorm an Easy-to-use and Scalable Graph Neural Network Framework: From Beginners to Heroes. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 2 pages.
https://doi.org/10.1145/3580305.3599179

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
KDD ’23, August 6–10, 2023, Long Beach, CA, USA
© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0103-0/23/08.

1 TARGET AUDIENCE AND PREREQUISITES FOR THE TUTORIAL
Intended audience: This tutorial targets machine learning practitioners who are interested in or already working on graph machine learning tasks and want easy-to-use, scalable tools to accelerate GNN adoption for their own business problems, as well as researchers who are interested in experimenting with their novel GNN models on large graphs.
Prerequisites: Attendees should have some knowledge of deep learning on graphs and have used a deep learning framework, e.g., PyTorch. Knowledge of graph neural networks and DGL is helpful but not required.
Takeaways from the tutorial: We expect attendees to come away with an understanding of GraphStorm’s basics and its application use cases. They will also know how to use GraphStorm in standalone mode to train GNN models on their own extensive graph data.

2 TUTORS
1. Jian Zhang, AWS AI, jamezhan@amazon.com
2. Da Zheng, AWS AI, dzzhen@amazon.com
3. Xiang Song, AWS AI, xiangsx@amazon.com

3 TUTORS’ SHORT BIO
3.1 List of in-person presenters
1. Jian Zhang: Jian is a senior applied scientist at AWS AI who uses ML techniques to help customers solve problems such as fraud detection and image generation. He has successfully developed and deployed GNN solutions for customers worldwide.
2. Da Zheng: Da is a senior applied scientist at AWS AI, leading the effort to build frameworks and algorithms that bring graph machine learning technologies into production. This includes DGL for GNNs, DGL-KE for knowledge graph embeddings, DistDGL for scaling GNN training to billion-scale graphs, TGL for temporal GNNs, and more.
3. Xiang Song: Xiang is a senior applied scientist at AWS AI, leading the effort to build frameworks and services for industrial applications. This includes DGL and DistDGL for scaling GNNs to large-scale graphs, and Neptune ML, a graph ML service designed for the Amazon Neptune graph database.

3.2 List of contributors
1. Theodore Vasiloudis: Theodore is an applied scientist who works in distributed machine learning and data processing.
2. Israt Nisa: Israt is an applied scientist who specializes in developing scalable and high-performing modules for GNNs.