FedGraphNN: A Federated Learning Benchmark System for Graph Neural Networks

Chaoyang He, Keshav Balasubramanian, Emir Ceyani*
Viterbi School of Engineering, University of Southern California

Carl Yang, Han Xie
Department of Computer Science, Emory University

Lichao Sun, Lifang He
Department of Computer Science and Engineering, Lehigh University

Liangwei Yang, Philip S. Yu
Department of Computer Science, University of Illinois at Chicago

Yu Rong, Peilin Zhao, Junzhou Huang
Machine Learning Center, Tencent AI Lab

Murali Annavaram, Salman Avestimehr
Viterbi School of Engineering, University of Southern California

Abstract

Graph Neural Network (GNN) research is rapidly growing thanks to the capacity of GNNs to learn distributed representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to privacy concerns, regulatory restrictions, and commercial competition. Federated learning (FL), a trending distributed learning paradigm, offers a way to address this challenge while preserving data privacy. Despite recent advances in the vision and language domains, there is no suitable platform for FL of GNNs. To this end, we introduce FedGraphNN, an open FL benchmark system that facilitates research on federated GNNs. FedGraphNN is built on a unified formulation of graph FL and contains a wide range of datasets from different domains, popular GNN models, and FL algorithms, with secure and efficient system support. For the datasets in particular, we collect, preprocess, and partition 36 datasets from 7 domains, including both publicly available ones and specifically obtained ones such as hERG and Tencent.
Our empirical analysis showcases the utility of our benchmark system while exposing significant challenges in graph FL: federated GNNs perform worse than centralized GNNs on most datasets with a non-IID split, and the GNN model that attains the best result in the centralized setting may not maintain its advantage in the FL setting. These results imply that more research effort is needed to unravel the mystery behind federated GNNs. Moreover, our system performance analysis demonstrates that the FedGraphNN system is computationally efficient and secure on large-scale graph datasets. We maintain the source code at https://github.com/FedML-AI/FedGraphNN.

* The first three authors contributed equally. Email: {chaoyang.he,keshavba,ceyani}@usc.edu. This is the full version; shorter versions were accepted to the ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML) and the MLSys 2021 GNNSys'21 Workshop on Graph Neural Networks and Systems.

arXiv:2104.07145v2 [cs.LG] 8 Sep 2021
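The abstract's central empirical finding concerns non-IID splits: when each client's local data has a skewed label distribution, federated GNNs lose accuracy relative to centralized training. A common way such label-skewed partitions are generated in FL benchmarks is Dirichlet sampling over class proportions. The sketch below is illustrative only; `dirichlet_partition` and its parameters are hypothetical names, not the FedGraphNN API, and the Dirichlet scheme is an assumption about how a non-IID split might be produced.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with label-skewed (non-IID) proportions.

    Smaller alpha -> stronger skew (each client dominated by few classes);
    large alpha approaches an IID split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        # Shuffle the indices belonging to class c.
        idx = rng.permutation(np.where(labels == c)[0])
        # Draw per-client proportions of this class from Dirichlet(alpha).
        props = rng.dirichlet(np.full(n_clients, alpha))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        # Hand each client its contiguous slice of the shuffled indices.
        for client, part in zip(client_idx, np.split(idx, cuts)):
            client.extend(part.tolist())
    return client_idx

# Example: 100 samples over 4 classes, split across 5 clients with strong skew.
labels = np.repeat(np.arange(4), 25)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
```

With `alpha=0.1`, most clients end up holding samples from only one or two classes, which is exactly the heterogeneity regime where the abstract reports federated GNNs falling behind centralized ones.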