Very Deep Graph Neural Networks Via Noise Regularisation

Jonathan Godwin* (DeepMind), Michael Schaarschmidt (DeepMind), Alexander Gaunt (DeepMind), Alvaro Sanchez-Gonzalez (DeepMind), Yulia Rubanova (DeepMind), Petar Veličković (DeepMind), James Kirkpatrick (DeepMind), Peter Battaglia (DeepMind)

Abstract

Graph Neural Networks (GNNs) perform learned message passing over an input graph, but conventional wisdom says that performing more than a handful of steps makes training difficult and does not yield improved performance. Here we show the contrary. We train a deep GNN with up to 100 message passing steps and achieve several state-of-the-art results on two challenging molecular property prediction benchmarks, Open Catalyst 2020 IS2RE and QM9. Our approach depends crucially on a novel but simple regularisation method, which we call "Noisy Nodes", in which we corrupt the input graph with noise and add an auxiliary node autoencoder loss if the task is graph property prediction. Our results show that this regularisation method allows the model to improve monotonically in performance as the number of message passing steps increases. Our work opens new opportunities for reaping the benefits of deep neural networks in the space of graph and other structured prediction problems.

1 Introduction

Advances in the ability to successfully train very deep neural networks have been key to improving performance in image recognition, language modeling, and many other domains [35, 40, 50, 29, 25, 13]. Graph Neural Networks (GNNs) are a family of deep networks that operate on graph-structured data by iteratively passing learned messages over the graph's structure [47, 12, 23, 7]. While GNNs are very effective in a wide variety of tasks [61, 56, 5], deep GNNs, which perform more than 5-10 message passing steps, have not typically yielded better performance than shallower GNNs [38, 60, 62].
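The abstract's core idea can be made concrete with a minimal sketch. The code below is a hedged illustration, not the paper's implementation: it assumes node features are 3-D atomic positions, uses a hypothetical noise scale `sigma`, and takes a plain per-node L2 autoencoder loss as the auxiliary term added to the main graph-level objective.

```python
import numpy as np

def noisy_nodes_inputs(positions, sigma=0.02, rng=None):
    """Corrupt input node positions with i.i.d. Gaussian noise and return
    the corrupted inputs together with the clean per-node targets that the
    auxiliary autoencoder loss trains against.

    `sigma` is a hypothetical noise scale; in practice it would be tuned
    per dataset. `positions` is an (num_nodes, 3) array.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(scale=sigma, size=positions.shape)
    return positions + noise, positions

def auxiliary_node_loss(predicted_positions, target_positions):
    """Per-node squared-error denoising loss, averaged over nodes.

    This would be added (with some weighting) to the primary graph
    property prediction loss during training.
    """
    sq_err = np.sum((predicted_positions - target_positions) ** 2, axis=-1)
    return float(np.mean(sq_err))
```

The GNN would consume the noisy positions, and a per-node output head would be trained to reconstruct the clean ones, giving every node a local training signal at every depth.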
While in principle deep GNNs should have greater expressivity and ability to capture complex functions, it has been proposed that in practice "oversmoothing" [17] and "bottleneck effects" [1] limit the potential benefits of deep GNNs. The purpose of this work is to reap the benefits of deep GNNs while avoiding such limitations. Oversmoothing is a proposed phenomenon in which a GNN's latent node representations become increasingly similar over successive steps of message passing [17]. Once these representations are oversmoothed, adding further steps does not add expressive capacity, and so performance does not improve. Bottleneck effects are thought to limit the ability of a deep GNN to communicate information over long ranges: as the number of steps increases and causes each node's receptive field to grow, information from many distant nodes must be compressed into a fixed-size vector.

* Correspondence to jonathangodwin@deepmind.com.
Preprint. Under review. arXiv:2106.07971v1 [cs.LG] 15 Jun 2021
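Oversmoothing can be observed even in a toy setting. The sketch below (an illustration under simplifying assumptions, not the paper's model) iterates a learned-weight-free message passing step, mean aggregation over neighbours with self-loops, on a small connected graph and tracks how far the node states spread from their centroid; the spread collapses toward zero as steps accumulate.

```python
import numpy as np

def mean_aggregate(h, adj):
    """One simplified message passing step: each node's new state is the
    mean of its neighbours' states (self-loops included in `adj`)."""
    deg = adj.sum(axis=1, keepdims=True)
    return adj @ h / deg

def node_spread(h):
    """Mean distance of node states from their centroid; near zero means
    the representations have become nearly identical (oversmoothed)."""
    return float(np.mean(np.linalg.norm(h - h.mean(axis=0), axis=1)))

rng = np.random.default_rng(0)
# A 4-node path graph with self-loops (hypothetical example graph).
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)
h = rng.normal(size=(4, 8))  # random initial node states

spread = [node_spread(h)]
for _ in range(20):
    h = mean_aggregate(h, adj)
    spread.append(node_spread(h))
# spread[-1] is a small fraction of spread[0]: repeated averaging has
# driven all node states toward a common vector.
```

Real GNN layers interleave learned transformations with aggregation, but this averaging component is the mechanism the oversmoothing argument points at, and it motivates a regulariser, like Noisy Nodes, that keeps per-node representations informative at depth.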