Learning Invariances in Neural Networks

Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Courant Institute of Mathematical Sciences, New York University

Abstract

Invariances to translations have imbued convolutional neural networks with powerful generalization properties. However, we often do not know a priori what invariances are present in the data, or to what extent a model should be invariant to a given symmetry group. We show how to learn invariances and equivariances by parameterizing a distribution over augmentations and optimizing the training loss simultaneously with respect to the network parameters and augmentation parameters. With this simple procedure we can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction, from a large space of augmentations, using training data alone.

1 Introduction

The ability to learn constraints or symmetries is a foundational property of intelligent systems. Humans are able to discover patterns and regularities in data that provide compressed representations of reality, such as translation, rotation, intensity, or scale symmetries. Indeed, we see the value of such constraints in deep learning. Fully connected networks are more flexible than convolutional networks, but convolutional networks are more broadly impactful because they enforce translation equivariance: when we translate an image, the outputs of a convolutional layer translate in the same way [24, 7]. Further gains have been achieved by recent work hard-coding additional symmetries, such as rotation equivariance, into convolutional neural networks [e.g., 7, 41, 44, 31].

But we might wonder whether it is possible to learn that we want to use a convolutional neural network. Moreover, we typically do not know which constraints are suitable for a given problem, or to what extent those constraints should be enforced. The class label for the digit '6' is rotationally invariant only up until it becomes a '9'. Like biological systems, we would like to automatically discover the appropriate symmetries. This task appears daunting, because standard learning objectives such as maximum likelihood select for flexibility rather than constraints [29, 32].

In this paper, we provide an extremely simple and practical approach to automatically discovering invariances and equivariances from training data alone. Our approach operates by learning a distribution over augmentations and then training with augmented data, leading to the name Augerino. Augerino (1) can learn both invariances and equivariances over a wide range of symmetry groups, including translations, rotations, scalings, and shears; (2) can discover partial symmetries, such as rotations over a range narrower than the full [−π, π]; (3) can be combined with any standard architecture, loss function, or optimization algorithm with little overhead; and (4) performs well on regression, classification, and segmentation tasks, for both image and molecular data. To our knowledge, Augerino is the first approach that can learn symmetries in neural networks from training data alone, without requiring a validation set or a special loss function.

In Sections 3-5 we introduce Augerino and show why it works. The accompanying code can be found at https://github.com/g-benton/learning-invariances.
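To make the joint optimization concrete, below is a minimal PyTorch sketch of the core idea: an augmentation module whose distribution parameter is learned by gradient descent alongside the network weights. This is an illustrative simplification, not the released implementation; the module name LearnedRotationAugment, the parameter init_width, and the restriction to rotations alone are our assumptions, and the full method additionally averages predictions over several sampled augmentations and regularizes toward broader distributions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedRotationAugment(nn.Module):
    """Applies a rotation sampled from U(-width, width), where `width` is a
    learnable parameter controlling the extent of rotational invariance
    (a minimal sketch of the joint-optimization idea; names are illustrative)."""

    def __init__(self, init_width=0.1):
        super().__init__()
        # Scalar parameter giving the half-width of the rotation range.
        self.width = nn.Parameter(torch.tensor(init_width))

    def forward(self, x):
        # Reparameterized uniform sample: angle = width * eps, eps ~ U(-1, 1),
        # so the loss stays differentiable with respect to `width`.
        eps = torch.rand(x.shape[0], device=x.device) * 2 - 1
        angles = eps * self.width
        cos, sin = torch.cos(angles), torch.sin(angles)
        zeros = torch.zeros_like(angles)
        # Batch of 2x3 affine matrices encoding a pure rotation per image.
        mats = torch.stack([
            torch.stack([cos, -sin, zeros], dim=1),
            torch.stack([sin, cos, zeros], dim=1),
        ], dim=1)
        grid = F.affine_grid(mats, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

In use, inputs are passed through this module before the network, and a single optimizer updates `width` together with the network weights (e.g., loss = criterion(net(aug(x)), y)). Because the sample is reparameterized as width * eps, the gradient of the training loss with respect to the augmentation parameter is well defined.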