Does Continual Learning = Catastrophic Forgetting?

Anh Thai, Stefan Stojanov, Isaac Rehg, James M. Rehg
Georgia Institute of Technology
{athai6,sstojanov,isaacrehg,rehg}@gatech.edu

Abstract

Continual learning is known for suffering from catastrophic forgetting, a phenomenon in which concepts learned earlier are forgotten as more recent samples are learned. In this work, we challenge the assumption that continual learning is inevitably associated with catastrophic forgetting by presenting a set of tasks that, surprisingly, do not suffer from catastrophic forgetting when learned continually. We provide insight into the properties of these tasks that make them robust to catastrophic forgetting, and into the potential of using a proxy representation learning task for continual classification. We further introduce a novel yet simple algorithm, YASS, that outperforms state-of-the-art methods on the class-incremental categorization task. Finally, we present DyRT, a novel tool for tracking the dynamics of representation learning in continual models. The codebase, dataset and pretrained models released with this article can be found at https://github.com/ngailapdi/CLRec.

1. Introduction

In continual learning, a stream of incrementally-arriving inputs is processed without access to past data. Specifically, given a new learning exposure,¹ the learner must update its representation, after which it is evaluated on all of the tasks it has seen so far. A key challenge is to avoid catastrophic forgetting [40], which arises if representations learned early in training are degraded by more recent exposures. Substantial effort has been made to address catastrophic forgetting [14, 60, 27, 19], and it has come to exemplify the continual learning problem. However, past works have explored a surprisingly limited set of tasks, with an almost exclusive focus on classification.
In particular, no prior works have addressed reconstruction tasks, such as 3D shape or depth+normals prediction from single images.

In this paper, we demonstrate that continual learning is not synonymous with catastrophic forgetting, and that the extent to which forgetting occurs depends upon both the nature of the task (classification vs. reconstruction) and the data distribution (e.g. whether the learner is exposed to each concept once or with repetition). Our focus is on the class-incremental single-head setting, where the data from a single dataset is distributed across a sequence of exposures [29]. An example is learning image classification from a sequence of exposures to single object categories. We present four main findings (see Fig. 1 for an overview):

• Continual reconstruction tasks do not suffer from catastrophic forgetting, and repeated exposures [47] lead asymptotically to batch performance (Sec. 3)
• Using reconstruction as a proxy task for classification results in a competitive continual learning method (Sec. 4)
• A novel baseline method for continual classification, called YASS, outperforms all previous approaches to class-incremental learning (Sec. 5)
• Forgetting disproportionately affects the FC layer during continual classifier learning, and a novel Dynamic Representation Tracking tool (DyRT) enables visualization of this forgetting (Sec. 6)

[Figure 1: An overview of the four main findings presented in this work (no forgetting in continual reconstruction tasks; reconstruction as a proxy task for CL classification; the YASS classification algorithm; visualization of forgetting).]

¹We use the term learning exposure to refer to each new increment of data, i.e. the learner's next "exposure" to the concepts being learned.

arXiv:2101.07295v1 [cs.LG] 18 Jan 2021
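The class-incremental single-head protocol described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the `Learner` here is a trivial nearest-class-mean classifier over 1-D features, and all names (`Learner`, `run_protocol`, the exposure format) are hypothetical. What it shows is the evaluation loop itself: after each single-class learning exposure, the model is updated using only that exposure's data and then evaluated on all classes seen so far with one shared classification head.

```python
class Learner:
    """Toy nearest-class-mean learner over 1-D features (illustrative only)."""

    def __init__(self):
        self.means = {}  # class label -> mean feature of that class

    def update(self, label, samples):
        # Update using only the current exposure's data -- no access to
        # past samples, per the continual learning setting.
        self.means[label] = sum(samples) / len(samples)

    def predict(self, x):
        # Single head: one classifier shared across every class seen so far.
        return min(self.means, key=lambda c: abs(self.means[c] - x))


def run_protocol(learner, exposures, test_set):
    """Class-incremental loop: after each exposure, evaluate on all
    classes seen so far and record the accuracy."""
    accuracies = []
    for label, samples in exposures:
        learner.update(label, samples)
        seen = set(learner.means)
        evals = [(x, y) for x, y in test_set if y in seen]
        correct = sum(learner.predict(x) == y for x, y in evals)
        accuracies.append(correct / len(evals))
    return accuracies


# Example: two single-class exposures, then a held-out test set.
exposures = [("cat", [0.0, 0.2]), ("dog", [1.0, 0.8])]
test_set = [(0.1, "cat"), (0.9, "dog")]
accs = run_protocol(Learner(), exposures, test_set)  # one accuracy per exposure
```

In a real instantiation the `Learner.update` step would be gradient-based training of a deep network, which is exactly where catastrophic forgetting can degrade earlier classes; the outer loop and evaluation-on-all-seen-classes structure stay the same.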