Noname manuscript No. (will be inserted by the editor)

Continually Trained Life-Long Classification

Rudolf Szadkowski · Jan Drchal · Jan Faigl

Received: date / Accepted: date

Abstract Two challenges face a life-long classifier that learns continually: concept drift, when the probability distribution of the data changes in time, and catastrophic forgetting, when earlier learned knowledge is lost. There are many proposed solutions to each challenge, but very little research addresses both challenges simultaneously. We show that concept drift and catastrophic forgetting are closely related in our proposed description of life-long continual classification. We describe the process of continual learning as a wrap modification, where a wrap is a manifold that can be trained to cover or uncover a given set of samples. The notion of wraps and their cover/uncover modifiers are the theoretical building blocks of a novel general life-long learning scheme, implemented as an ensemble of variational autoencoders. The proposed algorithm is examined on evaluation scenarios for continual learning and compared to state-of-the-art algorithms, demonstrating robustness to catastrophic forgetting and adaptability to concept drift, but also revealing new challenges of life-long classification.

Keywords continual learning · life-long learning · autoencoder · catastrophic forgetting · concept drift

1 Introduction

Continual learning is essential for domains where the incoming data must be continually integrated into a classifier.

This work was supported by the Czech Science Foundation (GAČR) under research project No. 18-18858S.
Department of Computer Science, Faculty of Electrical Engineering
Czech Technical University in Prague
Technická 2, 166 27, Prague 6, Czech Republic
E-mail: {szadkrud,drchajan,faiglj}@fel.cvut.cz
WWW home page: https://comrob.fel.cvut.cz/

Such a classifier is expected to train on and predict the incoming data as long as it is in operation; hence, it is a life-long classifier. On a life-long time scale, we expect that the probability distribution of the incoming data can change over time; such a change is called concept drift. Moreover, there is also the problem of catastrophic forgetting: as the classifier is continually trained, it can forget some knowledge it learned earlier. Both concept drift and catastrophic forgetting are the main challenges of continual learning [7].

In neural networks, concept representations are distributed throughout the network weights [8]. In such distributed representations, a slight change of a single parameter during a training iteration can alter multiple concepts at once. During continual learning, these changes can accumulate, and concepts might get catastrophically forgotten [5]. There are four approaches to preventing catastrophic forgetting [19]:

- weight regularization to prevent overfitting to the current task [13];
- storing selected samples that are used for rehearsal (incorporating the old samples into the training dataset) [30, 27];
- learning a generative model that produces samples used for rehearsal [29, 26];
- designing a network architecture that (partially) isolates concepts in subnetworks [21, 9].

A straightforward implementation of the architecture-design approach is an ensemble of predictors [15], which we combine with the Variational AutoEncoder (VAE) [12] used as a generative model.

Our proposed method is based on an ensemble of classifying and generating autoencoders. An autoencoder is a neural network trained to approximate an identity transformation of the input.
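The rehearsal idea listed above can be illustrated with a small sketch. The buffer policy (reservoir sampling) and all sizes here are illustrative assumptions, not the paper's algorithm: earlier samples are retained in a bounded memory and mixed into every new training batch, so old concepts keep contributing to the loss.

```python
# Hypothetical sketch of rehearsal: a bounded memory of past samples is
# mixed into each fresh training batch. Reservoir sampling keeps every
# sample seen so far in the buffer with equal probability.
import random

random.seed(0)

class RehearsalBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []
        self.seen = 0

    def add(self, sample):
        # Reservoir sampling: after `seen` samples, each one has been
        # kept with probability capacity / seen.
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def make_batch(self, new_samples, n_old):
        # Training batch = fresh samples plus a few rehearsed old ones.
        old = random.sample(self.samples, min(n_old, len(self.samples)))
        return list(new_samples) + old

buf = RehearsalBuffer(capacity=20)
# Two consecutive "tasks" arriving as one stream.
stream = [("task-A", i) for i in range(100)] + [("task-B", i) for i in range(100)]
for s in stream:
    buf.add(s)

# A later batch still rehearses samples surviving from earlier in the stream.
batch = buf.make_batch([("task-B", 999)], n_old=5)
```

Generative-model rehearsal replaces the stored samples with samples drawn from a trained generator, trading memory for the cost of learning the data distribution.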
The autoencoder training is unsupervised because the loss, also called the reconstruction error,
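The reconstruction-error objective can be sketched with a minimal example. This is not the paper's VAE, only a linear autoencoder with an illustrative toy dataset, trained by plain gradient descent on the squared reconstruction error; it shows that the loss is computed from the input alone, with no labels involved.

```python
# Minimal sketch (assumed setup, not the paper's implementation): a linear
# autoencoder minimizing the reconstruction error ||x - decode(encode(x))||^2.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in R^8 lying near a 2-D subspace.
basis = rng.normal(size=(2, 8))
X = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 8))

d, k = 8, 2                                  # input dim, bottleneck dim
W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder weights

def reconstruction_error(X):
    Z = X @ W_enc            # encode: project to latent space
    X_hat = Z @ W_dec        # decode: map back to input space
    return np.mean((X - X_hat) ** 2)

lr = 0.05
initial = reconstruction_error(X)
for _ in range(500):
    Z = X @ W_enc
    X_hat = Z @ W_dec
    E = X_hat - X            # residual; gradients below are proportional
    grad_dec = Z.T @ E / len(X)          # to the true MSE gradients
    grad_enc = X.T @ (E @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final = reconstruction_error(X)
print(initial, final)        # the error drops as the identity map is learned
```

Because the target of the network is its own input, the same objective also yields a novelty signal: samples far from the training distribution reconstruct poorly.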