TSO: Curriculum Generation using continuous optimization

Dipankar Sarkar 1  Mukur Gupta 2

1 Hike Ltd 2 Indian Institute of Technology Kharagpur. Correspondence to: Dipankar Sarkar <dipankars@hike.in>, Mukur Gupta <mukur.gupta1@gmail.com>. Copyright 2021 by the author(s).

Abstract

The training of deep learning models poses vast challenges, including parameter tuning and the ordering of training data. Significant research has been done in curriculum learning for optimizing the sequence of training data. Recent works have focused on using complex reinforcement learning techniques to find the data-ordering strategy that maximizes learning for a given network. In this paper, we present a simple and efficient technique based on continuous optimization, which we call Training Sequence Optimization (TSO). There are three critical components in our proposed approach: (a) an encoder network maps/embeds a training sequence into a continuous space; (b) a predictor network takes the continuous representation of a strategy as input and predicts the accuracy for a fixed network architecture; (c) a decoder maps a continuous representation of a strategy back to the ordered training dataset. The performance predictor and encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of the optimal training-data ordering with potentially better accuracy. Experiments show that our generated optimal curriculum strategy gains 2 AP over the random strategy on the CIFAR-100 dataset, with better boosts than state-of-the-art CL algorithms. We perform an ablation study varying the architecture, dataset and sample sizes, showcasing our approach's robustness.

1. Introduction

We observe that humans generally learn through a progression of concepts that build on top of each other; this can be viewed in terms of the level of complexity at each step. We can see this at work in everything from a baby learning to walk to taking courses online. In both instances, we start with smaller ideas and work towards more complex concepts. In traditional machine learning, all the training examples are presented randomly, ignoring the various complexities of the dataset and the current state of the model. It is therefore pertinent to ask if we can use the same learning strategy as humans to improve model training. According to early works (Bengio et al., 2009; Kumar et al., 2010; Zaremba & Sutskever, 2014) and recent efforts (Fan et al., 2018; Graves et al., 2017; Guo et al., 2020) in various applications of machine learning, it seems this can be the case. This research area is called Curriculum Learning (CL), and it can be decomposed into two aspects: the curriculum, which is the set of tasks a model is trained on, and the program, which looks at the learning state and chooses the model's training tasks. A more straightforward definition of CL, first proposed in (Bengio et al., 2009), is training from easier examples to more difficult ones. This has been used as a general training strategy for a wide scope of applications, including supervised learning tasks in computer vision (Guo et al., 2018), natural language processing (Jiang et al., 2014; Platanios et al., 2019) and more. The advantages of CL can be seen as improving model performance while accelerating the training process, which is very beneficial for machine learning research. It can also be seen as a step that is independent of the original training algorithm. The many proposed algorithms can be classified into pre-defined CL, where a human defines the curriculum a priori, and automatic CL, which derives it automatically from the dataset and model.
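The three TSO components described in the abstract can be illustrated with a minimal numpy sketch. This is a toy stand-in, not the paper's method: the encoder and decoder are simple linear maps over a shared random projection, the predictor is a quadratic surrogate with an assumed optimum, and finite differences stand in for backpropagation through a trained predictor network.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 8, 4  # 8 training samples, 4-dim continuous embedding (illustrative sizes)
W_enc = rng.normal(size=(N, D))  # shared random projection (stand-in for learned weights)

def encode(order):
    """Encoder: map a discrete training order to a continuous embedding."""
    ranks = np.argsort(order) / (N - 1)  # normalized position of each sample
    return ranks @ W_enc

def predict_accuracy(z):
    """Predictor: a differentiable surrogate for final model accuracy."""
    target = np.full(D, 0.5)  # assumed optimal embedding, for illustration only
    return 1.0 - np.mean((z - target) ** 2)

def decode(z):
    """Decoder: map an embedding back to a discrete training order."""
    scores = z @ W_enc.T  # per-sample rank scores
    return np.argsort(scores)

# Gradient-based search in the continuous space: ascend the predicted
# accuracy starting from the embedding of a random ordering.
z = encode(rng.permutation(N))
for _ in range(200):
    grad = np.zeros(D)
    for i in range(D):
        e = np.zeros(D)
        e[i] = 1e-4
        grad[i] = (predict_accuracy(z + e) - predict_accuracy(z - e)) / 2e-4
    z += 0.5 * grad  # the surrogate is quadratic, so this converges to its maximum of 1.0

best_order = decode(z)  # a permutation of the N sample indices
```

In the paper's setting the predictor is a network trained on (ordering, accuracy) pairs, so the gradient comes from backprop rather than finite differences, and the decoder is likewise learned rather than a fixed projection.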
We have seen pre-defined CL, i.e. hand-designed curricula, used to learn to perform mathematical operations with LSTMs (Hochreiter & Schmidhuber, 1997) (Tay et al., 2019). Further, (Wu & Tian, 2016) used CL to train an RL agent to play Doom, using a small curriculum to improve sample efficiency with imitation learning. All these examples are hand-designed programs, which are fundamentally painful to design and ineffective, leading to catastrophic forgetting in learners (Parisi et al., 2019). We can say that pre-defined CL algorithms have a human expert as the teacher and the machine learning model as the student. We see the development of automatic CL, where we reduce the dependence on the human ex-

arXiv:2106.08569v1 [cs.LG] 16 Jun 2021
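The hand-designed, pre-defined curricula discussed above reduce to scoring samples by difficulty and presenting them easiest-first, often in growing stages. A minimal sketch, with an assumed random difficulty score standing in for a real one (e.g. sentence length, or the loss of a pretrained teacher model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: each sample gets a scalar difficulty score.
# The scores here are random, purely for illustration.
samples = [f"sample_{i}" for i in range(10)]
difficulty = rng.uniform(size=10)

# Pre-defined CL: present easier examples before harder ones.
curriculum = [samples[i] for i in np.argsort(difficulty)]

# Training would then proceed over a growing pool of examples,
# e.g. the easiest 30%, then 60%, then the full dataset.
stages = [curriculum[: int(len(curriculum) * f)] for f in (0.3, 0.6, 1.0)]
print([len(s) for s in stages])  # → [3, 6, 10]
```

The difficulty score and the stage schedule are exactly the parts a human expert must hand-tune, which is the brittleness that automatic CL methods, including TSO, aim to remove.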