(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 7, 2017

Ladder Networks: Learning under Massive Label Deficit

Behroz Mirza and Tahir Syed, National University of Computer and Emerging Sciences, Karachi, Pakistan
Jamshed Memon, Barrett Hodgson University, Karachi, Pakistan
Yameen Malik, Smartlytics, Karachi, Pakistan

Abstract—Advances in deep unsupervised learning are finally bringing machine learning close to natural learning, which can happen with as few as one labeled instance. Ladder networks are a recent deep learning architecture that proposes semi-supervised learning at scale. This work discusses how the ladder network model successfully combines supervised and unsupervised learning, taking it beyond the pre-training realm. The model learns from the structure of the data rather than from the labels alone, transforming it from a label learner into a structural observer. We extend previously reported results by lowering the number of labels, and report an error of 1.27 using only 40 labels on the MNIST dataset, which in a fully supervised setting uses 60,000 labeled training instances.

Keywords—Ladder networks; semi-supervised learning; deep learning; structure observer

I. INTRODUCTION

Over the past decades, there has been an effort in theoretical machine learning research to move from supervised to unsupervised methods, for two reasons: (1) the arduous effort of labeling data, and (2) the inherent aptitude of unsupervised approaches to discover the latent structure of data without the guiding (or misguiding) external influence of labels. This paper discusses the opportunities and strengths of deep unsupervised learning and its implications for unsupervised and weakly supervised learning in general. The model selected for this purpose is the recently introduced ladder network designed by Valpola [1]. This work modifies the model configuration and reports an error of 1.27 on the extremely popular MNIST benchmark using only 40 labels.
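The 40-label setting reported above is conventionally built by drawing a class-balanced labeled subset from the 60,000 MNIST training images. A minimal sketch of that selection step follows; the function and variable names are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def sample_balanced_labels(labels, n_labeled, n_classes=10, seed=0):
    """Pick n_labeled training indices, spread evenly over the classes,
    as is customary in semi-supervised MNIST benchmarks."""
    rng = np.random.default_rng(seed)
    per_class = n_labeled // n_classes
    chosen = []
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)   # all examples of class c
        chosen.extend(rng.choice(idx, size=per_class, replace=False))
    return np.array(chosen)

# Toy stand-in for the MNIST label vector: 100 examples, 10 per class.
labels = np.repeat(np.arange(10), 10)
labeled_idx = sample_balanced_labels(labels, n_labeled=40)
print(len(labeled_idx))  # 40 indices, 4 per class
```

The remaining (unselected) training images are then treated as unlabeled input for the unsupervised part of the objective.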
This is 10 labels fewer than previously reported results. Ladder networks successfully combine supervised learning with unsupervised learning in deep neural network models. Prior to this, unsupervised learning was used for a specialized pre-training task, followed by supervised learning. Ladder networks, however, are trained to simultaneously minimize the sum of supervised and unsupervised cost functions using backpropagation, thus eliminating the need for layer-wise pre-training. The model has the distinctive feature of learning from the structure in the data instead of solely from the labels. This novelty minimizes the amount of labeled data required for training the network. As most of the data are unlabeled, the model learns principal features from the small set of labeled data and correlated features from the large set of unlabeled data concurrently [2]. This brings the machine learning process closer to natural learning.

The rest of the paper is organized as follows. Section II, 'Deep Unsupervised Learning', discusses the RBM and autoencoder models. Section III, 'Semi-Supervised Learning', discusses the ladder network model, followed by the experiments and results. Section IV, 'Conclusion', concludes the paper and discusses future research directions.

II. DEEP UNSUPERVISED LEARNING

A. Relaxing Supervision

Unsupervised learning forms a class of machine learning techniques for deducing a function that disentangles hidden structure from unlabeled data. What clearly distinguishes unsupervised learning from supervised learning is that only unlabeled samples are used during training, so there is no error or reward signal with which to evaluate a potential solution. As unsupervised learning attempts to draw inferences from datasets consisting of input data without labeled responses, it is closely related to the problem of density estimation in statistics [3].
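The joint objective described earlier for ladder networks, the sum of a supervised cost on the few labeled examples and an unsupervised denoising cost on all examples, can be sketched as a single scalar that backpropagation minimizes. This is an illustrative numpy sketch; the function name and the per-layer weights `lambdas` are assumptions, not the paper's code.

```python
import numpy as np

def combined_cost(logits, targets, clean_acts, denoised_acts, lambdas):
    """Sum of a supervised cross-entropy term and weighted layer-wise
    denoising (reconstruction) terms, minimized jointly by backprop."""
    # Supervised term: cross-entropy on the (few) labeled examples.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    supervised = -np.mean(
        np.log(probs[np.arange(len(targets)), targets] + 1e-12))
    # Unsupervised term: squared error between clean activations and the
    # decoder's denoised reconstructions, with one weight per layer.
    unsupervised = sum(lam * np.mean((c - d) ** 2)
                       for lam, c, d in zip(lambdas, clean_acts, denoised_acts))
    return supervised + unsupervised
```

Because both terms collapse into one scalar cost, the encoder and decoder are updated together by plain backpropagation, which is precisely what removes the need for a separate layer-wise pre-training stage.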
Hinton and Salakhutdinov [4] proposed the idea of the stochastic RBM: a symmetrical arrangement of binary stochastic neurons in a Boltzmann machine, where the two layers of the model form a bipartite graph. Later work by [5] suggested autoencoders for pre-training: pre-train each successive layer with an unsupervised criterion, thus producing an enriched, useful higher-level representation from the lower-level representation output. State-of-the-art generalization can later be achieved by running gradient descent in a supervised setting. Transitioning to unsupervised learning looks promising given that natural learning is unsupervised; we learn the structure around us by observing, not by the names of the associated objects.

B. Greedy Unsupervised Pre-training

The year 2006 marks a breakthrough in training deep architectures, as RBMs were proposed, followed by stacked autoencoders (SAEs) (Fig. 1). Both approaches used the notion of greedy layer-wise unsupervised pre-training followed by supervised fine-tuning. The concepts of greedy layer-wise pre-training and supervised fine-tuning have had a profound impact on unsupervised learning. Unsupervised pre-training leads to:

• Pre-conditioning the model, whereby the parameter values are arranged in suitable ranges later to be used in