Learning with Incomplete Labels for Multi-label Image Annotation Using CNN and Restricted Boltzmann Machines

Jonathan Mojoo, Yu Zhao, Muthusubash Kavitha, Junichi Miyao, and Takio Kurita

Department of Information Engineering, Hiroshima University, Higashi-hiroshima, Hiroshima 739-8521, Japan
jonathanmojoo@yahoo.com

Abstract. Multi-label image annotation based on convolutional neural networks (CNN) has seen significant improvements in recent years. One problem, however, is that it is difficult to prepare complete labels for the training images, and training data usually have missing or incomplete labels. Restricted Boltzmann Machines (RBM) can explore the co-occurrence distribution of the labels and estimate the missing labels efficiently. We therefore propose a novel CNN-based learning model for multi-label image annotation with missing labels, which aims to regenerate the missing labels for an image by learning a generative model of the labels using an RBM. First, label sets are reconstructed by an RBM that has been pre-trained on data with some missing labels. Then the reconstructed label sets are used as a teacher signal to train the CNN. The effectiveness of the proposed approach is confirmed by comparing its performance with that of baseline CNNs using various performance evaluation metrics on two different data sets. Experimental results show that our RBM-CNN formulation outperforms the baseline CNN.

Keywords: Multi-label image annotation · Restricted Boltzmann Machines · Convolutional neural network

1 Introduction

With the recent exponential growth of the web, handling and retrieving large quantities of images and videos requires efficient automatic image annotation or tagging. Image annotation involves assigning to a given image the most relevant labels that describe its visual content. During the past decade, automated annotation with multi-label learning has been widely researched [14, 18].
Most of the conventional frameworks assume that all instances in the training set have complete labels. However, this is not the case with a lot of real-life image

This work was partly supported by JSPS KAKENHI Grant Number 16K00239.
© Springer Nature Switzerland AG 2019
T. Gedeon et al. (Eds.): ICONIP 2019, LNCS 11954, pp. 286–298, 2019. https://doi.org/10.1007/978-3-030-36711-4_25