Pose Normalization Network for Object Classification

Bingquan Shen

Abstract—Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in synthesizing views of 3D object instances at various viewpoints. Given the problem where only limited viewpoints of a particular object are available for classification, we present a pose normalization architecture that transforms the object to a viewpoint existing in the training dataset before classification, yielding better classification performance. We demonstrate that this Pose Normalization Network (PNN) can capture the style of a target object and re-render it at a desired viewpoint. Moreover, we show that, given images at only limited viewpoints, the PNN improves classification results on the 3D chairs dataset and the ShapeNet airplanes dataset compared to a CNN baseline.

Keywords—Convolutional neural networks, object classification, pose normalization, viewpoint invariance.

I. INTRODUCTION

CONVOLUTIONAL Neural Networks (CNNs) have been shown to be effective on a variety of computer vision tasks, such as classification [1], [2], detection [3], [4], semantic segmentation [5], and image captioning [6]. Their success can be attributed to the increase in computing power and the availability of large datasets such as PASCAL VOC [7] and ImageNet [8]. For object recognition, the typical approach is to train the network with multiple images of the object undergoing combinations of variations such as lighting, pose, and background [9]; the network is expected to learn these variations implicitly from the data. However, when only limited viewpoints of a particular object are available for classification, this approach is not feasible. Recent works have shown that CNNs are capable of generating 2D projections of 3D objects [10] given the desired model parameters, such as viewpoint and color.
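The normalize-then-classify pipeline motivated above can be sketched with toy linear stages. This is a hedged illustration only, not the paper's implementation: every dimension, weight matrix, and function name here is a hypothetical stand-in for the deep convolutional encoder, decoder, and classifier the PNN would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the real PNN operates on images, not 8-d feature vectors.
IMG_DIM, STYLE_DIM, N_CLASSES = 8, 16, 3
W_enc = rng.standard_normal((STYLE_DIM, IMG_DIM))      # encoder weights (hypothetical)
W_dec = rng.standard_normal((IMG_DIM, STYLE_DIM + 1))  # decoder weights (hypothetical)
W_cls = rng.standard_normal((N_CLASSES, IMG_DIM))      # classifier weights (hypothetical)

def pose_normalize(x, canonical_view=0.0):
    """Encode the object's style, then re-render it at a canonical viewpoint."""
    style = np.tanh(W_enc @ x)                      # viewpoint-free style code
    z = np.concatenate([style, [canonical_view]])   # append the target viewpoint
    return np.tanh(W_dec @ z)                       # re-rendered representation

def classify(x):
    """Classify the pose-normalized rendering rather than the raw input."""
    scores = W_cls @ pose_normalize(x)
    return int(np.argmax(scores))

label = classify(rng.standard_normal(IMG_DIM))
```

The design point this sketch captures is that the classifier only ever sees inputs re-rendered to a viewpoint it was trained on, so the burden of viewpoint invariance is moved from the classifier to the normalization stage.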
In [11], the authors found that a network's latent variables can be disentangled to represent object style and variations such as out-of-plane rotation. In this paper, we aim to use this prior knowledge of 3D object rotation to aid the classification task. Given the problem of limited viewpoints of a particular object for classification, we propose the Pose Normalization Network (PNN), which transforms the object to an existing viewpoint in the training dataset before classification.

Bingquan Shen is with the DSO National Laboratories, Singapore (E-mail: sbingqua@dso.org.sg).

Fig. 1 Classification using PNN

The paper is organised as follows. First, we review related work on using CNNs to model out-of-plane rotation and on image pre-processing networks. Next, we introduce the PNN architecture, the datasets used, and the methodology for training and classification. Then, we compare the classification results of the PNN to a baseline CNN trained on the same training dataset. Finally, we conclude that the PNN yields better classification results than the baseline CNN.

II. RELATED WORK

As mentioned, a number of works utilize CNNs to model out-of-plane rotations. In [10], a CNN was trained in a supervised setting to generate 2D projections of 3D objects given specific parameters. Their approach requires the desired object class as input, and hence cannot generalise to unseen classes. In contrast, our method uses an encoder to encode the style of the object, so it is able to generalise to novel objects.

A. Network Architecture

Recently, [12], [13] have shown that the variational autoencoder (VAE) can disentangle style from label in MNIST images. Based on the VAE, [11] developed the Inverse Graphics Network (IGN), whose learnt image representation disentangles factors of variation, including out-of-plane rotations, from the style of the object in the image.
By varying the image representation, the IGN is able to re-render an object at a desired viewpoint.

World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol:11, No:3, 2017, scholar.waset.org/1307-6892/10006613
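The disentangled-representation idea behind the IGN discussion above can be sketched as follows. This is a toy linear decoder, not the IGN architecture; the dimensions, the decision to reserve the last latent unit for rotation, and all names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

STYLE_DIM, IMG_DIM = 15, 32
# Hypothetical decoder: the last latent unit is reserved for the rotation angle,
# the remaining units encode the object's style (its identity).
W_dec = rng.standard_normal((IMG_DIM, STYLE_DIM + 1))

def render(style, theta):
    """Re-render the same style at out-of-plane rotation angle theta."""
    z = np.concatenate([style, [theta]])  # disentangled latent: [style | pose]
    return np.tanh(W_dec @ z)

style = rng.standard_normal(STYLE_DIM)   # fixed object identity
frontal = render(style, 0.0)
rotated = render(style, np.pi / 2)       # same object, new viewpoint
```

Because style and pose occupy separate latent units, holding `style` fixed while varying `theta` sweeps the same object through viewpoints, which is exactly the operation the PNN exploits to normalize pose before classification.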