ORIGINAL ARTICLE

Joint motion boundary detection and CNN-based feature visualization for video object segmentation

Zahra Kamranian 1 • Ahmad Reza Naghsh Nilchi 1 • Hamid Sadeghian 2 • Federico Tombari 3 • Nassir Navab 3

Received: 14 March 2018 / Accepted: 19 August 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract
This paper presents a video object segmentation method that jointly uses motion boundary maps and convolutional neural network (CNN)-based class-level maps to co-segment the frames. The key characteristic of the proposed approach is the combination of these two sources of information to create initial object and background regions, which are then employed within the co-segmentation energy function. The motion boundary map detects the areas that contain object movement, and the CNN-based class saliency map identifies the regions that contribute most to the correct network classification. The proposed approach can be applied to unconstrained natural videos that include changes in an object's appearance, a rapidly moving background, non-rigid object deformation, rapid camera motion, and even a static object. Experimental results on two challenging datasets (DAVIS and SegTrackv2) demonstrate the competitive performance of the proposed method compared with state-of-the-art approaches.

Keywords Video object segmentation · Class saliency map · Co-segmentation · Convolutional neural network · Feature visualization · Motion boundary

1 Introduction

Video object segmentation is the task of extracting a foreground object from all the frames of a video. It is a very challenging problem with an important impact on many visual applications, such as action recognition [37], video summarization [22], video retrieval [13], and medical [6] and robotics [36] applications.
Recently, it has gained more attention due to the abundance of unlabeled videos and the outstanding performance of convolutional neural networks (CNNs) in image classification and image segmentation [12, 17, 31, 46, 53, 54].

✉ Ahmad Reza Naghsh Nilchi, nilchi@eng.ui.ac.ir
Zahra Kamranian, zahra.kamranian@eng.ui.ac.ir
Hamid Sadeghian, h.sadeghian@eng.ui.ac.ir
Federico Tombari, tombari@in.tum.de
Nassir Navab, navab@cs.tum.edu

1 Department of Artificial Intelligence, Faculty of Computer Engineering, University of Isfahan, Isfahan 8174673441, Iran
2 Faculty of Engineering, University of Isfahan, Isfahan 8174673441, Iran
3 Computer Aided Medical Procedures and Augmented Reality, Technische Universität München, Munich, Germany

Neural Computing and Applications
https://doi.org/10.1007/s00521-019-04448-7

1.1 Motivation

Video segmentation poses major challenges, which can be summarized as follows.

• The camera may move very rapidly or shake substantially.