IAPR-MEDPRAI

Granulated deep learning and Z-numbers in motion detection and object recognition

Sankar K. Pal 1 · Debasmita Bhoumik 1 · Debarati Bhunia Chakraborty 1

Received: 20 July 2018 / Accepted: 11 April 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract
The article deals with the problems of motion detection, object recognition, and scene description using deep learning in the framework of granular computing and Z-numbers. Deep learning is computationally intensive, whereas granular computing leads to a gain in computation; their merits are therefore integrated judiciously to make the learning mechanism computationally efficient. Further, it is shown how the concept of Z-numbers can be used to quantify the abstraction of semantic information in interpreting a scene, where subjectivity is of major concern, through recognition of its constituent objects. The system thus developed recognizes static objects in the background and moving objects in the foreground separately. Rough set theoretic granular computing is adopted, where rough lower and upper approximations are used in defining the object and background models. During deep learning, instead of scanning the entire image pixel by pixel in the convolution layer, we scan only the representative pixel of each granule. This results in a significant gain in computation time. Arbitrarily shaped and sized granules, as expected, perform better than regular-shaped rectangular granules or fixed-sized granules. The tracking method deals efficiently with various challenging cases, e.g., tracking partially overlapped objects and objects that appear suddenly. Overall, the granulated system shows a balanced trade-off between speed and accuracy as compared to pixel-level learning in tracking and recognition. The concept of using Z-numbers, in providing a granulated linguistic description of a scene, is unique.
This gives a more natural interpretation of object recognition in terms of certainty toward scene understanding.

Keywords Deep learning · Granular computing · Rough sets · Video tracking · Object recognition · Z-numbers

Sankar K. Pal
sankar@isical.ac.in

1 Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700 108, India

Neural Computing and Applications
https://doi.org/10.1007/s00521-019-04200-1

1 Introduction

Moving object detection, recognition and tracking find application in several fields of computer vision such as surveillance, security, gesture recognition and intrusion detection. Video tracking is demanding because of the large volume of data in a video. In tracking, the target objects are associated across consecutive video frames. Detection becomes challenging when the frame rate is high. Moreover, the objects are likely to change their orientation with time, which adds to the complexity of tracking. Furthermore, tracking the moving objects alone is not sufficient; determining the characteristics of the objects is also necessary, which leads to object recognition. Various uncertainties and ambiguities make the task of video tracking challenging, and the issues have therefore been studied over the years [1].

Video tracking can be supervised or unsupervised. In supervised approaches, the initial object(s) to be tracked are labeled manually, whereas in unsupervised approaches no labeling is needed. The method explained here, for detecting and recognizing multiple continuously moving objects against a static background, is supervised. Here we have used image processing and machine learning techniques side by side.

Granulation [2] is a basic step of the human cognition system. Like self-organization, self-production, morphogenesis and Darwinian evolution, it is a process extracted from natural phenomena. It may be viewed as a process of natural clustering, i.e., replacing a fine-grained universe by a coarse-grained one, more in line with human perception. Clusters or segments so formed by granulation (natural clustering) are