© 2019 Usha Mittal, Sonal Srivastava and Dr. Priyanka Chawla. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license.

Journal of Computer Science
Original Research Paper

Object Detection and Classification from Thermal Images Using Region based Convolutional Neural Network

Usha Mittal, Sonal Srivastava and Dr. Priyanka Chawla
Department of Computer Science and Engineering, Lovely Professional University, Punjab, India

Article history
Received: 04-04-2019
Revised: 19-06-2019
Accepted: 16-07-2019

Corresponding Author: Dr. Priyanka Chawla, Department of Computer Science and Engineering, Lovely Professional University, Punjab, India
Email: priyanka.22046@lpu.co.in

Abstract: In recent years, object detection and classification have gained considerable popularity in application areas such as face detection, self-driving cars, pedestrian detection and security surveillance systems. Traditional detection methods such as background subtraction, Gaussian Mixture Model (GMM) and Support Vector Machine (SVM) have certain drawbacks: they are affected by overlapping objects and by distortion due to smoke, fog, lighting conditions etc. In this paper, thermal images are used, because thermal cameras form an image from the heat emitted by objects. Thermal camera images are not affected by smoke or bad weather conditions, which makes thermal cameras a well-established tool in search-and-rescue and fire-fighting applications. Deep learning techniques are now used extensively for detection and classification. In this paper, a comparative analysis is carried out by applying a Faster region based convolutional neural network (Faster R-CNN) to thermal images and to visible spectrum images. The experimental results show that thermal camera images give better results than visible spectrum images.
Keywords: Object Detection, Classification, Faster R-CNN, Thermal Images, Visible Spectrum Images

Introduction

In computer vision, the process of scanning an image or a video to search for an object is known as object detection. People can easily recognize and distinguish the objects present in a picture. The human visual system is fast and accurate and can perform complex tasks, such as identifying multiple objects and detecting obstacles, with little conscious thought (Kaur and Talwar, 2016). With the availability of large amounts of data, faster GPUs and better algorithms, systems can now be trained to detect and classify multiple objects within an image with high accuracy.

Images taken with mobile phones are usually complex and contain several objects. Assigning labels to them with image classification models can therefore be difficult and ambiguous. Object detection models, in contrast, can recognize several relevant objects in a single picture. Another advantage of object detection over image classification is that it also provides the location of each object. Nonetheless, because of large variations in viewpoint, pose, occlusion and lighting conditions, it is hard to perform object detection perfectly when it includes the additional task of object localization.

The main objective of object detection is to determine where objects are located in a given picture (object localization) (Javier, 2017) and then to classify each detected object. The task of an object detection model can therefore be divided into three phases.

Selection of Region

As objects may appear anywhere in the picture and have different resolutions or sizes, a natural choice is to scan the entire picture with a multi-scale sliding window (Harsha and Anne, 2016). Because of the very large number of candidate windows, this approach is computationally expensive and produces too many redundant windows.
However, if only a limited number of sliding window templates is used, unsatisfactory regions may be produced.

Extraction of Features

Feature extraction derives visual features that give an accurate and robust description of the detected objects, so that different objects can be identified. Several feature extraction techniques exist, such as HOG, Haar-like features and SIFT (Harsha and Anne, 2016).
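
The two phases described so far can be illustrated with a minimal Python sketch. It is not the method used in this paper: the function names `sliding_windows` and `orientation_histogram`, the window sizes, the stride and the number of bins are all assumptions chosen for illustration. The descriptor is a simplified HOG-style histogram (one histogram per window, without the cell and block normalization of full HOG).

```python
import numpy as np


def sliding_windows(height, width, window_sizes, stride):
    """Yield (top, left, size) for every square window that fits in the image.

    Hypothetical helper: scans each window size over the image with a fixed stride.
    """
    for size in window_sizes:
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                yield top, left, size


def orientation_histogram(patch, n_bins=9):
    """Simplified HOG-style descriptor for one patch (assumption: no cells/blocks).

    Builds a magnitude-weighted histogram of unsigned gradient orientations
    in the range 0-180 degrees and L2-normalizes it.
    """
    gy, gx = np.gradient(patch.astype(float))          # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0     # fold to unsigned orientation
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


if __name__ == "__main__":
    # Synthetic 64x64 grayscale image standing in for a real frame.
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))

    for stride in (16, 8):
        windows = list(sliding_windows(64, 64, window_sizes=(32, 64), stride=stride))
        print(f"stride={stride}: {len(windows)} candidate windows")

    # Describe each candidate window with one feature vector.
    features = [
        orientation_histogram(image[top:top + size, left:left + size])
        for top, left, size in sliding_windows(64, 64, (32, 64), 16)
    ]
    print(len(features), "descriptors of length", features[0].shape[0])
```

For a 64×64 image with window sizes 32 and 64, a stride of 16 yields 10 candidate windows and a stride of 8 yields 26. The count grows quickly as the stride shrinks or as more window sizes are added, which is the computational cost noted under Selection of Region.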