© 2019 Usha Mittal, Sonal Srivastava and Dr. Priyanka Chawla. This open access article is distributed under a Creative
Commons Attribution (CC-BY) 3.0 license.
Journal of Computer Science
Original Research Paper
Object Detection and Classification from Thermal Images
Using Region based Convolutional Neural Network
Usha Mittal, Sonal Srivastava and Dr. Priyanka Chawla
Department of Computer Science and Engineering, Lovely Professional University, Punjab, India
Article history
Received: 04-04-2019
Revised: 19-06-2019
Accepted: 16-07-2019
Corresponding Author:
Dr. Priyanka Chawla
Department of Computer
Science and Engineering,
Lovely Professional University,
Punjab, India
Email: priyanka.22046@lpu.co.in
Abstract: In recent years, object detection and classification have gained
popularity in application areas such as face detection, self-driving cars,
pedestrian detection and security surveillance systems. Traditional
detection methods such as background subtraction, the Gaussian Mixture
Model (GMM) and the Support Vector Machine (SVM) have drawbacks: they
struggle with overlapping objects and with distortion caused by smoke, fog
and lighting conditions. This paper uses thermal images, because thermal
cameras form an image from the heat emitted by objects. Thermal camera
images are not affected by smoke or bad weather, which makes them an
established tool in search and rescue and fire-fighting applications. Deep
learning techniques are now used extensively for detection and
classification. In this paper, a comparative analysis is carried out by
applying the Faster Region-based Convolutional Neural Network (Faster
R-CNN) to thermal images and visible spectrum images. The experimental
results show that thermal camera images give better results than visible
spectrum images.
Keywords: Object Detection, Classification, Faster R-CNN, Thermal
Images, Visible Spectrum Images
Introduction
In computer vision, object detection is the process of
scanning an image or a video to find objects. People can
easily recognize and distinguish the objects present in a
picture. The human visual system is fast and accurate and
can perform complex tasks, such as distinguishing
different objects and identifying obstacles, with little
conscious thought (Kaur and Talwar, 2016). With the
availability of large amounts of data, faster GPUs and
better algorithms, systems can now be trained to detect
and classify multiple objects within an image with high
accuracy. Images taken with cell phones are usually
complex and contain several objects, so assigning a
single label with an image classification model can be
difficult and ambiguous. Object detection models, in
contrast, can recognize the many significant objects in
a single picture. A further advantage of object detection
over image classification is that it also gives the
location of each object. Nonetheless, because of large
variations in viewpoint, pose, occlusion and lighting
conditions, it is hard to achieve object detection
perfectly together with the additional object
localization task. The main objective of object detection
is to determine where objects are located in a given
picture (object localization) (Javier, 2017) and then to
classify each detected object into a category. The task of
an object detection model can therefore be divided into
three phases.
Selection of Region
Because objects may appear anywhere in the picture and
have different resolutions or sizes, a natural choice is to
scan the entire picture with a multi-scale sliding window
(Harsha and Anne, 2016). Because of the large number of
candidate windows, this approach is computationally
expensive and produces too many redundant windows. But if
only a limited number of sliding window templates is used,
unsatisfactory regions may be produced.
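The cost of exhaustive scanning can be illustrated with a minimal sketch that enumerates multi-scale sliding windows. The function name, image size, window sizes and stride below are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def sliding_windows(img_w, img_h, win_sizes, stride):
    """Yield (x, y, w, h) for every window position that fits inside the image.

    All arguments are illustrative assumptions, not settings from the paper.
    """
    for w, h in win_sizes:
        # Step the top-left corner across the image at the given stride.
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


# Hypothetical 640x480 image, three square window scales, 16-pixel stride.
boxes = list(sliding_windows(640, 480, [(64, 64), (128, 128), (256, 256)], 16))
print(len(boxes), "candidate windows")
```

Even with this coarse stride and only three scales, the scan yields a few thousand heavily overlapping candidates, and the number grows quickly as the stride shrinks or more scales are added. Using fewer window templates reduces the count but risks missing objects whose size falls between the chosen scales.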
Extraction of Features
Feature extraction derives visual features that identify
different objects by providing accurate and robust
descriptions of the detected objects. There are various
feature extraction techniques, such as HOG, Haar-like
features and SIFT (Harsha and Anne, 2016).
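To give a sense of what a hand-crafted descriptor computes, the sketch below builds the core ingredient of HOG for a single cell: a histogram of gradient orientations weighted by gradient magnitude. It is a simplified illustration only. It assumes a grayscale patch stored as a list of rows, uses central differences on interior pixels, and omits the block normalization and bin interpolation of a full HOG descriptor. The function name and the toy patch are hypothetical.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees).

    Simplified sketch: interior pixels only, no block normalization.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference gradients in x and y.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    return hist


# Toy 8x8 patch with a vertical edge: dark left half, bright right half.
patch = [[0] * 4 + [255] * 4 for _ in range(8)]
hist = cell_orientation_histogram(patch)
print(hist)
```

For this patch all of the gradient energy falls into the first bin, because the intensity changes only in the horizontal direction. A patch with a horizontal edge would instead fill the bin around 90 degrees, which is how such histograms distinguish local shapes.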