Fire detection on unconstrained videos using
color-aware spatial modeling and motion flow
Letricia P. S. Avalhais, Jose Rodrigues-Jr., Agma J. M. Traina
Institute of Mathematics and Computer Science, University of Sao Paulo
Sao Carlos, Brazil
{letricia, junio, agma}@icmc.usp.br
Abstract—The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the semantic event of interest is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that often do not hold for videos acquired with hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates in three aspects: (1) it relies on a specifically tailored color model, named Fire-like Pixel Detector, that improves the accuracy of fire detection; (2) it employs a new technique for motion compensation, mitigating the problems observed in videos captured with non-stationary cameras; and (3) it defines a segmentation method able to identify not only the presence of fire in a video, but also the segments of the video in which fire occurs. We evaluated our proposal on two video datasets with different characteristics; the results demonstrate superior efficacy, in terms of true positives and true negatives, compared to state-of-the-art methods.
Keywords-Event recognition; video fire detection; spatial segmentation; temporal flow
I. INTRODUCTION
Mobile devices and streaming services have accounted for a massive increase in the amount of information produced as video. When used for surveillance, such information carries potential for decision-making and security in several domains. However, examining such videos through human effort alone is time-consuming and exhausting. These facts have led to an increasing pursuit of intelligent systems able to manage video content, as well as to efforts that advance video analysis and multimedia retrieval systems.
One of the intensively studied branches of video analysis is the automatic identification of specific events of interest. This task supports several activities, such as automatic tagging, indexing, and searching over multimedia information. Surveillance and crisis management systems can also benefit from event detection aimed at recognizing anomalous behavior or specific target events, applications for which extensive research has been conducted [1], [2].
In this work, we focus on the detection of specific events, aiming at the identification of fire. Fire detectors based on video analysis have several advantages over conventional fire sensors. A video camera can cover a much wider area than a single sensor and can provide valuable information, e.g., the dimension of the incident, the growth rate of the fire, and the potential risk for a given scenario [3].
Our research is part of a collaboration with a larger project¹, which is developing an emergency system that uses crowdsourced images and videos, sent by mobile devices, to support decision making during emergency situations.
In the context of our project, an emergency in a crowded environment may generate a volume of incoming data that is impractical for specialists to analyze. Thus, the crisis monitoring system has to process the incoming data efficiently, identifying the relevant information that allows the specialists to make strategic decisions. For this reason, our work was designed to cope with real-time applications, for which execution time is a challenging constraint.
The most salient visual feature of fire is color, which is used in several related methods. The yellow-reddish appearance of fire is generally captured by color models in the spatial domain [4]. Nevertheless, methods that use only spatial color information are more prone to a high rate of false alarms, because of the ambiguity with non-fire objects of the same visual appearance. Dynamic textures [5], in this context, have the potential to capture other relevant cues. In terms of spatial detection, regions of interest (ROIs) containing fire can also be segmented by combining wavelet transforms with color, including direction patches [6] or salient region descriptors [7]. As observed by Phillips et al. [8], the motion of fire can be the distinguishing cue that leverages fire detection. Indeed, many works that combine static visual information with temporal content show better performance than methods based on color alone [9], [10].
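To make the color-based approach concrete, consider a minimal sketch of a classic rule-based fire-color test along the lines surveyed above: flame pixels tend to be red-dominant and bright. This is only an illustration of the general idea, not the Fire-like Pixel Detector proposed in this paper; the threshold r_min is an assumed, tunable value.

    import numpy as np

    def fire_like_mask(frame_rgb: np.ndarray, r_min: int = 190) -> np.ndarray:
        """Boolean mask of pixels whose color resembles flames.

        frame_rgb: H x W x 3 uint8 array in RGB channel order.
        """
        r = frame_rgb[..., 0].astype(np.int16)
        g = frame_rgb[..., 1].astype(np.int16)
        b = frame_rgb[..., 2].astype(np.int16)
        # Ordering rule: flame pixels tend to satisfy R >= G > B,
        # with R above an intensity floor (r_min is an illustrative guess).
        return (r >= g) & (g > b) & (r > r_min)

Rule-based tests of this kind are fast but, as noted above, prone to false alarms on yellow-reddish non-fire objects, which motivates combining them with temporal cues.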
It is important to highlight that, in general, the related works tackle the fire detection problem in videos captured by stationary cameras, or in videos with very little camera motion. This assumption does not fit the requirements of a crowdsourcing emergency system, since videos shot with hand-held mobile devices, especially during a crisis, are very likely to exhibit abrupt camera motion, blur, and high luminosity variance (see the sketch after the footnote below). We incorporate such issues into our methodology and, for evaluation purposes, we used two datasets: one consisting of videos collected from the web, and
¹ Project FP7-ICT-2013-EU-Brazil - “RESCUER - Reliable and Smart Crowdsourcing Solution for Emergency and Crisis Management”
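To make the camera-motion issue concrete, the sketch below aligns consecutive frames with a global homography estimated from sparse ORB feature matches, so that residual frame differences reflect scene motion (e.g., flames) rather than hand-held shake. It is a generic OpenCV-based illustration under common assumptions (a roughly planar or distant background), not the motion compensation technique proposed in this paper.

    import cv2
    import numpy as np

    def compensate_camera_motion(prev_gray: np.ndarray,
                                 curr_gray: np.ndarray) -> np.ndarray:
        """Warp curr_gray into prev_gray's frame of reference."""
        orb = cv2.ORB_create(500)
        kp1, des1 = orb.detectAndCompute(prev_gray, None)
        kp2, des2 = orb.detectAndCompute(curr_gray, None)
        if des1 is None or des2 is None:
            return curr_gray  # not enough texture to estimate global motion

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2),
                         key=lambda m: m.distance)[:100]
        if len(matches) < 4:
            return curr_gray  # too few correspondences for a homography

        # Points in the current frame and their matches in the previous frame.
        src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if H is None:
            return curr_gray

        h, w = prev_gray.shape
        return cv2.warpPerspective(curr_gray, H, (w, h))

After compensation, a simple difference against prev_gray highlights moving regions such as flames instead of global camera motion.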