Fire detection on unconstrained videos using color-aware spatial modeling and motion flow

Letricia P. S. Avalhais, Jose Rodrigues-Jr., Agma J. M. Traina
Institute of Mathematics and Computer Science, University of Sao Paulo
Sao Carlos, Brazil
{letricia, junio, agma}@icmc.usp.br

Abstract—The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the semantic event of interest is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that often do not hold for videos acquired with hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates on three aspects: (1) it relies on a specifically tailored color model, named Fire-like Pixel Detector, that improves the accuracy of fire detection; (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras; and (3) it defines a segmentation method able to identify not only the presence of fire in a video, but also the segments in which the fire occurs. We evaluated our proposal on two video datasets with different characteristics; the results demonstrate superior efficacy, in terms of true positive and true negative rates, compared to state-of-the-art methods.

Keywords—Event recognition; video fire detection; spatial segmentation; temporal flow

I. INTRODUCTION

Mobile devices and streaming services have accounted for a huge increase in the amount of information produced as video. Through surveillance, such information carries potential for decision making and security in several domains. However, examining such videos by human effort alone is time-consuming and exhausting. These facts have led to an increasing pursuit of intelligent systems able to manage video content, as well as to efforts advancing video analysis and multimedia retrieval systems.

One of the intensively studied branches of video analysis is the automatic identification of specific events of interest. This task supports several activities, such as automatic tagging, indexing, and searching over multimedia information. Surveillance and crisis management systems can also benefit from event detection aimed at recognizing anomalous behavior or specific target events, applications on which substantial research has been conducted [1], [2].

In this work, we focus on the detection of specific events, aiming at the identification of fire. Fire detectors based on video analysis have several advantages over conventional fire sensors. A video camera can cover a much wider area than a single sensor and can provide valuable information, e.g., the dimension of the incident, the growth rate of the fire, and the potential risk in a given scenario [3].

Our research is part of a collaboration with a larger project¹, which is developing an emergency system that uses crowdsourced images and videos, sent from mobile devices, to support decision making during emergency situations. In the context of our project, an emergency situation in a crowded environment may generate a volume of data that quickly becomes impractical for specialists to analyze.
Thus, the crisis monitoring system has to efficiently process the incoming data, identifying the relevant information that allows specialists to make strategic decisions. For this reason, our work was designed to cope with real-time applications, in which execution time is a challenging constraint.

The most salient visual feature of fire is color, which is used in several related methods. The yellow-reddish appearance of fire is generally captured by color models in the spatial domain [4]. However, methods that use only spatial color information are prone to a high rate of false alarms, owing to ambiguity with non-fire objects of similar visual appearance. Dynamic textures [5], in this context, have the potential to capture other relevant cues. In terms of spatial detection, regions of interest (ROIs) of fire can also be segmented by taking advantage of wavelet transforms in addition to color, including direction patches [6] or salient region descriptors [7].

As observed by Phillips et al. [8], the characteristic motion of fire can be the distinguishing key for improving fire detection. Indeed, many works that combine static visual information with temporal content show better performance than methods based on color alone [9], [10].

It is important to highlight that, in general, related works tackle fire detection in videos captured by stationary cameras, or in videos with very little camera motion. This assumption does not fit the requirements of a crowdsourcing emergency system, since videos shot with hand-held mobile devices, especially during a crisis, are very likely to exhibit abrupt camera motion, blur, and high luminosity variance. We incorporate such issues into our methodology and, for evaluation purposes, we used two datasets: one consisting of videos collected from the web, and

¹ Project FP7-ICT-2013-EU-Brazil - “RESCUER - Reliable and Smart Crowdsourcing Solution for Emergency and Crisis Management”
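To make the color-rule idea above concrete, the following is a minimal sketch of the classic RGB/saturation heuristic for fire-like pixels (rules of the form R >= G > B plus minimum red intensity and saturation). The function name and threshold values are illustrative assumptions; this is not SPATFIRE's Fire-like Pixel Detector, only the general kind of spatial color test such methods build upon.

import cv2
import numpy as np

def fire_like_mask(frame_bgr, r_min=180, sat_min=60):
    # Classic heuristic: fire pixels are dominated by red, with green
    # above blue, sufficient red intensity, and enough saturation to
    # reject whitish highlights. Thresholds are illustrative only.
    b = frame_bgr[:, :, 0].astype(np.int16)
    g = frame_bgr[:, :, 1].astype(np.int16)
    r = frame_bgr[:, :, 2].astype(np.int16)
    sat = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)[:, :, 1].astype(np.int16)
    mask = (r >= r_min) & (r >= g) & (g > b) & (sat >= sat_min)
    return mask.astype(np.uint8) * 255

# Usage on a single frame of a (hypothetical) input video:
cap = cv2.VideoCapture("fire_clip.mp4")
ok, frame = cap.read()
if ok:
    cv2.imwrite("fire_candidates.png", fire_like_mask(frame))
cap.release()

Used alone, a mask like this triggers on any yellow-reddish object, which is precisely the false-alarm problem noted above; hence the need to combine spatial color with motion cues.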
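As for the camera-motion issue, a common way to separate object motion (e.g., flames) from camera motion is to estimate the global transformation between consecutive frames and warp one onto the other before computing temporal differences. The sketch below does this with ORB features and a RANSAC-estimated homography in OpenCV; it is a generic illustration under those assumptions, not the motion compensation technique proposed in this paper.

import cv2
import numpy as np

def compensate_camera_motion(prev_gray, curr_gray):
    # Estimate a global homography from feature matches and warp the
    # previous frame into the current frame's coordinates, so that a
    # frame difference reflects scene motion rather than camera motion.
    orb = cv2.ORB_create(500)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return prev_gray  # too little texture: skip compensation
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]
    if len(matches) < 4:
        return prev_gray  # not enough correspondences for a homography
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return prev_gray
    h, w = curr_gray.shape
    return cv2.warpPerspective(prev_gray, H, (w, h))

# After alignment, the residual difference highlights moving regions
# (e.g., flickering flames) even when a hand-held camera is shaking:
# residual = cv2.absdiff(compensate_camera_motion(prev, curr), curr)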