Fire detection on unconstrained videos using color-aware spatial modeling and motion flow

Letricia P. S. Avalhais, Jose Rodrigues-Jr., Agma J. M. Traina
Institute of Mathematics and Computer Science, University of Sao Paulo
Sao Carlos, Brazil
{letricia, junio, agma}@icmc.usp.br

Abstract—The semantic segmentation of events in emergency contexts involves the identification of previously defined events of interest. In this work, the semantic event of interest is the presence of fire in videos. The literature presents several methods for automatic video fire detection, but these methods were built under assumptions, such as stationary cameras and controlled lighting conditions, that often do not hold for videos acquired with hand-held devices. To fill this gap, we propose a fire detection method called SPATFIRE. Our method innovates on three aspects: (1) it relies on a specifically tailored color model, named Fire-like Pixel Detector, that improves the accuracy of fire detection; (2) it employs a new technique for motion compensation, diminishing the problems observed in videos captured with non-stationary cameras; and (3) it defines a segmentation method able to identify not only the presence of fire in a video, but also the segments in which the fire occurs. We evaluated our proposal on two video datasets with different characteristics; the results demonstrate superior efficacy, in terms of true positive and true negative rates, compared to state-of-the-art methods.

Keywords—Event recognition; video fire detection; spatial segmentation; temporal flow

I. INTRODUCTION

Mobile devices and streaming services have accounted for a huge increase in the amount of information produced as video. Through surveillance, such information carries potential for decision making and security in several domains. However, examining such videos by human effort alone is time-consuming and exhausting. These facts have led to an increasing pursuit of intelligent systems able to manage video content, as well as to efforts advancing video analysis and multimedia retrieval systems.

One of the intensively studied branches of video analysis is the automatic identification of specific events of interest. This task supports several activities, such as automatic tagging, indexing, and searching over multimedia information. Surveillance and crisis management systems can also benefit from event detection aimed at recognizing anomalous behavior or specific target events, applications on which substantial research has been conducted [1], [2].

In this work, we focus on the detection of specific events, aiming at the identification of fire. Fire detectors based on video analysis have several advantages over conventional fire sensors. A video camera can cover a much wider area than a single sensor and can provide valuable information, e.g., the dimension of the incident, the growth rate of the fire, and the potential risk in a given scenario [3].

Our research is part of a collaboration with a larger project¹, which is developing an emergency system that uses crowdsourced images and videos, sent from mobile devices, to support decision making during emergency situations. In the context of our project, an emergency situation in a crowded environment may generate a volume of data that quickly becomes impractical for specialists to analyze.
Thus, the crisis monitoring system has to efficiently process the incoming data, identifying the relevant information that allows specialists to make strategic decisions. For this reason, our work was designed to cope with real-time applications, in which execution time is a challenging constraint.

The most salient visual feature of fire is color, which is used in several related methods. The yellow-reddish appearance of fire is generally captured by color models in the spatial domain [4]. However, methods that use only spatial color information are prone to a high rate of false alarms, owing to ambiguity with non-fire objects of similar visual appearance. Dynamic textures [5], in this context, have the potential to capture other relevant cues. In terms of spatial detection, regions of interest (ROIs) of fire can also be segmented by taking advantage of wavelet transforms in addition to color, including direction patches [6] or salient region descriptors [7].

As observed by Phillips et al. [8], the characteristic motion of fire can be the distinguishing key for improving fire detection. Indeed, many works that combine static visual information with temporal content show better performance than methods based on color alone [9], [10].

It is important to highlight that, in general, related works tackle fire detection in videos captured by stationary cameras, or in videos with very little camera motion. This assumption does not fit the requirements of a crowdsourcing emergency system, since videos shot with hand-held mobile devices, especially during a crisis, are very likely to exhibit abrupt camera motion, blur, and high luminosity variance. We incorporate such issues into our methodology and, for evaluation purposes, we used two datasets: one consisting of videos collected from the web, and

¹ Project FP7-ICT-2013-EU-Brazil - “RESCUER - Reliable and Smart Crowdsourcing Solution for Emergency and Crisis Management”
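To make the color-rule idea above concrete, the following is a minimal sketch of the classic RGB/saturation heuristic for fire-like pixels (rules of the form R >= G > B plus minimum red intensity and saturation). The function name and threshold values are illustrative assumptions; this is not SPATFIRE's Fire-like Pixel Detector, only the general kind of spatial color test such methods build upon.

import cv2
import numpy as np

def fire_like_mask(frame_bgr, r_min=180, sat_min=60):
    # Classic heuristic: fire pixels are dominated by red, with green
    # above blue, sufficient red intensity, and enough saturation to
    # reject whitish highlights. Thresholds are illustrative only.
    b = frame_bgr[:, :, 0].astype(np.int16)
    g = frame_bgr[:, :, 1].astype(np.int16)
    r = frame_bgr[:, :, 2].astype(np.int16)
    sat = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)[:, :, 1].astype(np.int16)
    mask = (r >= r_min) & (r >= g) & (g > b) & (sat >= sat_min)
    return mask.astype(np.uint8) * 255

# Usage on a single frame of a (hypothetical) input video:
cap = cv2.VideoCapture("fire_clip.mp4")
ok, frame = cap.read()
if ok:
    cv2.imwrite("fire_candidates.png", fire_like_mask(frame))
cap.release()

Used alone, a mask like this triggers on any yellow-reddish object, which is precisely the false-alarm problem noted above; hence the need to combine spatial color with motion cues.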
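As for the camera-motion issue, a common way to separate object motion (e.g., flames) from camera motion is to estimate the global transformation between consecutive frames and warp one onto the other before computing temporal differences. The sketch below does this with ORB features and a RANSAC-estimated homography in OpenCV; it is a generic illustration under those assumptions, not the motion compensation technique proposed in this paper.

import cv2
import numpy as np

def compensate_camera_motion(prev_gray, curr_gray):
    # Estimate a global homography from feature matches and warp the
    # previous frame into the current frame's coordinates, so that a
    # frame difference reflects scene motion rather than camera motion.
    orb = cv2.ORB_create(500)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return prev_gray  # too little texture: skip compensation
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]
    if len(matches) < 4:
        return prev_gray  # not enough correspondences for a homography
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return prev_gray
    h, w = curr_gray.shape
    return cv2.warpPerspective(prev_gray, H, (w, h))

# After alignment, the residual difference highlights moving regions
# (e.g., flickering flames) even when a hand-held camera is shaking:
# residual = cv2.absdiff(compensate_camera_motion(prev, curr), curr)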