Oil spill feature selection and classification using decision tree forest on SAR image data

Konstantinos Topouzelis a,⇑, Apostolos Psyllos b

a University of the Aegean, Department of Marine Sciences, University Hill, 81100 Mytilene, Greece
b European Commission Joint Research Centre, Institute for the Protection and Security of the Citizen, Italy

Article info

Article history:
Received 15 September 2010
Received in revised form 26 October 2011
Accepted 22 January 2012
Available online 28 February 2012

Keywords: Oil spill; Decision forest; Feature selection; SAR; Classification; Machine learning

Abstract

A novel oil spill feature selection and classification technique is presented, based on a forest of decision trees. The parameters of the two-class classification problem of oil spills and look-alikes are explored. The contribution to the final classification of the 25 features most commonly used in the scientific community was examined. The work is framed as a multi-objective problem, i.e. minimizing the number of input features while maximizing the overall testing classification accuracy. Results showed that the optimum forest contains 70 trees and that the three most important combinations contain 4, 6 and 9 features. The last of these combinations can be seen as the most appropriate solution of the decision forest study. Examination of the robustness of this result showed that the proposed combination achieved higher classification accuracy than other well-known statistical separation indexes. Moreover, comparisons with previous findings agree on the classification accuracy (up to 84.5%) and on the number of selected features, but differ in the features actually selected. This observation leads to the conclusion that there is no single optimum feature combination; several combinations exist, each containing at least some critical features.

© 2012 International Society for Photogrammetry and Remote Sensing, Inc.
(ISPRS) Published by Elsevier B.V. All rights reserved.

1. Introduction

Synthetic Aperture Radar (SAR) images are extensively used for the detection of oil spills in the marine environment, because SAR imaging does not depend on sunlight and is not affected by cloud cover. Radar backscatter values from oil spills are very similar to backscatter values from very calm sea areas and from other ocean phenomena, collectively termed look-alikes (e.g. currents, eddies, upwelling or downwelling zones, fronts and rain cells). Several studies aiming at oil spill detection have been conducted (Brekke and Solberg, 2005; Del Frate et al., 2000; Fiscella et al., 2000; Karathanassi et al., 2006; Migliaccio and Trangaglia, 2004; Pavlakis et al., 2001; Stathakis et al., 2006; Topouzelis et al., 2003, 2009). A detailed introduction to oil spill detection by satellite remote sensing is given by Brekke and Solberg (2005), while a detailed comparison of the various approaches and their characteristics is given by Topouzelis (2008).

Oil spill detection methodology can be summarized in four steps. First, all dark signatures present in the image are isolated. Second, features for each dark signature are extracted. Third, these features are tested against predefined values. Finally, probabilities for each candidate signature are computed to determine whether it is an oil spill or a look-alike phenomenon.

Researchers have used different input features for oil spill classification in their studies, as several examples illustrate. Fiscella et al. (2000) used 14 features, Solberg and Theophilopoulos (1997) used 15 features, and Solberg et al. (1999) used 11 features, many of which differed from those of the earlier studies and, in general, from the 11 features used by Del Frate et al. (2000). A general description of the calculated features is given by Espedal and Johannessen (2000), in which texture features are introduced for the first time. Moreover, Keramitsoglou et al.
(2005) refer to 14 features and Karathanassi et al. (2006) use 13 features covering physical, geometrical and textural behavior. Several studies try to unify the features that have similar characteristics (e.g. Brekke and Solberg, 2005; Migliaccio and Trangaglia, 2004; Montali et al., 2006). The absence of systematic research on the extracted features and on their contribution to the classification results forces researchers to select the input features of their systems arbitrarily. Previous research (Stathakis et al., 2006; Topouzelis et al., 2009) moved, for the first time, in this direction. Those studies used a combination of genetic algorithms and neural networks.

ISPRS Journal of Photogrammetry and Remote Sensing 68 (2012) 135–143
journal homepage: www.elsevier.com/locate/isprsjprs
doi:10.1016/j.isprsjprs.2012.01.005
0924-2716/$ - see front matter
⇑ Corresponding author. Tel.: +30 2251036878. E-mail address: topouzelis@marine.aegean.gr (K. Topouzelis).

The lack of systematic research is attributed to the fact that the existing