Discovering and Generating Hard Examples for Training a Red Tide Detector

Hyungtae Lee⋆†, Heesung Kwon⋆, Wonkook Kim‡
⋆US Army Research Lab  †Booz Allen Hamilton  ‡Pusan National University

Abstract

Satellite images are increasingly used to detect natural phenomena, such as red tide, that adversely affect wildlife and humans. However, red tide detection on satellite images remains a very hard task because of the unpredictable nature of red tide occurrence, the extreme sparsity of red tide samples, and the difficulty of accurate groundtruthing. In this paper, we aim to tackle both the data sparsity and groundtruthing issues by primarily addressing two challenges: i) a significant lack of hard examples of non-red tide that can enhance detection performance and ii) extreme data imbalance between red tide and non-red tide examples. In the proposed work, we devise a 9-layer fully convolutional network jointly optimized with two plug-in modules tailored to overcoming these two challenges: i) a hard negative example generator (HNG) to supplement the hard negative (non-red tide) examples and ii) cascaded online hard example mining (cOHEM) to ease the data imbalance. Our proposed network, jointly trained with HNG and cOHEM, provides state-of-the-art red tide detection accuracy on GOCI satellite images.

1. Introduction

Accurate and timely detection of short-term and long-term variations in naturally occurring phenomena (e.g., red tide, sea fog, yellow dust) that adversely affect wildlife as well as humans is highly critical. For instance, red tide, caused by toxic microscopic organisms, inflicts serious damage not only on near-shore fisheries but also on large marine ecosystems in general. To investigate how, when, and where these harmful natural phenomena occur and spread, many countries have launched geostationary satellites that closely observe areas of interest surrounding their territory.
Accordingly, there have been a number of attempts to detect these harmful natural phenomena by analyzing remotely sensed images [8, 12, 30, 32, 36, 42] from geostationary satellites. In this paper, we propose a convolutional neural network (CNN)-based approach that can detect red tide embedded in a large-scale image dataset.

Figure 1: Red Tide Examples Shown on GOCI Images. In the above figure, red tide appears as elongated red bands. The images are false-color composites formed by mapping the 6th, 4th, and 1st bands of the GOCI multi-spectral image to the red, green, and blue channels, respectively.

To develop a CNN-based red tide detection approach, we used the large-scale multi-spectral image dataset obtained from GOCI (Geostationary Ocean Color Imager) [5] on a geostationary satellite. Several red tide examples on GOCI multi-spectral images are shown in Figure 1. Since the biological properties of red tide do not clearly appear in the image, we used information on real-world red tide occurrences reported by NIFS (National Institute of Fisheries Science) [1] of South Korea. However, NIFS manually examined red tide occurrence only at a limited number of locations along the southern seashore of South Korea and therefore could not cover the entire area infested by red tide. As a result, we end up with only a small number of training spectral samples, drawn from a fraction of the areas where red tide actually occurs. In our work, we use images taken in December, when red tide does not occur because of the low water temperature, as negative examples.¹ Figure 2 shows the GOCI images used for the positive (red tide) and negative (non-red tide) training examples, together with the red tide region annotation of the positive image.

¹ In South Korea, summer is in July and August and winter is in December. Red tide occurs mainly in summer, when the water temperature is high.

arXiv:1812.05447v2 [cs.CV] 9 Apr 2019

There are two challenges in using GOCI images and their ground truth labels to train the red tide detector. First, the spectral characteristics of the images taken in December are very different from those of the images taken in the