Chinese White Dolphin Detection in the Wild

Hao Zhang, zhanghaoinf@gmail.com, Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
Qi Zhang, qzhang364-c@my.cityu.edu.hk, Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
Phuong Anh Nguyen, panguyen2@cityu.edu.hk, Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
Victor Lee, csvlee@eee.hku.hk, Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong SAR, China
Antoni B. Chan, abchan@cityu.edu.hk, Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China

ABSTRACT
For ecological protection of the ocean, biologists usually conduct line-transect vessel surveys to measure the population density of marine species (such as dolphins) within their habitat. However, observing marine species via vessel surveys consumes a lot of manpower and is more challenging than observing common objects, due to the scarcity of the objects in the wild, the tiny size of the objects, and similar-sized distracter objects (e.g., floating trash). To reduce the human experts' workload and improve the observation accuracy, in this paper, we develop a practical system to detect Chinese White Dolphins in the wild automatically. First, we construct a dataset named Dolphin-14k with more than 2.6k dolphin instances. To counter the low annotation efficiency caused by the rarity of dolphins, we design an interactive dolphin box annotation strategy to annotate sparse dolphin instances in long videos efficiently. Second, we compare the performance and efficiency of three off-the-shelf object detection algorithms, including Faster-RCNN, FCOS, and YoloV5, on the Dolphin-14k dataset and pick YoloV5 as the detector, where a new category (Distracter) is added to the model training to reject false positives.
Finally, we incorporate the dolphin detector into a system prototype, which detects dolphins in video frames at 100.99 FPS per GPU with high accuracy (i.e., 90.95 mAP@0.5).

CCS CONCEPTS
Computing methodologies → Computer vision tasks; Scene understanding; Vision for robotics; Neural networks.

KEYWORDS
datasets, neural networks, dolphin detection, detection system

MMAsia '21, December 1–3, 2021, Gold Coast, Australia
© 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8607-4/21/12...$15.00
https://doi.org/10.1145/3469877.3490574

ACM Reference Format:
Hao Zhang, Qi Zhang, Phuong Anh Nguyen, Victor Lee, and Antoni B. Chan. 2021. Chinese White Dolphin Detection in the Wild. In ACM Multimedia Asia (MMAsia '21), December 1–3, 2021, Gold Coast, Australia. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3469877.3490574

1 INTRODUCTION
Large infrastructure construction projects around the sea (airports, cross-sea bridges, land reclamation, etc.) may disturb the surrounding ecosystem. These disturbances, including noise, land reshaping, and increasing water traffic, affect the distribution and behavior of marine mammals (e.g., dolphins) [12]. Therefore, researchers have conducted vessel surveys (see Fig. 1a) to study the impact of these construction disturbances on marine mammals.
However, the line-transect survey methodology [2] requires much manpower: a group of 4 people takes turns observing the sea with binoculars. Each observing shift lasts 15 minutes and requires two observers, one using binoculars and one using unaided eyes, to cover a 180° field of view in front of the vessel (see Fig. 1b). A survey trip typically requires 4-6 hours of non-stop observing, depending on the survey area. This is labor-intensive work, and the accuracy of the survey results is not guaranteed, because human eyes can detect motion only within a 160° field of view and observers need to rest frequently during an observing period. For those reasons, we propose developing a marine mammal detection system that supports this surveying procedure, reducing human labor and improving the survey results.

This study focuses on detecting Chinese White Dolphins (CWDs, or dolphins for short). Unlike common object detection problems, detecting dolphins in the wild poses several challenges:

• Scarcity. Dolphins are rarely witnessed by humans and appear on the water surface for only around 1-2 seconds each time.
• Small size. The recorded dolphins are tiny (30 × 30 pixels in 1080p videos).
• Partially visible. In most cases, only part of a dolphin's body can be observed (see Figure 3).
• Distracter objects. Distant objects, such as waves, sun glare, and debris, are visually similar to dolphins and should be distinguished from them to reduce false alarms. These objects are regarded as distracter samples (or false positives).
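To make the small-size and distracter challenges concrete, the following minimal sketch shows (a) how small a 30 × 30 pixel dolphin is relative to a 1080p frame and (b) one plausible way a separate distracter class could be used at inference time to suppress false alarms. The label strings, the `Detection` container, the 1920 × 1080 frame size, and the 0.5 score threshold are illustrative assumptions, not details taken from this paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical label names; the paper names a "Distracter" category but
# does not specify the label strings or any score threshold.
DOLPHIN = "dolphin"
DISTRACTER = "distracter"


@dataclass
class Detection:
    label: str                      # predicted class name
    score: float                    # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def keep_dolphins(detections: List[Detection],
                  min_score: float = 0.5) -> List[Detection]:
    """Keep confident dolphin boxes and drop everything else.

    Boxes that the detector assigns to the distracter class (e.g. waves,
    sun glare, debris) are discarded here instead of being reported.
    """
    return [d for d in detections
            if d.label == DOLPHIN and d.score >= min_score]


def frame_area_fraction(box: Tuple[int, int, int, int],
                        frame_w: int = 1920,
                        frame_h: int = 1080) -> float:
    """Fraction of the frame area covered by a box (assumes 1920x1080)."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (frame_w * frame_h)


if __name__ == "__main__":
    dets = [
        Detection(DOLPHIN, 0.90, (100, 100, 130, 130)),
        Detection(DISTRACTER, 0.95, (400, 200, 430, 230)),  # e.g. sun glare
        Detection(DOLPHIN, 0.20, (800, 500, 830, 530)),     # low confidence
    ]
    for d in keep_dolphins(dets):
        print(d.label, d.score, frame_area_fraction(d.box))
```

Under these assumptions, a 30 × 30 box covers 900 of the 2,073,600 pixels in a 1080p frame, which is well below 0.1% of the image. Giving distracters their own class lets the detector explain such tiny, dolphin-like regions as something other than background or dolphin, so they can be filtered out explicitly.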