Bird-Species Audio Identification, Ensembling 1D + 2D Signals
Gyanendra Das¹, Saksham Aggarwal¹
¹Indian Institute of Technology, Dhanbad, India
Abstract
In this paper, a method for recognizing bird species in audio recordings is described. We experimented
with 4 different approaches. The model on the spectrogram and waveform domains consists of two main models:
1) a binary classifier for predicting whether a bird call is present in the audio or not; 2) a multiclass classifier for
predicting which bird is present. Combining these two approaches, 1D and 2D signals, gives strong results.
We also experiment with ATDemucs, which extends Demucs by replacing the BiLSTM with self-attention. In
this approach, we first perform source separation of multiple birds along with noise separation, as Universal
Source Separation. Then we classify each source using both a 1D waveform model, ReSE-Multi with
self-attention, and a 2D spectrogram model. We also discuss how we handle different thresholds for
different models with a postprocessing technique. Ensembling techniques such as Voting, Scaling and Direct
Averaging gave a good boost to our results. Our combined architecture, including 1D and 2D signals,
achieves 0.6179 micro-averaged F1 on the task, which asked for classification of 397 bird species.
Keywords
Deep Learning, Bird Species Classification, Transfer Learning, Attention Mechanism, Sound Detection,
Audio Source Detection, Demucs, ResNet-50, EfficientNet, Ensembling, Multi Domain Meta Training
1. Introduction
There are about 10,000 different bird species in the world, and they all play an important role
in the natural world. They serve as good indicators of declining habitat quality and pollution. It
is often easier to hear birds than it is to see them. BirdCLEF 2021 [1] - Birdcall Identification is a
Kaggle competition organized by the Cornell Lab of Ornithology in collaboration with LifeCLEF
2021 [1]. The challenge is to identify which birds are calling in long recordings, given training
data generated in meaningfully different contexts. This paper first
gives details of the competition and the given data, so that there is a clear understanding of the
challenges posed by the train and test data. We then describe in detail the
approaches we used for this challenge, including data preparation, augmentations, model
building, training procedure, and post-processing techniques.
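
As a rough illustration of two of the ensembling ideas named in the abstract (Direct Averaging and Voting with a separate threshold per model), the sketch below may help. It is a minimal example under our own assumptions, not the exact procedure used for the submission: the function names, the array layout (models × clips × classes), the toy probabilities, and the threshold values are all hypothetical. Scaling is not shown, since the abstract does not define it.

```python
import numpy as np

def direct_average(probs):
    """Average per-class probabilities across models.

    probs: array-like of shape (n_models, n_clips, n_classes).
    Returns an array of shape (n_clips, n_classes).
    """
    return np.mean(np.asarray(probs, dtype=float), axis=0)

def vote(probs, thresholds, min_votes):
    """Binarize each model with its own threshold, then keep a class
    for a clip when at least `min_votes` models predict it.

    thresholds: one value per model (hypothetical values in this sketch).
    Returns a boolean array of shape (n_clips, n_classes).
    """
    probs = np.asarray(probs, dtype=float)
    # Reshape so each model's threshold broadcasts over clips and classes.
    thresholds = np.asarray(thresholds, dtype=float).reshape(-1, 1, 1)
    votes = (probs >= thresholds).sum(axis=0)
    return votes >= min_votes

# Toy probabilities: 2 models, 2 clips, 3 classes (illustrative only).
probs = np.array([
    [[0.90, 0.20, 0.40], [0.10, 0.60, 0.30]],  # e.g. a 1D waveform model
    [[0.70, 0.40, 0.10], [0.35, 0.20, 0.50]],  # e.g. a 2D spectrogram model
])

averaged = direct_average(probs)
strict = vote(probs, thresholds=[0.5, 0.3], min_votes=2)   # both models agree
lenient = vote(probs, thresholds=[0.5, 0.3], min_votes=1)  # any model fires
```

Requiring agreement from every model trades recall for precision, while accepting a single vote does the opposite; per-model thresholds let models whose scores are calibrated differently contribute on a comparable footing.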
CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
gyanendralucky9337@gmail.com (G. Das); sakshamaggarwal20@gmail.com (S. Aggarwal)
https://luckygyana.github.io/Portfolio/ (G. Das); https://github.com/saksham20aggarwal (S. Aggarwal)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073