© 2019 Shaleen Bengani, S. Vadivel and J. Angel Arul Jothi. This open access article is distributed under a Creative
Commons Attribution (CC-BY) 3.0 license.
Journal of Computer Science
Original Research Paper
Efficient Music Auto-Tagging with Convolutional Neural
Networks
Shaleen Bengani, S. Vadivel and J. Angel Arul Jothi
Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai, UAE
Article history
Received: 27-04-2019
Revised: 15-07-2019
Accepted: 23-08-2019
Corresponding Authors:
S. Vadivel
Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai, UAE
Email: vadivel@dubai.bits-pilani.ac.in
Abstract: Technology is revolutionizing the way in which music is
distributed and consumed. As a result, millions of songs are instantly
available to millions of people on the Internet. This has created the
need for novel music search and discovery services. Music is often
searched using descriptive keywords, or tags, based on the content of
the song. Hence, one very important task in achieving a great music
search engine is automatic tagging of music. Currently, deep learning
techniques using convolutional neural networks produce state-of-the-art
results for this task. Several deep learning algorithms achieve good
results, but at the cost of efficiency: as neural networks get deeper,
their computational cost grows rapidly. In
this paper, we present a deep learning-based ensemble method that
achieves near state-of-the-art performance on the music auto-tagging
task. Our method is significantly more efficient in terms of
computation time and disk space. This opens up the option of using
our proposed model directly on a mobile device.
Keywords: Deep Learning, Convolutional Neural Network, Music
Auto-Tagging, Mel Spectrogram
Introduction
About 15 years ago, most people obtained the music
they listened to from CDs. Now that a majority of the
world has internet access, this has changed: with the advent
of music streaming services such as Spotify, Apple Music
and Pandora, people have instant access to millions of
songs. Spotify, the most popular music streaming service
in the world, for instance, has over 40 million songs and
more than 180 million users. Apple Music and Pandora,
too, have about the same number of songs in their
libraries. As a result, these companies often compete on features
such as music discovery and recommendation.
Music is often described using special keywords called
tags. Tags convey information such as instruments (e.g.,
piano, guitar, drums), genre (e.g., rock, pop, classical,
indie) and mood (e.g., happy, sad, upbeat). Manually adding
tags to millions of songs is expensive and
time-consuming. This has made automatic tagging of music a
very important task in the field of music information
retrieval. Music auto-tagging is a classification problem
of predicting music tags using audio signals.
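Because a song can carry several tags at once (e.g., both "rock" and "guitar"), auto-tagging is a multi-label classification problem: each tag gets an independent probability, and every tag above a threshold is predicted. A minimal sketch of that decision step, with a hypothetical tag vocabulary and assumed model scores:

```python
import numpy as np

# Hypothetical tag vocabulary and per-tag scores from some model
# (the names and score values here are illustrative assumptions)
TAGS = ["rock", "pop", "piano", "guitar", "happy"]
logits = np.array([2.1, -0.4, 1.3, 0.2, -1.8])  # one raw score per tag

# Multi-label: an independent sigmoid per tag, so several tags can
# fire simultaneously (unlike a softmax, which picks a single class)
probs = 1.0 / (1.0 + np.exp(-logits))
predicted = [tag for tag, p in zip(TAGS, probs) if p >= 0.5]
print(predicted)  # ['rock', 'piano', 'guitar']
```

The 0.5 threshold is a common default; in practice it can be tuned per tag on a validation set.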
As deep learning was not yet popular in 2011, the
majority of the papers surveyed involved a feature
extractor with manually designed features followed by a
classifier. This also required researchers to have a
fair amount of domain knowledge to understand
what kinds of features could satisfactorily describe
acoustic properties (Fu et al., 2011). In recent
years, by contrast, deep neural networks have been successfully used
to annotate music. However, deep learning models are
more often than not computationally very expensive and
occupy a lot of disk space. Hence, they cannot be deployed on
mobile devices. The following subsection details the
related work in music auto-tagging.
Related Work
Convolutional Neural Networks (CNNs) perform well
on pattern recognition tasks because of their ability to
learn spatially invariant features. Naturally, several
researchers have also used CNNs for music auto-tagging.
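CNN-based taggers typically consume a time-frequency representation of the audio rather than the raw waveform, most commonly a Mel spectrogram. A minimal sketch of that computation in plain NumPy, assuming the common O'Shaughnessy mel-scale formula and illustrative frame sizes (production code would normally use a library such as librosa):

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, window each frame, take the power spectrum,
    # then project onto the mel filterbank and log-compress
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2
    return np.log1p(power @ mel_filterbank(n_mels, n_fft, sr).T)

# One second of a 440 Hz sine wave as a toy input
sr = 16000
t = np.arange(sr) / sr
mel = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(mel.shape)  # (61, 40): 61 time frames x 40 mel bands
```

The resulting (time x frequency) matrix is what a CNN tagger treats as its 2-D input "image".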
Dieleman and Schrauwen (2014) used a two-layer CNN
with Mel spectrograms as well as raw audio as input. They
experimented with various convolutional kernel sizes