© 2019 Shaleen Bengani, S. Vadivel and J. Angel Arul Jothi. This open access article is distributed under a Creative
Commons Attribution (CC-BY) 3.0 license.
Journal of Computer Science
Original Research Paper
Efficient Music Auto-Tagging with Convolutional Neural
Networks
Shaleen Bengani, S. Vadivel and J. Angel Arul Jothi
Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai, UAE
Article history
Received: 27-04-2019
Revised: 15-07-2019
Accepted: 23-08-2019
Corresponding Authors:
S. Vadivel
Department of Computer Science, Birla Institute of Technology and Science Pilani Dubai Campus, Dubai, UAE
Email: vadivel@dubai.bits-pilani.ac.in
Abstract: Technology is revolutionizing the way in which music is
distributed and consumed. As a result, millions of songs are instantly
available to millions of people on the Internet. This has created the
need for novel music search and discovery services. Music is often
searched using descriptive keywords, or tags, based on the content of
the song. Hence, one very important task in achieving a great music
search engine is automatic tagging of music. Currently, deep learning
techniques using convolutional neural networks produce state-of-the-art
results for this task. Several deep learning algorithms achieve good
results, but at the cost of efficiency: as neural networks get deeper,
their computational cost grows rapidly. In
this paper, we present a deep learning-based ensemble method that
achieves near state-of-the-art performance on the music auto-tagging
task. Our method is significantly more efficient in terms of
computation time and disk space. This opens up the option of using
our proposed model directly on a mobile device.
Keywords: Deep Learning, Convolutional Neural Network, Music
Auto-Tagging, Mel Spectrogram
Introduction
About 15 years ago, most people obtained the music
they listened to from CDs. Now that a majority of the
world has internet access, this has changed: with the advent
of music streaming services such as Spotify, Apple Music
and Pandora, people have instant access to millions of
songs. Spotify, the most popular music streaming service
in the world, for instance, has over 40 million songs and
more than 180 million users. Apple Music and Pandora,
too, have about the same number of songs in their
libraries. As a result, these companies often compete on features
such as music discovery and recommendation.
Music is often described using special keywords called
tags. Tags convey information such as instruments (e.g.,
piano, guitar, drums), genre (e.g., rock, pop, classical,
indie) and mood (e.g., happy, sad, upbeat). Manually adding
tags to millions of songs is expensive and
time-consuming. This has made automatic tagging of music a
very important task in the field of music information
retrieval. Music auto-tagging is a classification problem
of predicting music tags using audio signals.
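Because a song can carry several tags at once (e.g., both "rock" and "guitar"), auto-tagging is a multi-label classification problem: each tag gets an independent probability, and every tag above a threshold is predicted. A minimal sketch of that decision step, with a hypothetical tag vocabulary and assumed model scores:

```python
import numpy as np

# Hypothetical tag vocabulary and per-tag scores from some model
# (the names and score values here are illustrative assumptions)
TAGS = ["rock", "pop", "piano", "guitar", "happy"]
logits = np.array([2.1, -0.4, 1.3, 0.2, -1.8])  # one raw score per tag

# Multi-label: an independent sigmoid per tag, so several tags can
# fire simultaneously (unlike a softmax, which picks a single class)
probs = 1.0 / (1.0 + np.exp(-logits))
predicted = [tag for tag, p in zip(TAGS, probs) if p >= 0.5]
print(predicted)  # ['rock', 'piano', 'guitar']
```

The 0.5 threshold is a common default; in practice it can be tuned per tag on a validation set.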
As deep learning was not yet popular in 2011, the
majority of the papers surveyed involved a feature
extractor with manually designed features followed by a
classifier. This also required researchers to have a
fair amount of domain knowledge to understand
what kinds of features could satisfactorily describe
acoustic properties (Fu et al., 2011). In recent
years, by contrast, deep neural networks have been successfully used
to annotate music. However, deep learning models are
more often than not computationally very expensive and
occupy a lot of disk space. Hence, they cannot be deployed on
mobile devices. The following subsection details the
related work in music auto-tagging.
Related Work
Convolutional Neural Networks (CNNs) perform well
on pattern recognition tasks because of their ability to
learn spatially invariant features. Naturally, several
researchers have also used CNNs for music auto-tagging.
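CNN-based taggers typically consume a time-frequency representation of the audio rather than the raw waveform, most commonly a Mel spectrogram. A minimal sketch of that computation in plain NumPy, assuming the common O'Shaughnessy mel-scale formula and illustrative frame sizes (production code would normally use a library such as librosa):

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, window each frame, take the power spectrum,
    # then project onto the mel filterbank and log-compress
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2
    return np.log1p(power @ mel_filterbank(n_mels, n_fft, sr).T)

# One second of a 440 Hz sine wave as a toy input
sr = 16000
t = np.arange(sr) / sr
mel = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(mel.shape)  # (61, 40): 61 time frames x 40 mel bands
```

The resulting (time x frequency) matrix is what a CNN tagger treats as its 2-D input "image".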
Dieleman and Schrauwen (2014) used a two-layer CNN
with Mel spectrograms as well as raw audio as input. They
experimented with various convolutional kernel sizes