Audio Engineering Society
Convention Paper
Presented at the 140th Convention
2016 June 4–7, Paris, France

This convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. This paper is available in the AES E-Library (http://www.aes.org/e-lib), all rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Deep Neural Networks for Dynamic Range Compression in Mastering Applications

Stylianos Ioannis Mimilakis 1, Konstantinos Drossos 2, Tuomas Virtanen 2, and Gerald Schuller 1
1 Fraunhofer IDMT, Ilmenau, Germany
2 Audio Research Group, Dept. of Signal Processing, Tampere University of Technology, Tampere, Finland
Correspondence should be addressed to Stylianos Ioannis Mimilakis (mis@idmt.fhg.de)

ABSTRACT
The process of audio mastering often, if not always, includes audio signal processing techniques such as frequency equalisation and dynamic range compression. Depending on the genre and style of the audio content, the parameters of these techniques are controlled by a mastering engineer in order to process the original audio material. This operation relies on musically and perceptually pleasing facets of the acoustic characteristics conveyed by the audio material under mastering. Modelling such dynamic, content-adaptive operations is vital in automated applications, since it significantly affects the overall performance. In this work we present a system capable of modelling such behaviour, focusing on automatic dynamic range compression.
The system predicts, via a trained deep neural network, frequency coefficients that perform the dynamic range compression, and applies them to the unmastered audio signal provided as input. Both the dynamic range compression and the prediction of the corresponding frequency coefficients take place in the time-frequency domain, using magnitude spectra acquired from a critical-band filter bank similar to the human peripheral auditory system. Results from listening tests with professional music producers and audio mastering engineers demonstrate, on average, performance equivalent to professionally mastered audio content. Improvements were also observed in comparison with relevant commercial software.

1 Introduction

Audio production often includes a final stage of processing placed just before the replication and commercial distribution of the audio material. This stage is called mastering and involves a series of audio signal processing algorithms aiming to provide an overall audio enhancement, in order to link professional audio with the hi-fidelity / home-entertainment industries [1]. Mastering consists of two main signal processing methods: i) equalisation of the frequency content, and ii) dynamic range control. These two operations require a considerable number of parameters that have to be defined and controlled in order to process the audio signals. The main ambition of this processing is to aesthetically enhance the perceived acoustic characteristics of the signals [2]. The selection and adjustment of these parameters relies solely on a continuous interaction be-
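To make the operation being modelled concrete, the sketch below shows a conventional static compression curve applied independently to the band magnitudes of one analysis frame. This is a minimal illustration of dynamic range compression in the magnitude-spectral domain, not the paper's method: the paper replaces such a fixed, hand-parameterised curve with gains predicted by a trained deep neural network, and the function names, threshold, and ratio here are illustrative assumptions.

```python
import numpy as np

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compression curve: below the threshold the gain is 0 dB;
    above it, the output level rises by only 1/ratio dB per input dB."""
    over = np.maximum(level_db - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)

def compress_bands(mag, threshold_db=-20.0, ratio=4.0, eps=1e-12):
    """Apply the static curve to each band magnitude of one frame.
    `mag` holds linear band magnitudes, e.g. one frame of a
    critical-band filter-bank analysis."""
    level_db = 20.0 * np.log10(np.maximum(mag, eps))   # linear -> dB
    gain_db = compressor_gain_db(level_db, threshold_db, ratio)
    return mag * 10.0 ** (gain_db / 20.0)              # apply gain, back to linear

frame = np.array([0.5, 0.05, 0.9])   # linear band magnitudes (example values)
print(compress_bands(frame))         # loud bands attenuated, quiet band untouched
```

Because the gain is computed per band, loud bands are pulled down while bands already below the threshold pass through unchanged, which is the content-dependent behaviour that a mastering engineer otherwise tunes by hand.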