Audio Engineering Society Convention Paper 8588

Presented at the 132nd Convention, 2012 April 26–29, Budapest, Hungary

This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Implementation and Evaluation of Autonomous Multi-track Fader Control

Stuart Mansbridge 1, Saoirse Finn 1 and Joshua D. Reiss 1

1 Centre for Digital Music, Queen Mary University of London, Mile End Road, London, E1 4NS, UK
stuart.mansbridge@eecs.qmul.ac.uk, saoirse.finn@eecs.qmul.ac.uk, josh.reiss@eecs.qmul.ac.uk

ABSTRACT

A new approach to the autonomous control of faders for multi-track audio mixing is presented. The algorithm is designed to generate an automatic sound mix from an arbitrary number of monaural or stereo audio tracks of any sample rate, and to be suitable for both live and post-production use. Mixing levels are determined using the EBU R-128 loudness measure, with a cross-adaptive process that brings each track to a time-varying average. A hysteresis loudness gate and selective smoothing prevent the adjustment of intentional dynamics in the music. Real-time and off-line software implementations have been created. Subjective evaluation is provided in the form of listening tests, in which the method is compared against a human mix and a previous automatic fader implementation.

1. INTRODUCTION

Producing a balanced audio mixture from multi-track content requires the considered choice of fader levels. Previous proposals for automating this procedure use either a machine-learning method [1][2] or the extraction of perceptual attributes (specifically loudness) to emulate, in real time, the decisions made by a sound engineer [3]. This paper provides a description, analysis and evaluation of a new, flexible implementation of the latter approach. The basis of the method is to achieve optimal inter-channel intelligibility, under the assumption that this is achieved by adjusting all inputs towards a dynamic average perceptual loudness. The focus throughout this paper is on the development of a detailed, versatile and reliable real-time, low-latency algorithm.

The chosen method for loudness estimation is taken from the EBU R-128 recommendation [4], which calculates a loudness value in units of LUFS (equivalent to dBFS) by a mean-square energy calculation over a frame of audio samples, using two bi-quadratic IIR filters to provide a