QUANTIFYING MASKING IN MULTI-TRACK RECORDINGS

Sebastian Vega
Universitat Pompeu Fabra
svegalopez@gmail.com

Jordi Janer
Universitat Pompeu Fabra
jordi.janer@upf.edu

ABSTRACT

Equalization is one of the most important tasks in music post-production. It can be applied in several ways, but one of its main purposes is to minimize masking, so that the listener can appreciate the timbral qualities of every instrument in a musical mix. However, masking between the different instruments of a multi-track mix has received little attention, and a quantitative measure based on perceptual studies has not yet been proposed. This paper presents such a measure, along with a study of masking between several common instruments. The proposed measure (cross-adaptive signal-to-masker ratio) is intended as an analysis tool for audio engineers combating masking with their preferred equalization techniques.

1. INTRODUCTION

Computers are being used to perform complex, inherently human tasks with a high degree of accuracy. Examples include speech recognition [1] and automatic musical genre classification [2]. Recently, the possibility of a computer down-mixing a multi-track recording as an audio engineer would has begun to be explored [3,4,5]. Although perceptual models have not yet been exploited for this purpose, it is the authors' opinion that computational models of perception might prove useful, if not indispensable, to achieve good results. The complexities and non-linearities involved in down-mixing are numerous, mainly because multi-track down-mixing is a task in which technology and creativity co-exist in equal proportions, so an exact set of rules for mixing does not really exist.
For automatic down-mixing to become feasible, many individual issues need to be tackled, for example automatic panning, automatic dynamics control and automatic equalization. This work is a step towards the last of these.

Copyright: © 2010 Sebastian Vega et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1.1 Masking within a musical context

When mixing a song, most of an audio engineer's decisions are influenced by context: the genre of the music, the intended audience and, for all we know, even the weather at the time of mixing can influence the engineer. Nevertheless, there is one aspect of a mix that is crucial and on which most audio engineers share the same view: masking.

Masking has been defined as the process by which the threshold of audibility for one sound is raised by the presence of another (masking) sound, and also as the amount by which that threshold is raised, customarily expressed in decibels [6]. In the context of a musical mix, however, the term takes on a rather different definition: when one signal competes with another, reducing the hearing system's ability to fully hear the desired signal, masking has occurred [7]. The latter definition emphasizes that in a multi-track mix several instruments are fighting to be heard, so the ability to fully hear each individual instrument is reduced. Accordingly, when there is a lot of masking between the tracks of a multi-track recording, the resulting mix is cloudy and confusing. An unmasked mix, on the other hand, is one in which all the instruments are clearly defined, allowing the listener to fully appreciate their timbral characteristics.
As such, it is the authors' opinion that masking minimization should be one of the pillars of an automatic mixing system. Luckily, masking minimization is not a mystery; audio engineers are known to employ three weapons when combating masking: setting the relative levels of tracks appropriately (thus giving priority to some of them), panning tracks with similar content to opposite sides of the stereo panorama, and equalizing so that each track is allocated a portion of the spectrum.

1.2 Motivation behind this work

There is clearly a lot of ground to cover in studying the factors that influence the engineer's decisions when applying the three techniques mentioned above. For example, for an audio engineer using these techniques to combat masking, an analysis tool able to quantify masking would be very useful; such a tool could also help automate the process of masking minimization. To the authors' knowledge, a measure of masking based on perceptual studies has not yet been proposed within the context of a mix. The intention of this work is therefore to provide a meaningful quantitative measure of masking between the different tracks of a multi-track recording. The proposed measure is based on the widely accepted power spectrum model of masking [8]. The next