Proceedings of the 2nd AES Workshop on Intelligent Music Production, London, UK, 13 September 2016

THE OPEN MULTITRACK TESTBED: FEATURES, CONTENT AND USE CASES

Brecht De Man and Joshua D. Reiss
Centre for Digital Music
Queen Mary University of London
{b.deman,joshua.reiss}@qmul.ac.uk

ABSTRACT

The Open Multitrack Testbed is an online repository of multitrack audio accessible to the public, with rich metadata annotation, a semantic database and search functionality. Two years after it first went live, the dataset is the largest and most diverse available, and still growing. An overview of the available content, some prominent features, and example uses in the field of intelligent music production are discussed.

1. INTRODUCTION

A large part of music production research is concerned with the analysis and manipulation of multitrack audio. As a consequence, there is a need for a large number of multitrack recordings for investigating recording and mixing practices, evaluating algorithms, and demonstrating new ideas. However, multitrack content is scarce, in part due to licensing issues. To address this, we have created the Open Multitrack Testbed [1], a collection of annotated multitracks with an associated website (multitrack.eecs.qmul.ac.uk).

In this context, a multitrack audio item, or song, is defined as a set of more than two streams (or tracks) of audio which are meant to be played alongside each other. In addition to these tracks, some songs also contain mixes (processed sums of the raw tracks) and stems (processed sums of a subset of these tracks, e.g. only the drum parts).

2. FEATURES

To quickly find suitable content, the web application includes browse and search functionality (Figures 1 and 2), to allow filtering and searching using the various metadata properties. The metadata associated with different songs, stems, mixes and tracks (Figure 3) is visualised within the application, and each item can be downloaded separately.
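
Filtering by metadata properties, as described above, can be pictured with a minimal sketch. This is not the Testbed's implementation: the `Track` record, its field names, the `filter_tracks` helper and the example values are all hypothetical and serve only to illustrate exact-match filtering on several properties at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """One raw track. Every field name here is a hypothetical stand-in."""
    song: str
    instrument: str
    sampling_rate: int  # in Hz
    channels: int


def filter_tracks(tracks, **criteria):
    """Return the tracks whose attributes equal every given criterion."""
    return [
        t for t in tracks
        if all(getattr(t, key) == value for key, value in criteria.items())
    ]


# Invented example records, not taken from the Testbed.
TRACKS = [
    Track("Song A", "kick drum", 44100, 1),
    Track("Song A", "vocals", 44100, 1),
    Track("Song B", "vocals", 96000, 1),
    Track("Song B", "piano", 96000, 2),
]

if __name__ == "__main__":
    # Combine two criteria: instrument and sampling rate.
    for track in filter_tracks(TRACKS, instrument="vocals", sampling_rate=96000):
        print(track.song, track.instrument)
```

Each keyword argument narrows the result further, and calling `filter_tracks` without criteria returns every record, which roughly corresponds to browsing without filters.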

The database offers a SPARQL endpoint to query and insert data through HTTP requests. The infrastructure further supports user accounts and different levels of access, for instance when licenses are less liberal, and a convenient metadata input interface.

Figure 1: Browse interface screenshot

3. CONTENT

Launched in 2014, the Testbed's initial collection was taken from an internal dataset of multitrack audio content at the Centre for Digital Music, and it is still being continually expanded with locally and remotely hosted content. At the time of writing, it contains close to 600 songs, of which some have up to 300 individual constituent tracks from several takes, and others up to 400 mixes of the same source content.

A wide range of metadata is supported, and included to the extent that it is available for the different items. The metadata is represented using established knowledge representation methods such as the Music Ontology [2] and the Studio Ontology [3]. Song attributes include title, artist, license, composer, and recording location; track attributes include instrument, microphone, sampling rate, number of channels, and take number; and mix attributes include mixing engineer, audio render format, and digital audio workstation (DAW) name and version. These properties can be used to search, filter and browse the content to find the desired audio.

4. USE CASES

With a dataset of this size and diversity, and such a wide range of metadata available, the Testbed can be and has been used for various research topics including audio analysis [4], training and testing machine learning models [5] and analysis of music production practices [6].

A number of other multitrack audio resources exist, but they contain a smaller number of items, are less diverse, have ambiguous or restricted licensing, and/or provide little or no metadata. Furthermore, the Testbed uniquely has a number of songs with several mixes including DAW files containing all parameter settings [7].
Where licensing allows it, the resources are mirrored within the Testbed. For unclear or less liberal licenses, the metadata is still added to the database, but links point to third party websites.

Researchers, journals, conferences and funding bodies increasingly prefer data to be open, as it allows reproduction and extension of results. The Testbed facilitates widespread usage of a single, but large and diverse dataset, allowing for

Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).