Adaptive Mapping of Sound Collections for Data-driven Musical Interfaces

Gerard Roma, CeReNeM, University of Huddersfield, g.roma@hud.ac.uk
Owen Green, CeReNeM, University of Huddersfield, o.green@hud.ac.uk
P. A. Tremblay, CeReNeM, University of Huddersfield, p.a.tremblay@hud.ac.uk

ABSTRACT
Descriptor spaces have become a ubiquitous interaction paradigm for music based on collections of audio samples. However, most systems rely on a small predefined set of descriptors, which the user is often required to understand and choose from. There is no guarantee that the chosen descriptors are relevant for a given collection. In addition, this method does not scale to longer samples that require higher-dimensional descriptions, which biases systems towards the use of short samples. In this paper we propose a novel framework for the automatic creation of interactive sound spaces from sound collections using feature learning and dimensionality reduction. The framework is implemented as a software library in the SuperCollider language. We compare several algorithms and describe some example interfaces for interacting with the resulting spaces. Our experiments signal the potential of unsupervised algorithms for creating data-driven musical interfaces.

Author Keywords
Dimensionality reduction, feature learning, information visualization

CCS Concepts
•Computing methodologies → Dimensionality reduction and manifold learning; •Applied computing → Sound and music computing;

1. INTRODUCTION
Interacting with collections of digital audio samples is nowadays part of many music creation workflows. Samples can come from a diversity of sources: from loops crafted for dance music, to field recordings, personal improvisations, commercial music releases, or instrument samples. Often, a collection of samples will constitute the creative material for one or several compositions or performances, imprinting their specific sound. Such a collection can be defined as a corpus.
Within this paper we refer to a sound corpus as a collection of samples that may be united by as little as a practitioner putting them together with some musical intent. In practice, sound corpora are often made of samples that are similar in some way, or share a common origin.

Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Copyright remains with the author(s). NIME’19, June 3-6, 2019, Federal University of Rio Grande do Sul, Porto Alegre, Brazil.

While there has been significant research on signal processing and machine learning for dealing with audio, particularly in fields such as music information retrieval, the repertoire of tools currently available to music makers for dealing with sound corpora is still limited, requiring manual annotation, bookkeeping and editing.

A significant amount of research has followed the adaptation of corpus-based concatenative synthesis and musical mosaicing to open-ended music creation systems [16, 1]. The interface is typically based on an interactive visualization of the corpus. These systems suffer from several limitations, partly inherited from their roots in realistic synthesis. First, they often rely on specific descriptors (e.g. pitch, spectral centroid), typically requiring the user to choose two or three of them from a predefined set. This requires an understanding of concepts related to signal processing and psychoacoustics. Moreover, there is no assurance that a given sound corpus will exhibit interesting variation along a given set of descriptors. For example, a pitch descriptor may be irrelevant for a corpus obtained from environmental sounds. In general, a corpus may have its own sonic dimensions beyond any particular set of descriptors on offer. Second, such descriptors are typically obtained from a frame-level representation, which means they may vary significantly over time.
A single value (typically the average over a sequence of frames) may be adequate for a very short sound, but it will not be as useful for longer samples. Describing a longer sound (e.g. on the order of a few seconds) typically involves computing several statistics or other ways of describing a trajectory, which increases the number of parameters needed to represent each sound.

To address both of these issues, in this paper we propose to automate the analysis so that the generation of the interaction space is driven by the corpus rather than by pre-existing assumptions about pitch or timbre. The proposed framework allows learning both the base short-term features and the mapping of high-dimensional summaries of sounds to a low-dimensional space that can be used in interactive applications. The framework is implemented as a library in the SuperCollider language.

The rest of the paper is organized as follows. In the next section we briefly review existing work related to the use of feature spaces for interactive applications. We then review dimensionality reduction algorithms that have previously been applied to sound. In Section 3, we describe the proposed method for automatically generating feature spaces from sound corpora. In Section 4, we compare several dimensionality reduction techniques and two different base features in a visualization experiment and a subjective reflective practice experiment. In Section 5, we describe several interfaces for interacting with the generated spaces, and in Section 6 we draw some conclusions.
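To make the summarization problem concrete, the sketch below illustrates the general idea in Python rather than in the SuperCollider library described later: frame-level descriptor trajectories of varying length are collapsed into fixed-length vectors of per-dimension statistics, and those high-dimensional summaries are then mapped to two dimensions for an interactive layout. This is a minimal sketch under stated assumptions, not the paper's implementation: numpy and scikit-learn are assumed available, the synthetic "corpus" stands in for real frame-level audio features, and PCA stands in for whichever reducer the framework selects.

```python
# Minimal sketch of the summarize-then-reduce pipeline described above.
# NOT the authors' SuperCollider library: numpy/scikit-learn and the
# synthetic data are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def summarize(frames):
    """Collapse a (n_frames, n_features) descriptor trajectory into one
    fixed-length vector of per-dimension statistics."""
    return np.concatenate([frames.mean(axis=0),
                           frames.std(axis=0),
                           frames.min(axis=0),
                           frames.max(axis=0)])

# Stand-in corpus: 50 "sounds", each a trajectory of 13-dim frame features
# with a different number of frames (i.e. a different duration).
corpus = [rng.normal(size=(rng.integers(20, 200), 13)) for _ in range(50)]

# High-dimensional summaries: 13 features x 4 statistics = 52 dimensions,
# regardless of each sound's length.
X = np.stack([summarize(f) for f in corpus])

# Map the summaries to a 2-D space suitable for an interactive layout.
coords = PCA(n_components=2).fit_transform(X)
print(coords.shape)  # (50, 2)
```

Note how the statistical summary removes the dependence on sample duration at the cost of a higher-dimensional description, which is exactly what motivates the dimensionality reduction step.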