SYNTHBOT: AN UNSUPERVISED SOFTWARE SYNTHESIZER PROGRAMMER

Matthew Yee-King
Informatics, University of Sussex

Martin Roth
mhroth@gmail.com

ABSTRACT

This work presents a software synthesizer programmer, SynthBot, which is able to automatically find the settings necessary to produce a sound similar to a given target. As modern synthesizers become more capable and the underlying synthesis architectures more obscure, the task of programming them to produce a desired sound becomes more time-consuming and complex. SynthBot is presented as an automated solution to this problem. A stochastic search algorithm, in this case a genetic algorithm, is used to find the parameters which produce the sound most similar to the target. Similarity is measured by the sum squared error between the Mel Frequency Cepstrum Coefficients (MFCCs) of the target and candidate sounds. The system is evaluated technically to establish its ability to search the space of possible parameter settings effectively. A pilot study is then described in which musicians compete with SynthBot to see who is the most competent synthesizer programmer, with each competitor rating the other using their own metrics of sound similarity. The outcome of these tests suggests that the system is an effective “composer’s assistant”.

1. INTRODUCTION

As the performance of general-purpose computer hardware continues to improve, there is an associated increase in the complexity of software synthesizers. For example, the current version of Native Instruments’ FM8 synthesizer has 1093 parameters [8]. Even with an optimised user interface, creating sounds that are more than mild adjustments of the presets can be a challenge. In this paper we present SynthBot, a system which is capable of automatically programming any VSTi-compatible software synthesizer in order to produce a sound as close as possible to a target sound supplied by the user.
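The search described above can be illustrated with a small sketch. This is a minimal Java example under stated assumptions, not SynthBot's implementation: rendering audio through a VST plugin and extracting MFCCs is replaced by a hypothetical toy `synth` function that maps normalised parameters in [0, 1] directly to a small feature matrix (frames × coefficients), and the GA operators (binary tournament selection, uniform crossover, clamped Gaussian mutation and single-individual elitism) are illustrative choices rather than the paper's. Fitness is the sum squared error between the target and candidate feature matrices, so lower error means fitter. The class `GaToneMatch` and all of its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Function;

/** Hypothetical GA tone-matching sketch; the "synth" stands in for plugin rendering + MFCC extraction. */
public class GaToneMatch {

    /** Sum squared error between two feature matrices (frames x coefficients) of identical shape. */
    static double sumSquaredError(double[][] target, double[][] candidate) {
        double sum = 0.0;
        for (int f = 0; f < target.length; f++) {
            for (int c = 0; c < target[f].length; c++) {
                double d = target[f][c] - candidate[f][c];
                sum += d * d;
            }
        }
        return sum;
    }

    /** Binary tournament selection: the lower-error of two random individuals wins. */
    static double[] tournament(double[][] pop, double[] err, Random rng) {
        int i = rng.nextInt(pop.length), j = rng.nextInt(pop.length);
        return err[i] <= err[j] ? pop[i] : pop[j];
    }

    /** Runs a simple elitist GA over parameters in [0, 1] and returns the best vector evaluated. */
    static double[] search(Function<double[], double[][]> synth, double[][] target,
                           int numParams, int popSize, int generations, long seed) {
        Random rng = new Random(seed);
        double[][] pop = new double[popSize][numParams];
        for (double[] genome : pop) {
            for (int i = 0; i < numParams; i++) genome[i] = rng.nextDouble();
        }
        double[] best = pop[0].clone();
        double bestErr = sumSquaredError(target, synth.apply(best));
        for (int g = 0; g < generations; g++) {
            // Evaluate every candidate and track the best-so-far individual.
            double[] err = new double[popSize];
            for (int p = 0; p < popSize; p++) {
                err[p] = sumSquaredError(target, synth.apply(pop[p]));
                if (err[p] < bestErr) { bestErr = err[p]; best = pop[p].clone(); }
            }
            double[][] next = new double[popSize][];
            next[0] = best.clone(); // elitism: the best-so-far always survives
            for (int p = 1; p < popSize; p++) {
                double[] a = tournament(pop, err, rng);
                double[] b = tournament(pop, err, rng);
                double[] child = new double[numParams];
                for (int i = 0; i < numParams; i++) {
                    // Uniform crossover, then occasional Gaussian mutation clamped to [0, 1].
                    child[i] = rng.nextBoolean() ? a[i] : b[i];
                    if (rng.nextDouble() < 0.1) {
                        child[i] = Math.min(1.0, Math.max(0.0, child[i] + 0.1 * rng.nextGaussian()));
                    }
                }
                next[p] = child;
            }
            pop = next;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy synth: 2 parameters -> 3 frames x 2 "coefficients".
        Function<double[], double[][]> synth = p -> new double[][] {
            {p[0], p[1]}, {p[0] * p[1], p[0] - p[1]}, {p[0] * p[0], p[1] * p[1]}
        };
        double[][] target = synth.apply(new double[] {0.3, 0.7});
        double[] found = search(synth, target, 2, 40, 100, 42L);
        System.out.println(Arrays.toString(found)
                + " error=" + sumSquaredError(target, synth.apply(found)));
    }
}
```

Because the best individual is carried into every new generation, the best error found can only stay the same or decrease. In a real system, `synth` would be replaced by rendering audio through the plugin and extracting MFCC frames of matching length for the target and the candidate.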
SynthBot uses a genetic algorithm to search the space of possible parameter settings for any given VST synthesizer plugin [12], guided by a fitness function which compares Mel Frequency Cepstrum Coefficient (MFCC) [5] features of the target sound with those of the candidate sounds generated by the plugin using these parameter settings. As the candidate parameters evolve, the corresponding synthesized sounds move closer to the target sound and the feature vector error is reduced. When the system finds parameter settings which produce a sound that satisfies the user, this sound can be saved as a preset which is available to any other VSTi host software. The user is thus able to produce completely new programmes for any VSTi synthesizer, tailored to their own specification.

SynthBot is implemented as a cross-platform Java application which uses the Java Native Interface (JNI) to host the VST plugins and to perform optimised feature extraction. The system has so far been tested on the Mac OS X and GNU/Linux platforms. Java was chosen to allow rapid cross-platform development, especially of GUI and threading functionality.

At the time of writing, no comparable general-purpose, interoperable and unsupervised synthesizer programmer system is available. However, the key techniques (timbre similarity measurement using MFCC features and non-linear parameter optimisation using a genetic algorithm) are well established.

In the remainder of this paper, related research is discussed, the technical implementation is described, a technical evaluation is presented along with the initial results of a pilot user evaluation, and finally the paper concludes with a discussion of future plans.

1.1. Related work

There has been steady interest in the application of unsupervised genetic algorithms to the problem of automatic synthesizer programming.
An early example is [7], where tone matching is achieved using FM synthesis and a genetic algorithm. FM synthesis combined with GA programmers appears repeatedly in the literature (e.g. [16, 1, 14]), but other synthesis algorithms have been tried, e.g. noise band synthesis [4] and subtractive synthesis [15].

A key question in all these systems is how to judge sound similarity. Most opt for an error measure obtained by comparing the power spectra of the candidate and target sounds, an approach which does indeed reward similar sounds. One problem with the power spectrum is its brittleness: if the same instrument plays two differently pitched notes, there will be a large error between the power spectra, even though a human listener could clearly identify both notes as having been played on the same instrument. A similarity measure therefore needs to take human perception into account. This is addressed in some of the research, where perceptually informed measures such as the spectral centroid are used (e.g. [16]).
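The brittleness argument can be made concrete with a toy example. The sketch below is illustrative only and assumes idealised single-bin spectra from a 1024-point FFT at 44.1 kHz (real tones spread energy across neighbouring bins and contain harmonics); the class `SpectrumVsCentroid` and its methods are hypothetical. It compares a bin-by-bin power spectrum error with the spectral centroid, computed here as the magnitude-weighted mean of the bin frequencies.

```java
/** Hypothetical toy comparison of bin-wise spectrum error versus spectral centroid. */
public class SpectrumVsCentroid {

    /** Sum squared error between two power spectra, compared bin by bin. */
    static double powerSpectrumError(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return sum;
    }

    /** Spectral centroid in Hz; assumes mag holds bins 0..N/2 of an N-point FFT. */
    static double centroidHz(double[] mag, double sampleRate) {
        int fftSize = 2 * (mag.length - 1);
        double weighted = 0.0, total = 0.0;
        for (int k = 0; k < mag.length; k++) {
            weighted += (k * sampleRate / fftSize) * mag[k];
            total += mag[k];
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }

    /** Idealised spectrum with a single unit peak (unit magnitude, hence unit power). */
    static double[] peakAt(int bin, int numBins) {
        double[] s = new double[numBins];
        s[bin] = 1.0;
        return s;
    }

    public static void main(String[] args) {
        int numBins = 513;              // bins 0..512 of a 1024-point FFT
        double sampleRate = 44100.0;
        double[] a = peakAt(10, numBins);
        double[] b = peakAt(11, numBins);   // the "same" sound, one bin higher
        double[] far = peakAt(400, numBins); // a very different sound
        System.out.println("spectrum SSE, one-bin shift = " + powerSpectrumError(a, b));
        System.out.println("spectrum SSE, far peak      = " + powerSpectrumError(a, far));
        System.out.println("centroid difference (Hz)    = "
                + Math.abs(centroidHz(a, sampleRate) - centroidHz(b, sampleRate)));
    }
}
```

The bin-wise error is 2.0 both for the one-bin shift and for the distant peak, so it cannot tell a slight pitch change from a completely different spectrum, whereas the centroids of the two nearby peaks differ by only one bin width (about 43 Hz). The centroid itself still moves with pitch; it is shown here only as an example of a smoother, perceptually motivated feature, not as a pitch-invariant one.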