TEACH YOURSELF GEORGIAN FOLK SONGS DATASET: AN ANNOTATED CORPUS OF TRADITIONAL VOCAL POLYPHONY

David Gillman, New College of Florida, dgillman@ncf.edu
Uday Goyat, Georgia Institute of Technology, ugoyat3@gatech.edu
Atalay Kutlay, New College of Florida, atalay.kutlay18@ncf.edu

ABSTRACT

New datasets of non-Western traditional music contribute to the development of knowledge in MIR and allow computational techniques to inform ethnomusicology. We present an annotated dataset of traditional vocal polyphony from two regions of the Republic of Georgia with disparate musical characteristics. The audio for each song consists of four polyphonic recordings of one performance, made from different microphones. We present the process and workflow we use to annotate the dataset, which takes advantage of the salience of individual voices in each recording. The process results in an f0 estimate for each vocal part.

1. INTRODUCTION

To evaluate algorithms in Music Information Retrieval (MIR) it is essential to have a variety of extensive datasets of annotated music [1]. Annotated datasets of a cappella vocal music have been scarce until recently, particularly of non-Western music. The work of [2] provides a list of datasets of vocal polyphony that includes two datasets of songs from the Republic of Georgia: the Erkomaishvili dataset of [3] and the collection recorded in the work of [4].

Multi-f0 estimation is a sub-problem of Automatic Music Transcription (AMT): identifying the fundamental frequency f0 of each part in a polyphonic recording. Datasets of polyphony annotated with the fundamental frequency of each part are useful as ground truth for multi-f0 algorithms. Multi-f0 estimation is a challenging problem for a cappella vocal music because of the variety of sounds produced by the human voice and the similarity in timbre of different voices [5].

In this work we present an annotated dataset of 38 three-part songs from the Republic of Georgia, including 29 from the region of Guria and nine from the region of Samegrelo. The total duration of the collection is 89 minutes. Gurian songs are a particular challenge for multi-f0 estimation: the three parts are independent melodic lines that often cross and contain rapid movement. The top part often consists of krimanchuli, Georgian yodeling, with as many as six changes of octave per second. These features make our dataset a useful contribution to training sets for multi-f0 algorithms. We have also created a web-based visualization of the dataset that is potentially useful as an aid to singers learning their parts in Georgian songs, which was the purpose of the original recordings.

We also present a new process and workflow for multi-f0 estimation in settings where several recordings exist of a single performance. Our dataset is an unusual challenge for annotation in that it does not include isolated tracks for each vocal part. Instead there are four recordings of one performance made from different microphones. One recording presents a balanced mix of voices. In each of the other three recordings one of the voices is more salient than the other two, but all three voices are easily audible and create a polyphonic mixture. Our process and workflow consist of isolating the salient voice, applying several algorithms for monophonic f0 estimation, and using a graphical interface to select the correct estimate. In addition to the f0 estimates we present the median absolute deviation of the estimates, a measure of confidence in the estimates.
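To make this step concrete, the sketch below shows how several monophonic f0 estimates of one salient voice can be combined into a median trajectory with a per-frame median absolute deviation (MAD) as a confidence measure. The filename, hop size, and pitch range are placeholder assumptions, and the particular estimators (librosa's YIN and pYIN implementations plus CREPE) are stand-ins for whichever algorithms the workflow actually applies; the graphical selection step is not reproduced here.

```python
# Minimal sketch: combine several monophonic f0 estimates of one salient
# voice and measure their per-frame agreement. The filename, estimator set,
# and parameters are illustrative assumptions, not the paper's exact pipeline.
import numpy as np
import librosa
import crepe

# Hypothetical recording in which one voice (say, the bass) is most salient.
y, sr = librosa.load("bass_mic.wav", sr=None, mono=True)
hop = 512
fmin, fmax = librosa.note_to_hz("C2"), librosa.note_to_hz("C6")

# Two frame-based estimates on the same hop grid, so frames align exactly.
f0_yin = librosa.yin(y, fmin=fmin, fmax=fmax, sr=sr, hop_length=hop)
f0_pyin, _, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr, hop_length=hop)

# CREPE predicts on its own 10 ms grid; interpolate onto the librosa times.
times = librosa.times_like(f0_yin, sr=sr, hop_length=hop)
c_time, c_freq, c_conf, _ = crepe.predict(y, sr, viterbi=True, verbose=0)
f0_crepe = np.interp(times, c_time, c_freq)

# Per-frame median across estimators, and the median absolute deviation
#   MAD(t) = median_i |f0_i(t) - median_j f0_j(t)|,
# which is small where the estimators agree. pYIN marks unvoiced frames
# as NaN, so nan-aware medians are used.
estimates = np.vstack([f0_yin, f0_pyin, f0_crepe])
f0_med = np.nanmedian(estimates, axis=0)
mad = np.nanmedian(np.abs(estimates - f0_med), axis=0)
```

In the workflow described above, the final per-frame estimate is chosen in a graphical interface rather than by taking the median; the sketch only illustrates how agreement among estimators can be quantified as a confidence value to accompany the published f0 trajectories.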
Other interactive methods have appeared for extracting melody from audio. The work of [6] introduced the Tony software for monophonic audio. It presents the user with several pitch estimates generated by an early stage of the pYIN algorithm, from which the user can select ranges of time and frequency; it is designed for ease of use and has many features, such as octave correction [7]. The system of [8], designed for the Erkomaishvili dataset, presents the user with a spectrogram in which one melody is highlighted, the result of dynamic programming over a set of salient frequencies at each time step, followed by automatic corrections that draw on musical knowledge such as voice ranges. The user can delete and replace lines in the spectrogram. There is also a web interface for the Erkomaishvili dataset that plays the audio of each song accompanied by a scrolling score with lyrics. The work of [2] created the Dagstuhl dataset by recording each singer with a larynx microphone, a headset microphone, and a dynamic microphone. The researchers applied both the pYIN and CREPE algorithms to each recording and derived confidences for each algorithm on each microphone using a subset of recordings manually annotated with Tony by a sound engineer.

Our method differs from these methods in that it is designed for polyphonic recordings that contain one salient voice, it makes use of several f0 estimates, and it presents