TEACH YOURSELF GEORGIAN FOLK SONGS DATASET: AN ANNOTATED
CORPUS OF TRADITIONAL VOCAL POLYPHONY
David Gillman
New College of Florida
dgillman@ncf.edu
Uday Goyat
Georgia Institute of Technology
ugoyat3@gatech.edu
Atalay Kutlay
New College of Florida
atalay.kutlay18@ncf.edu
ABSTRACT
New datasets of non-Western traditional music contribute
to the development of knowledge in MIR and allow com-
putational techniques to inform ethnomusicology. We
present an annotated dataset of traditional vocal polyphony
from two regions of the Republic of Georgia with disparate
musical characteristics. The audio for each song consists
of four polyphonic recordings of one performance from
different microphones. We present a process and workflow
that we use to annotate the dataset, which takes advantage
of the salience of individual voices in each recording. The
process results in an f0 estimate for each vocal part.
1. INTRODUCTION
To evaluate algorithms in Music Information Retrieval
(MIR) it is essential to have a variety of extensive datasets
of annotated music [1]. Annotated datasets of vocal a cappella
music have been scarce until recently, particularly
of non-Western music. The work of [2] provides a list
of datasets of vocal polyphony that includes two datasets
of songs from the Republic of Georgia: the Erkomaishvili
dataset of [3] and the collection recorded in the work of [4].
Multi-f0 estimation is a sub-problem of Automatic Music
Transcription (AMT) that consists of identifying the
fundamental frequency f0 of each part in a polyphonic
recording. Datasets of polyphony that are annotated with the
fundamental frequency of each part are useful as ground truth
for multi-f0 algorithms. Multi-f0 estimation is a challenging
problem for a cappella vocal music because of the variety of
sounds produced by the human voice and the similarity in
timbre of different voices [5].
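As a concrete illustration (not code from this work), per-part
f0 annotations of this kind can serve as ground truth when
scoring a multi-f0 estimator, for example with the mir_eval
library; the file names and CSV layout below are hypothetical.

# Sketch only: per-part f0 annotations as ground truth for multi-f0
# evaluation with mir_eval. File names and the assumed layout (a time
# column plus one f0 column per voice, 0 Hz marking unvoiced frames)
# are hypothetical, not the dataset's actual format.
import numpy as np
import mir_eval

ann = np.loadtxt("song_annotation.csv", delimiter=",", skiprows=1)
ref_time = ann[:, 0]
ref_freqs = [row[1:][row[1:] > 0] for row in ann]   # active voices per frame

est = np.loadtxt("song_estimate.csv", delimiter=",", skiprows=1)  # algorithm output
est_time = est[:, 0]
est_freqs = [row[1:][row[1:] > 0] for row in est]

scores = mir_eval.multipitch.evaluate(ref_time, ref_freqs, est_time, est_freqs)
print(scores["Precision"], scores["Recall"], scores["Accuracy"])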
In this work we present an annotated dataset of 38
three-part songs from the Republic of Georgia, including
29 from the region of Guria and nine from the region of
Samegrelo. The total duration of the collection is 89 minutes.
Gurian songs are a particular challenge for multi-f0
estimation. The three parts are independent melodic lines
that often cross and contain rapid movement. The top part
often consists of krimanchuli, Georgian yodeling, with as
many as six changes of octave per second. These features
make our dataset a useful contribution as part of a training
set for multi-f0 algorithms. We have created a web-based
visualization of the dataset that is potentially useful as an
aid to singers learning their parts on Georgian songs, which
was the purpose of the original recordings.
We also present a new process and workflow for multi-f0
estimation where several recordings exist of a single
performance. Our dataset is an unusual challenge for
annotation in that it does not include isolated tracks for
each vocal part. Instead there are four recordings of one
performance made from different microphones. One recording
presents a balanced mix of voices. In each of the other three
recordings one of the voices is more salient than the other
two, but all three voices are easily audible and create a
polyphonic mixture. Our process and workflow consist of
isolating the salient voice, applying several algorithms for
monophonic f0 estimation, and using a graphical interface to
select the correct estimate. In addition to f0 estimates we
present the median absolute deviation of the estimates, a
measure of confidence in the estimates.
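As a rough sketch of this idea (the specific estimators and
file name below are illustrative placeholders, not necessarily
those used in this work), one can run several monophonic f0
estimators on a recording in which one voice is salient and
take the per-frame median absolute deviation across the
estimates as a confidence measure: a small deviation means
the estimators agree.

# Minimal sketch, assuming librosa's pYIN and YIN as stand-ins for
# "several monophonic f0 estimators"; "salient_voice.wav" is a
# hypothetical file name.
import numpy as np
import librosa

y, sr = librosa.load("salient_voice.wav", sr=None)
hop = 256

# Three f0 estimates on a common frame grid (pYIN at two window sizes, YIN).
f0_a, _, _ = librosa.pyin(y, fmin=65.0, fmax=1000.0, sr=sr,
                          frame_length=2048, hop_length=hop)
f0_b, _, _ = librosa.pyin(y, fmin=65.0, fmax=1000.0, sr=sr,
                          frame_length=4096, hop_length=hop)
f0_c = librosa.yin(y, fmin=65.0, fmax=1000.0, sr=sr,
                   frame_length=2048, hop_length=hop)

estimates = np.vstack([f0_a, f0_b, f0_c])                  # (n_estimates, n_frames)
f0_median = np.nanmedian(estimates, axis=0)                # consensus f0 per frame
mad = np.nanmedian(np.abs(estimates - f0_median), axis=0)  # per-frame agreement

times = librosa.times_like(f0_median, sr=sr, hop_length=hop)
for t, f, m in zip(times[:5], f0_median[:5], mad[:5]):
    print(f"t={t:.3f}s  f0={f:.1f} Hz  MAD={m:.1f} Hz")

Frames where the estimates disagree (large MAD) are then the
natural candidates to inspect and correct in a graphical
interface.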
Other interactive methods have appeared for extracting
melody from audio. The work of [6] introduced the
Tony software for monophonic audio. It presents to the
user several pitch estimates generated by an early stage of
the pYin algorithm, from which the user can select ranges
of time and frequency. It is designed for ease of use and
has many features such as octave correction [7]. The system
of [8], designed for the Erkomaishvili dataset, presents to
the user a spectrogram with one melody highlighted as a result
of dynamic programming performed on a set of salient
frequencies at each time step, followed by automatic
corrections that take into account musical knowledge such as
voice ranges. The user is able to delete and replace
lines in the spectrogram. There is a web interface for the
Erkomaishvili dataset which plays the audio of each song
accompanied by a scrolling score with lyrics. The work
of [2] created the Dagstuhl dataset by recording each singer
with a larynx microphone, a headset microphone, and a dynamic
microphone. The researchers applied both the pYin and CREPE
algorithms to each recording and derived confidences for each
algorithm on each microphone using a subset of recordings
manually annotated by a sound engineer who used Tony.
Our method differs from these in that it is designed for
polyphonic recordings that contain one salient voice, it makes
use of several f0 estimates, and it presents