SEQUENTIAL INFERENCE OF RHYTHMIC STRUCTURE IN MUSICAL AUDIO
Nick Whiteley, A. Taylan Cemgil, Simon Godsill
University of Cambridge
Department of Engineering
Signal Processing and Communications Laboratory
ABSTRACT
This paper presents a framework for the modelling of tem-
poral characteristics of musical signals and an approximate,
sequential Monte Carlo inference scheme which yields esti-
mates of tempo and rhythmic pattern from onset-time data.
These two features are quantified through the construction
of a probabilistic dynamical model of a hidden ‘bar-pointer’
and a Poisson observation model. The capabilities of the sys-
tem are demonstrated by tracking the tempo of a 2 against 3
polyrhythm and detecting a switch in rhythm in a MIDI per-
formance.
Index Terms— Music, Statistics, Poisson distributions,
Monte Carlo methods
1. INTRODUCTION
An important feature of intelligent music systems is the abil-
ity to infer attributes related to temporal structure. These at-
tributes may include musicological constructs such as tempo
and rhythmic pattern. The recognition of these characteristics
forms a sub-task of automatic music transcription - the un-
supervised generation of a score, or description of an audio
signal in terms of musical concepts. For music categorization
systems, tempo and rhythmic pattern are defining features of
genre and therefore useful features for indexing of data sets.
Much work has been done on detecting the ‘pulse’ or
foot-tapping rate of musical audio signals [1], [2]. However, these
approaches do not distinguish between tempo and rhythm.
Goto and Muraoka detail a system which recognizes beats in
terms of the ‘reliability’ of hypotheses for different rhythmic
patterns [3]. Cemgil and Kappen model MIDI onset events
in terms of a tempo process and switches between quantized
score locations [4]. Raphael independently proposed a similar
system [5]. Hainsworth and Macleod infer beats in a similar
framework from raw audio signals [6], but rhythmic pattern
is still not explicitly modelled.
Takeda et al. perform tempo and rhythm recognition from
MIDI data by analogy with speech-recognition, but do not
accommodate polyrhythms [7]. Klapuri et al. define metrical
structure in terms of pulse sensations on different time scales,
but do not explicitly discriminate between different rhythmic
patterns [8].
In [9], a novel model of temporal structure in musical sig-
nals was introduced where exact inference was feasible. How-
ever, for certain extensions of the model, the exact inference
scheme suffered from high computational requirements since
it involved storage and manipulation of very large vectors.
In this paper we focus on the development of a practi-
cally scalable, sequential Monte Carlo inference scheme for
a model of tempo and rhythmic pattern analogous to that in
[9]. Development of such an inference scheme is challeng-
ing in this case due to the multi-modality of posterior prob-
ability distributions. In practical terms, this issue arises for
the same reasons that human listeners can often ‘explain’ the
same piece of music in terms of several different combina-
tions of tempo and rhythmic pattern. Whilst the examples in
this paper take as input MIDI onset data, the same framework
could be used with onset times obtained from existing onset
detection systems, e.g. [10].
In the Bayesian paradigm the task of joint estimation of
tempo and rhythmic pattern is treated as an inference problem,
where given a sequence of observations
$y_{1:n} \equiv (y_1, y_2, \ldots, y_n)$ the aim is to compute posterior
densities over the hidden state variables
$x_{0:n} \equiv (x_0, x_1, \ldots, x_n)$.
In a sequential setting we first postulate a Markovian prior
density over the hidden state variables, $p(x_{k+1} \mid x_k)$, which
describes how the state variables evolve from one time index
to the next. The observations are then related to the hidden
state via $p(y_k \mid x_k)$. Up to a constant of proportionality, the
joint posterior density is given by:
$$p(x_{0:n} \mid y_{1:n}) \propto p(x_0) \prod_{k=1}^{n} p(y_k \mid x_k)\, p(x_k \mid x_{k-1}) \qquad (1)$$
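As an illustrative sketch only, and not part of the system described in this paper, the right-hand side of (1) can be evaluated in log form for a candidate state trajectory. The function name `log_joint_posterior`, the callables passed to it, and the two-state toy model with its numerical values are all hypothetical and chosen purely for illustration; the toy model is a generic discrete hidden Markov model, not the bar-pointer model.

```python
import math

def log_joint_posterior(x, y, log_prior, log_trans, log_lik):
    """Unnormalised log of the right-hand side of (1).

    x: hidden states x_0, ..., x_n (length n + 1)
    y: observations y_1, ..., y_n (length n)
    log_prior(x0), log_trans(cur, prev), log_lik(obs, state) are
    user-supplied log-densities (hypothetical helpers).
    """
    if len(x) != len(y) + 1:
        raise ValueError("expected len(x) == len(y) + 1")
    total = log_prior(x[0])
    for k in range(1, len(x)):
        # y[k - 1] holds y_k because Python lists are 0-indexed.
        total += log_lik(y[k - 1], x[k]) + log_trans(x[k], x[k - 1])
    return total

# Toy two-state model; every number below is an assumption.
PRIOR = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]  # TRANS[prev][cur]
LIK = [[0.7, 0.3], [0.4, 0.6]]    # LIK[state][obs]

def toy_log_prior(x0):
    return math.log(PRIOR[x0])

def toy_log_trans(cur, prev):
    return math.log(TRANS[prev][cur])

def toy_log_lik(obs, state):
    return math.log(LIK[state][obs])

value = log_joint_posterior(
    [0, 0, 1], [0, 1], toy_log_prior, toy_log_trans, toy_log_lik
)
```

Because the product in (1) gains one likelihood term and one transition term per time index, the sum accumulated in the loop can be updated incrementally as each new observation arrives, which is the property that sequential inference schemes rely on. Working with logarithms avoids numerical underflow for long sequences.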
2. BAR-POINTER MODEL
The system is built around a dynamical model of a ‘bar-pointer’,
a hypothetical, hidden object which maps an observed time-
series to one period of a latent rhythmical pattern, i.e. one bar.
At time $t_k = k\Delta$, $k \in \{1, 2, \ldots, n\}$ and $\Delta$ a constant, denote
by $\phi_k \in [0, 1)$ the position of the bar-pointer and denote by
$\dot{\phi}_k \in [\dot{\phi}_{\min}, \dot{\phi}_{\max}]$ its velocity, where
$\dot{\phi}_{\min} > 0$. The probabilistic kinematics of the bar-pointer
are modelled as being a piece-wise constant velocity process:
IV 1321 1424407281/07/$20.00 ©2007 IEEE ICASSP 2007