Ha-Hi-Hun plays Chopin’s Etude

Keiji Hirata∗, Rumi Hiraga†
∗ NTT Communication Science Research Laboratory, † Bunkyo University
email: hirata@brl.ntt.co.jp

Abstract

A new framework called two-stage performance rendering was proposed to realize incremental, interactive, and local rendering through direct instructions issued by a user. The first stage translates a user’s instruction into deviations of the onset time, duration, and amplitude of structurally important notes. The second stage spreads these deviations over the surrounding notes. Ha-Hi-Hun is a prototype performance rendering system built on this framework.

1 Introduction

To achieve a high level of controllability, we assert that a performance rendering (PR) system should be able to (a) properly interpret the user’s instructions and (b) synthesize a natural performance that reflects these instructions.

For (a), instructions given in natural language are usually subjective, equivocal, and even time-varying. The system should therefore be customizable or personalizable, and context-sensitive. One way to address this problem is to let the user give sample performances instead of natural-language instructions.

For (b), suppose that a user gives an instruction to play note Q louder in a particular part of a piece. If the system naively increases only the amplitude of Q, the generated performance may become unnatural. Considering the role of Q in the piece, the surrounding notes should also be played either louder or softer, and even their agogics may have to be adjusted. Thus, to keep a generated performance natural, a PR system must maintain a certain musical consistency, which is represented in the form of constraints on the agogics and dynamics of Q and the surrounding notes.

To meet these two requirements, this paper proposes a new framework called two-stage performance rendering [1].
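As a rough illustration of the note-Q example above, the following Python sketch separates the two steps: a first step that turns "play this note louder" into an amplitude deviation on one note, and a second step that spreads that deviation over the neighbouring notes. It is not the Ha-Hi-Hun implementation. The names Note, first_stage, and second_stage, the decay parameter, and the geometric fall-off by note index are all assumptions made for illustration; the framework itself relies on the time-span reduction of GTTM and on musical constraints rather than on a fixed decay.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    """Hypothetical note record; fields mirror the deviations named in the abstract."""
    onset: float      # seconds from the start of the excerpt
    duration: float   # seconds
    amplitude: float  # assumed MIDI-like scale, 0..127


def first_stage(notes: List[Note], index: int, amplitude_delta: float) -> List[float]:
    """Translate an instruction on one note into a per-note deviation list.

    Only the instructed note receives a non-zero deviation at this point.
    """
    deviations = [0.0] * len(notes)
    deviations[index] = amplitude_delta
    return deviations


def second_stage(notes: List[Note], deviations: List[float], decay: float = 0.5) -> List[Note]:
    """Spread each deviation over surrounding notes (assumed geometric fall-off)."""
    spread = [0.0] * len(notes)
    for i, delta in enumerate(deviations):
        if delta == 0.0:
            continue
        for j in range(len(notes)):
            # Weight shrinks by `decay` for every note of distance from note i.
            spread[j] += delta * decay ** abs(i - j)
    # Clamp to the assumed 0..127 amplitude range.
    return [
        replace(n, amplitude=min(127.0, max(0.0, n.amplitude + s)))
        for n, s in zip(notes, spread)
    ]


if __name__ == "__main__":
    score = [Note(onset=0.5 * i, duration=0.5, amplitude=64.0) for i in range(5)]
    rendered = second_stage(score, first_stage(score, index=2, amplitude_delta=16.0))
    print([n.amplitude for n in rendered])
```

In this sketch only the amplitude is adjusted; onset time and duration could be handled with the same pattern, using one deviation list per attribute.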
2 Two-Stage Performance Rendering

Let us assume that a tutor’s instructions for expression are issued to salient notes. That is, when a tutor says “play this note carefully”, “this note” means a salient note within a certain time range. The two-stage PR framework is motivated by this assumption.

2.1 Architecture

In Fig. 1, the first stage translates a user’s instruction into the agogics and dynamics of structurally important notes in a range, and the second stage adjusts the surrounding notes. Here, a structurally important note means a salient note in the context of the time-span reduction of GTTM.

[Figure 1: Two-Stage Performance Rendering. Diagram elements: Instructions 〈operation, range〉; Score; Real Sample Performances or Musicology; Performance Knowledge Prepared Beforehand; First Stage; Time-Span Reduction of GTTM; Deviations of Salient Notes; Second Stage; Musical Constraint Satisfaction; Expressive Performance.]

The inputs for two-stage PR are a score to be performed, instructions given by the user, and sample performances for extracting performance knowledge, which may be substituted with built-in rules or mathematical