TRANSCRIBING BACH CHORALES USING NON-NEGATIVE MATRIX FACTORISATION

Somnuk Phon-Amnuaisuk
Music Informatics Research Group, Faculty of Creative Industries
University Tunku Abdul Rahman (UTAR)
Petaling Jaya Campus, Selangor Darul Ehsan, Malaysia
somnuk@utar.edu.my

ABSTRACT

This paper discusses our research on polyphonic music transcription using non-negative matrix factorisation (NMF). The application of NMF in polyphonic transcription has two known limitations: (i) the transcription output is a permutation of the input source signals (e.g., the polyphonic input notes c, e, g and b may produce polyphonic output notes in the order c, b, g and e); and (ii) the accuracy of the transcription depends on the accuracy of the factor r, where r is the actual number of active pitches. This work proposes a novel approach that exploits a tone model to tackle both the permutation of the transcription output and the unknown factor r. In our current implementation, the tone model is learned from training data consisting of the pitches of the desired instrument. This approach offers an effective exploitation of domain knowledge (i.e., the tone model of each pitch). The empirical results show that the proposed tone-model initialised NMF (ICTM-NMF) could significantly improve the transcription output accuracy.

Index Terms: Polyphonic music transcription, non-negative matrix factorisation, tone model, transcribing Bach chorales

1. BACKGROUND

Automatic music transcription concerns the translation of musical sounds into written manuscripts in standard music notation. The music transcription domain shares many similarities with its kin, speech recognition: phonemes could be viewed as pitches, and different speakers could be viewed as different instruments. Up to now, it is still not possible to accurately transcribe all music parts from a full orchestra or a full popular band.
The mixture of sounds from different instruments and singing voices poses hard constraints on the existing techniques. To date, the transcription of a single melody line (monophonic) is quite accurate, but transcribing polyphonic music is still an open research area.

Commonly employed features in audio analysis are derived from time-domain and frequency-domain components of the input sound wave. Information obtained from both time-domain and frequency-domain analysis still lacks some expressiveness in discriminating simultaneously occurring pitches in the polyphonic transcription task. This challenge has been approached from different perspectives, among which is the blackboard architecture, which incorporates various knowledge sources in the system [1]. These knowledge sources provide information regarding notes, intervals, chords, etc., which could be used in the transcription process. Explicitly encoded knowledge in this style is usually effective but requires a laborious knowledge engineering effort. Soft computing techniques such as the Bayesian approach [2], [3], [4], [5], [6]; hidden Markov models; artificial neural networks; and factoring techniques (e.g., ICA, NMF) [7], [8] have emerged as popular alternatives, since knowledge elicitation and maintenance could be performed from the training data.

The application of NMF in polyphonic transcription may exploit the harmonic relationships of Fourier transform (FT) coefficients. The matrix V is derived from the FT coefficients of the input signal. NMF factorises an input matrix V into two components W and H, where the two components could bear the useful semantics of a part-based representation of the original input matrix. In polyphonic music transcription, the source signal V might be seen as the aggregation of the basis vector matrix W and their activation patterns H, $V_{m \times n} = W_{m \times r} H_{r \times n}$.
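
As a rough illustration of the factorisation V = WH, the following minimal sketch factorises a small non-negative matrix using the standard Lee-Seung multiplicative updates. This is an assumption made for brevity: it is not the alternating projected gradient method cited in this paper, and it uses a random start rather than the proposed tone-model initialisation. The toy matrices `W_TRUE`, `H_TRUE` and `V_TOY`, the helper names and the iteration count are all hypothetical.

```python
import random

EPS = 1e-9  # guards against division by zero in the updates


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, r, n_iter=300, seed=0):
    """Factorise non-negative V (m x n) into W (m x r) and H (r x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random start (assumption; the paper proposes a
    # tone-model initialisation instead).
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(n)]
             for i in range(r)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(r)]
             for i in range(m)]
    return w, h


# Hypothetical toy data: two "tone" templates over 6 frequency bins
# (columns of W_TRUE) and their on/off activations over 6 time frames.
W_TRUE = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0],
          [0.0, 0.5], [0.25, 0.25], [0.0, 0.0]]
H_TRUE = [[1, 1, 1, 1, 0, 0],
          [0, 0, 1, 1, 1, 1]]
V_TOY = matmul(W_TRUE, H_TRUE)
```

Calling `nmf(V_TOY, 2)` returns a pair `(W, H)` with non-negative entries. The rows of the recovered H may come out in either order relative to `H_TRUE`, and the call requires `r` to be supplied up front; these are the two limitations noted in the abstract.
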
In our implementation, the input V is represented as a piano roll and the activation patterns H have the interpretation of the source signals, which are the transcribed notes.

2. TRANSCRIBING BACH CHORALES

NMF is appealing for the polyphonic music transcription problem due to its simplicity. Early reports by [7] discussed the potential of NMF in a polyphonic music transcription task. There are three major issues related to the application of NMF for the polyphonic transcription task [7], [9]: (i) the number of pitches (i.e., the factor r) that needs to be known beforehand; (ii) the factorisation process via the alternating projected gradient method that may get stuck in local optima [7]; and (iii) the output transcriptions that are the permutation of the input