Towards interpretable classifiers with blind signal separation

Héctor Ruiz
Department of Mathematics and Statistics, Liverpool John Moores University, Liverpool, United Kingdom
H.Ruiz@2010.ljmu.ac.uk

Ian H. Jarman
Department of Mathematics and Statistics, Liverpool John Moores University, Liverpool, United Kingdom
I.H.Jarman@ljmu.ac.uk

José D. Martín
Departamento de Ingeniería Electrónica, Universidad de Valencia, Burjassot, Spain
jose.d.martin@uv.es

Sandra Ortega-Martorell
Departament de Bioquímica i Biología Molecular, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
Sandra.Ortega@uab.cat

Alfredo Vellido
Department of Computer Languages and Systems, Universitat Politècnica de Catalunya, Barcelona, Spain
avellido@lsi.upc.edu

Enrique Romero
Department of Computer Languages and Systems, Universitat Politècnica de Catalunya, Barcelona, Spain
eromero@lsi.upc.edu

Paulo J.G. Lisboa
Department of Mathematics and Statistics, Liverpool John Moores University, Liverpool, United Kingdom
P.J.Lisboa@ljmu.ac.uk

Abstract—Blind signal separation (BSS) is a powerful tool for decomposing complex signals into component sources that are often interpretable. However, BSS methods are generally unsupervised, so the assignment of class membership from the elements of the mixing matrix may be sub-optimal. This paper proposes a three-stage approach that uses the Fisher information metric to define a natural metric for the data, from which a Euclidean approximation can then be used to drive BSS. Results with synthetic data models of real-world high-dimensional data show that the classification accuracy of the method is good for challenging problems, while retaining interpretability.

Keywords: blind signal separation; non-negative matrix factorisation; Fisher information; Riemannian metric; data mapping; magnetic resonance spectroscopy; brain tumour

I. INTRODUCTION

Blind signal separation (BSS) is a well-known family of tools for separating complex signals into linear combinations of sources whose joint distribution approximately factorises into a product of independent univariate density functions, one for each source. This approach is rendered even more interpretable when it is applied in the convex space of positive semi-definite mixing and unmixing matrices [1]. Both the sources themselves and the partial membership of each source class can then be evaluated against prior knowledge.

In our example, synthetic data models are built from single voxel magnetic resonance spectroscopy (MRS) signals corresponding to a neuro-oncology problem. Ideally, the sources will approximate prototypes for each brain tissue class, and the maximal value in each row of the mixing matrix will correspond to the correct binary classification of that observation. In this data set the correct prototype is taken to be the mean of the generating distribution.

In previous work [2], the authors investigated the application of non-negative matrix factorisation (NMF) methods [3,4] for the extraction of tissue type-specific MRS

U.S. Government work not protected by U.S. copyright. WCCI 2012 IEEE World Congress on Computational Intelligence, June 10-15, 2012, Brisbane, Australia. IJCNN