VOLUME XX, 2017

Method for classifying a noisy Raman spectrum based on a wavelet transform and a deep neural network

Liangrui Pan 1 (Student Member, IEEE), Pronthep Pipitsunthonsan 1 (Student Member, IEEE), Chalongrat Daengngam 2, Sittiporn Channumsin 3, Suwat Sreesawet 3, Mitchai Chongcheawchamnan 1 (Senior Member, IEEE)

1 Faculty of Engineering, Prince of Songkla University, Songkhla, 90110 Thailand
2 Faculty of Science, Prince of Songkla University, Songkhla, 90110 Thailand
3 Geo-Informatics and Space Technology Development Agency (GISTDA), Chonburi 20230, Thailand

Corresponding author: Mitchai Chongcheawchamnan (mitchai.c@psu.ac.th).

This work is funded by Science, Research and Innovation Promotion Fund (Grant No: 1383848).

ABSTRACT This paper proposes a new framework based on a wavelet transform and a deep neural network for identifying noisy Raman spectra, since in practice it is relatively difficult to classify a spectrum corrupted by baseline noise and additive white Gaussian noise. The framework consists of two main engines. A wavelet transform serves as the framework front-end, transforming the 1-D noisy Raman spectrum into two-dimensional data. This two-dimensional data is fed to the framework back-end, which is a classifier. The optimum classifier was chosen by implementing several traditional machine learning (ML) and deep learning (DL) algorithms and investigating their classification accuracy and robustness. The four ML algorithms chosen were Naive Bayes (NB), a Support Vector Machine (SVM), a Random Forest (RF), and K-Nearest Neighbor (KNN), while a deep convolution neural network (DCNN) was chosen as the DL classifier. Noise-free, Gaussian-noise, baseline-noise, and mixed-noise Raman spectra were used to train and validate the ML and DCNN models. The optimum back-end classifier was obtained by testing the ML and DCNN models with several noisy Raman spectra (10 – 30 dB noise power).
Based on the simulation, the accuracy of the DCNN classifier is 9% higher than that of the NB classifier, 3.5% higher than that of the RF classifier, 1% higher than that of the KNN classifier, and 0.5% higher than that of the SVM classifier. In terms of robustness to the mixed-noise scenarios, the framework with the DCNN back-end outperformed the other ML back-ends. The DCNN back-end achieved 90% accuracy at 3 dB SNR, while the NB, SVM, RF, and KNN back-ends required 27 dB, 22 dB, 27 dB, and 23 dB SNR, respectively. In addition, on the low-noise test data set, the F-measure score of the DCNN back-end exceeded 99.1%, while the F-measure scores of the other ML engines were below 98.7%.

INDEX TERMS Raman spectrum, baseline noise, wavelet transform, deep convolution neural network, accuracy, robustness.

I. INTRODUCTION

Raman spectroscopy is a material characterisation method widely used in industrial process control, planetary exploration, homeland security, life science, geological field investigation, and laboratory material research [1]. By identifying the Raman spectrum of a small number of substances, an accurate label of each substance can be obtained [2]. For example, when detecting minerals in the field, we may only be able to sample all the minerals and then analyse them experimentally. Pre-processing is necessary when obtaining Raman spectra, for example when using Raman spectroscopy to check the composition of chemical substances and implementing statistical classification methods. A rapid and accurate classification algorithm is preferable when dealing with a large set of Raman spectra.

Nowadays, there are many chemical/biochemical molecular structure databases for researchers to access, such as the FT-Raman spectra database [3], an e-VISART database [4], a biomolecule database [5], and an explosive compound database [6]. These databases contain a large amount of raw and processed Raman data for Raman spectroscopy applications.