Emotion Recognition Using Neural Networks

MEHMET S. UNLUTURK, KAYA OGUZ, COSKUN ATAY
Department of Software Engineering
Izmir University of Economics
Sakarya Cad No.156, Balcova, Izmir 35330
TURKEY
suleyman.unluturk@ieu.edu.tr kaya.oguz@ieu.edu.tr coskun.atay@ieu.edu.tr

Abstract: - Speech and emotion recognition improve the quality of human-computer interaction and allow easier-to-use interfaces for every level of user in software applications. In this study, we have developed the emotion recognition neural network (ERNN) to classify voice signals for emotion recognition. The ERNN has 128 input nodes, 20 hidden neurons, and three summing output nodes. A total of 97932 training sets are used to train the ERNN. A separate 24483 testing sets are used to test the ERNN performance. The samples tested for voice recognition are taken from the movies “Anger Management” and “Pick of Destiny”. The ERNN achieves an average recognition performance of 100%. This high level of recognition suggests that the ERNN is a promising method for emotion recognition in computer applications.

Key-Words: - Back propagation learning algorithm, Neural network, Emotion, Speech, Power Spectrum, Fast-Fourier Transform (FFT)

1 Introduction
Speech is one of the oldest tools humans use to interact with each other. It is therefore one of the most natural ways to interact with computers as well. Although speech recognition is now good enough to support speech-to-text engines, emotion recognition can increase the overall efficiency of interaction and may provide everyone with a more comfortable user interface. It is often trivial for humans to perceive the emotion of the speaker and adjust their behavior accordingly. Emotion recognition gives the programmer a chance to develop an artificial intelligence that can respond to the speaker's feelings, which can be used in many scenarios, from computer games to virtual sales programs. Three base emotions, angry, happy, and neutral, are taken into account.
Various speech sets that belong to these emotion groups are extracted from different movies and used for training and testing. The ERNN is capable of distinguishing these test samples. Neural networks are chosen for the solution because a basic formula cannot be devised for the problem. Neural networks are also quick to respond, which is a requirement because the emotion should be determined almost instantly. Training takes a long time, but this is of little concern because training is mostly done off-line.

The paper is organized as follows: part 2 covers the ERNN design, part 3 presents the results and discussion, and part 4 gives the conclusion and future work.

2 ERNN Design
Emotion recognition is not a new topic, and both research and applications exist using various methods, most of which require extracting certain features from the speech [1]. A common problem is determining emotion from noisy speech, where background noise complicates the extraction of the emotion [2]. To extract the emotion signatures inherent in voice signals, the back propagation learning algorithm [3,4,5,6] is used to design the emotion recognition neural network (ERNN). The block diagram of the ERNN is shown in Figure 1. Segmented data is applied to the input of the power spectrum processor, which uses the Fast Fourier Transform algorithm. Its output is normalized and presented to a three-layer, fully interconnected neural network for classification. The output layer of the neural network receives the weighted sum of the outputs of the hidden and bias nodes in the hidden layer. These weighted inputs are processed by a hyperbolic tangent function. A set of desired output values is then compared to the estimated outputs of the neural network for every set of input values of the power spectrum of the voice signals. The weights are updated by back propagating the gradient of the output error through the entire neural network.
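
The processing chain described above (power spectrum, normalization, a 128-20-3 network, and a back propagation update) can be illustrated with a short Python sketch. Only the layer sizes and the use of the FFT and the hyperbolic tangent come from the text; everything else is an assumption made for illustration: the 256-sample segment length, normalization by the peak power, the use of tanh at both the hidden and output layers, the squared-error loss, the weight initialization, the learning rate, and all function names.

```python
import numpy as np

# Layer sizes stated in the paper.
N_INPUT, N_HIDDEN, N_OUTPUT = 128, 20, 3


def power_spectrum_features(segment):
    """Normalized power spectrum of one voice segment.

    Assumption: a 256-sample segment, so the real FFT yields 129 bins,
    of which the first 128 are kept. Peak normalization is also assumed.
    """
    spectrum = np.fft.rfft(segment)
    power = np.abs(spectrum[:N_INPUT]) ** 2
    peak = power.max()
    return power / peak if peak > 0 else power


def init_weights(rng, scale=0.1):
    """Small random weights; the extra column holds the bias weight."""
    w_hidden = rng.normal(0.0, scale, (N_HIDDEN, N_INPUT + 1))
    w_output = rng.normal(0.0, scale, (N_OUTPUT, N_HIDDEN + 1))
    return w_hidden, w_output


def forward(x, w_hidden, w_output):
    """Forward pass; a constant 1.0 is appended as the bias node."""
    x_b = np.append(x, 1.0)
    h_b = np.append(np.tanh(w_hidden @ x_b), 1.0)
    output = np.tanh(w_output @ h_b)  # assumed: tanh on the output sums
    return x_b, h_b, output


def train_step(x, target, w_hidden, w_output, lr=0.01):
    """One gradient step on the squared error; updates weights in place."""
    x_b, h_b, output = forward(x, w_hidden, w_output)
    error = output - target
    # The derivative of tanh(z) is 1 - tanh(z)^2.
    delta_out = error * (1.0 - output ** 2)
    delta_hidden = (w_output[:, :-1].T @ delta_out) * (1.0 - h_b[:-1] ** 2)
    w_output -= lr * np.outer(delta_out, h_b)
    w_hidden -= lr * np.outer(delta_hidden, x_b)
    return 0.5 * float(np.sum(error ** 2))
```

In use, each segment would be converted with `power_spectrum_features` and passed to `train_step` together with a desired output vector, for example +1 for the labeled emotion and -1 for the other two (an assumed encoding, not one given in the paper). At test time, the index of the largest value returned by `forward` would be taken as the recognized emotion.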
Proceedings of the 10th WSEAS International Conference on NEURAL NETWORKS ISSN: 1790-5109 82 ISBN: 978-960-474-065-9