Research Article
Detection of Emotion of Speech for RAVDESS Audio Using Hybrid
Convolution Neural Network
Tanvi Puri,1 Mukesh Soni,2 Gaurav Dhiman,3,4,5 Osamah Ibrahim Khalaf,6 Malik Alazzam,7 and Ihtiram Raza Khan8
1ICT Ganpat University, Ahmedabad, Gujarat, India
2Computer Science and Engineering, Jagran Lakecity University, Bhopal, India
3Department of Computer Science, Government Bikram College of Commerce, Patiala, India
4University Centre for Research and Development, Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, India
5Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
6Al-Nahrain University, Baghdad, Iraq
7Lone Star College-Victory Center, Houston, TX, USA
8Computer Science Department, Jamia Hamdard University, Delhi, India
Correspondence should be addressed to Mukesh Soni; mukesh.research24@gmail.com
Received 29 May 2021; Revised 19 January 2022; Accepted 31 January 2022; Published 27 February 2022
Academic Editor: Antonio Gloria
Copyright © 2022 Tanvi Puri et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Every human being has emotions about the items that matter to them, and for a customer those emotions can help a customer representative understand the customer's requirements. Speech emotion recognition therefore plays an important role in human interaction, and intelligent systems can improve its performance. We design a convolution neural network (CNN) based model that classifies emotions into broad categories such as positive and negative, or into more specific classes. In this paper, we use the audio recordings of the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The Log Mel Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs) were used to extract features from the raw audio files. These features were used to classify emotions with techniques such as Long Short-Term Memory (LSTM) networks, CNNs, Hidden Markov Models (HMMs), and Deep Neural Networks (DNNs). For this paper, we divide the emotions into three groupings, for both male and female speakers. In the first grouping, we divide the emotions into two classes, positive and negative. In the second grouping, we divide the emotions into three classes: positive, negative, and neutral. In the third grouping, we divide the emotions into eight classes: happy, sad, angry, fearful, surprised, disgusted, calm, and neutral. For these three groupings, we propose a model consisting of eight consecutive 2D convolution layers. The proposed model outperforms previously reported models, so the emotion of a consumer can now be identified more accurately.
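As a rough illustration of the feature-extraction step described above, the sketch below computes a log-mel spectrogram and MFCCs from a raw waveform using NumPy only. The frame length, hop size, filter count, and coefficient count are illustrative assumptions, not the settings used in this paper, and in practice a library such as librosa would be used instead.

```python
import numpy as np

def log_mel_and_mfcc(signal, sr=48000, n_fft=1024, hop=512, n_mels=40, n_mfcc=13):
    """Compute a log-mel spectrogram and MFCCs from a mono waveform.

    Minimal NumPy-only sketch; all parameter defaults are illustrative.
    """
    # Frame the signal and apply a Hann window to each frame
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )

    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)

    # Triangular mel filterbank spanning 0 Hz .. sr/2
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    # Log-mel spectrogram (small epsilon avoids log(0))
    log_mel = np.log(power @ fbank.T + 1e-10)  # (n_frames, n_mels)

    # MFCCs: type-II DCT of the log-mel energies, keep first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    mfcc = log_mel @ dct.T  # (n_frames, n_mfcc)
    return log_mel, mfcc

# One second of a synthetic 440 Hz tone standing in for a RAVDESS clip
sig = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
lm, mf = log_mel_and_mfcc(sig)
print(lm.shape, mf.shape)
```

The resulting 2D log-mel array is the kind of image-like input that the stacked 2D convolution layers of the proposed model would consume.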
1. Introduction
Speech is the most direct way to transfer information from one end to another. It carries a wide variety of information and can express rich emotional content, reflecting a speaker's response to objects, scenes, or events. The automatic recognition of emotion by analyzing the human voice and facial expressions has become an active research topic. The following systems can be cited as examples of the areas in which these studies are applied and their intended uses:
(i) Education: a course system for distance education can detect bored users so that it can change the style or level of the material provided and, in addition, offer emotional incentives or compromises.
(ii) Automobile: driving performance and the emotional state of the driver are closely linked. Therefore, these systems can be used to enrich the driving experience and to improve driving performance.
Hindawi
Journal of Healthcare Engineering
Volume 2022, Article ID 8472947, 9 pages
https://doi.org/10.1155/2022/8472947