Research Article
Detection of Emotion of Speech for RAVDESS Audio Using Hybrid
Convolution Neural Network
Tanvi Puri,1 Mukesh Soni,2 Gaurav Dhiman,3,4,5 Osamah Ibrahim Khalaf,6 Malik Alazzam,7 and Ihtiram Raza Khan8
1ICT Ganpat University, Ahmedabad, Gujarat, India
2Computer Science and Engineering, Jagran Lakecity University, Bhopal, India
3Department of Computer Science, Government Bikram College of Commerce, Patiala, India
4University Centre for Research and Development, Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, India
5Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
6Al-Nahrain University, Baghdad, Iraq
7Lone Star College-Victory Center, Houston, TX, USA
8Computer Science Department, Jamia Hamdard University, Delhi, India
Correspondence should be addressed to Mukesh Soni; mukesh.research24@gmail.com
Received 29 May 2021; Revised 19 January 2022; Accepted 31 January 2022; Published 27 February 2022
Academic Editor: Antonio Gloria
Copyright © 2022 Tanvi Puri et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Every human being has emotions about the items that matter to them, and for a customer those emotions can help a customer representative understand the customer's requirements. Speech emotion recognition therefore plays an important role in human interaction, and intelligent systems can improve its performance. We design a convolution neural network (CNN) based model that classifies emotions into broad categories such as positive and negative, or into more specific classes. In this paper, we use the audio recordings of the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The Log Mel Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs) were used to extract features from the raw audio files. These features were used to classify emotions with techniques such as Long Short-Term Memory (LSTM) networks, CNNs, Hidden Markov Models (HMMs), and Deep Neural Networks (DNNs). For this paper, we divide the emotions into three groupings, for both male and female speakers. In the first grouping, we divide the emotions into two classes, positive and negative. In the second grouping, we divide the emotions into three classes: positive, negative, and neutral. In the third grouping, we divide the emotions into eight classes: happy, sad, angry, fearful, surprised, disgusted, calm, and neutral. For these three groupings, we propose a model consisting of eight consecutive 2D convolution layers. The proposed model outperforms previously reported models, so the emotion of a consumer can now be identified more accurately.
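As a rough illustration of the feature-extraction step described above, the sketch below computes a log-mel spectrogram and MFCCs from a raw waveform using NumPy only. The frame length, hop size, filter count, and coefficient count are illustrative assumptions, not the settings used in this paper, and in practice a library such as librosa would be used instead.

```python
import numpy as np

def log_mel_and_mfcc(signal, sr=48000, n_fft=1024, hop=512, n_mels=40, n_mfcc=13):
    """Compute a log-mel spectrogram and MFCCs from a mono waveform.

    Minimal NumPy-only sketch; all parameter defaults are illustrative.
    """
    # Frame the signal and apply a Hann window to each frame
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )

    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)

    # Triangular mel filterbank spanning 0 Hz .. sr/2
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    # Log-mel spectrogram (small epsilon avoids log(0))
    log_mel = np.log(power @ fbank.T + 1e-10)  # (n_frames, n_mels)

    # MFCCs: type-II DCT of the log-mel energies, keep first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    mfcc = log_mel @ dct.T  # (n_frames, n_mfcc)
    return log_mel, mfcc

# One second of a synthetic 440 Hz tone standing in for a RAVDESS clip
sig = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
lm, mf = log_mel_and_mfcc(sig)
print(lm.shape, mf.shape)
```

The resulting 2D log-mel array is the kind of image-like input that the stacked 2D convolution layers of the proposed model would consume.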
1. Introduction
Speech is the most direct way to transfer information from one end to another. It carries a wide variety of information and can express rich emotional content, reflecting a speaker's response to objects, scenes, or events. The automatic recognition of emotion by analyzing the human voice and facial expressions has become an active research topic. The following systems can be cited as examples of the areas in which these studies are applied and their intended uses:
(i) Education: a course system for distance education can detect bored users so that it can change the style or level of the material provided and, in addition, offer emotional incentives or compromises.
(ii) Automobile: driving performance and the emotional state of the driver are closely linked. Therefore, these systems can be used to enrich the driving experience and to improve driving performance.
Hindawi
Journal of Healthcare Engineering
Volume 2022, Article ID 8472947, 9 pages
https://doi.org/10.1155/2022/8472947