Introduction
Technological advances have made people's lives easier by allowing ordinary chores, such as booking a flight, managing the home temperature, or obtaining instant information, to be performed simultaneously with other tasks. A voice assistant saves time through oral communication with the computer: it can answer queries and run operating system commands to communicate with peripheral devices. In practice, the user can check the weather, manage IoT devices, and handle various system activities by voice, remotely and with little effort. Software interfaces such as Conversational Interfaces (CIs) have evolved with the goal of simplifying human-machine interaction by allowing humans to engage with computers using human language.1 They are especially important when individuals are preoccupied with other tasks, such as driving. CIs also aim to support businesses in many ways by aiding customers.2 Chatbots and voicebots are the two most common forms of CIs. Chatbots use natural language to mimic human interaction with a user via text, generally through websites or mobile apps. Voicebots, on the other hand, understand spoken natural language commands using speech recognition technology.3
The authors in4 introduced a multilayer feed-forward neural network (NN) for spectrum sensing to identify the primary users. Their study showed that the design structure of the neural network governs the detection accuracy; they trained the multilayer feed-forward NN with different backpropagation algorithms. An overview of mel-frequency cepstral coefficients (MFCCs), vector quantization, and the relationship between them is presented in,5 where vector quantization works as a classifier of speech signals; combined with MFCC features, it can perform speaker recognition. Companies, consumers, and several scientific organizations have all prioritized the use of conversational interfaces.6
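As background for the MFCC discussion above, the following is a minimal NumPy sketch of the standard MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT). The parameter values (16 kHz sample rate, 400-sample frames, 26 filters) are common illustrative defaults, not necessarily the configuration used in the cited work.

```python
import numpy as np

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=400, hop=160,
         n_filters=26, n_fft=512):
    """Minimal MFCC extraction for a mono signal (illustrative sketch)."""
    # 1. Pre-emphasis to boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Frame the signal and apply a Hamming window.
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # 3. Power spectrum of each frame via the FFT.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # 4. Triangular mel-spaced filterbank.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 5. Log filterbank energies, then a DCT to decorrelate them,
    #    keeping the first n_mfcc cepstral coefficients.
    energies = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_filters)))
    return energies @ dct.T  # shape: (n_frames, n_mfcc)
```

The resulting per-frame coefficient vectors are what a classifier (a vector quantizer or a feed-forward NN, as in the works cited above) would consume.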
As businesses increasingly employ these conversational agents to communicate with customers, it is critical to understand the variables that influence people to use chatbots. This is all the more urgent in light of recent research demonstrating the disadvantages and high failure rates of chatbots used on social media and messaging applications.7 Since the debut of its bot API on Messenger, Facebook has revealed that its Artificial Intelligence (AI) bots have had a 70% failure rate; for instance, they do not accurately answer particular queries.8
The objective of this paper is to create an application that eases the daily life of its users. The paper is structured as follows: the system structure and requirements are described in section II, chatbot systems are presented in section III, the integration procedure in section IV, the implementation methodology in section V, the MFCC features combined with neural networks in section VI, and the conclusion in section VII.
System structure and requirements
The first step is to record the spoken words and convert them to text, followed by ascertaining that the assistant is able to extract the intent using its artificial intelligence algorithms. The next logical step is to confirm that the Virtual Assistant can respond based on the intent deduced in the prior step. Some responses need to execute system commands; others need to get information from a third-party Application Programming Interface (API) (such as a weather service) or change values on Internet of Things (IoT) devices. The software reacts based on the confidence it has in the intent: if it is not sufficiently confident, it asks the user to repeat the spoken words in a different form. Lastly, the software takes the action and plays back a voice message indicating what action it took, or replies with an answer if needed. This is visualized in the system architecture shown in Figure 1 below. The system requirements necessary for the design and implementation process are:
a. Computer with a Linux operating system installed on it.
b. Accounts for the Wit, WolframAlpha, and Snowboy services.
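The pipeline above (transcribe, extract the intent, act only when confidence is high enough, otherwise ask the user to rephrase) can be sketched as a simple control loop. All names here (`handle_utterance`, `toy_recognizer`, the 0.7 threshold) are hypothetical illustrations, not the actual Wit, WolframAlpha, or Snowboy APIs.

```python
# Hypothetical dispatch step of the assistant pipeline described above.
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tuned per deployment

def handle_utterance(transcript, recognizer):
    """Map a transcript to an action, asking for a rephrase when unsure."""
    intent, confidence = recognizer(transcript)
    if confidence < CONFIDENCE_THRESHOLD:
        # Low confidence: ask the user to repeat in a different form.
        return "Sorry, could you phrase that differently?"
    if intent == "weather":
        return "Fetching the weather from a third-party API..."
    if intent == "iot_toggle":
        return "Changing the value on the IoT device..."
    if intent == "system_command":
        return "Running the operating system command..."
    return "I heard you, but I have no action for that intent."

def toy_recognizer(text):
    """Keyword-based stand-in for a real NLU service such as Wit."""
    keywords = {"weather": "weather", "light": "iot_toggle",
                "open": "system_command"}
    for word, intent in keywords.items():
        if word in text.lower():
            return intent, 0.9  # pretend the NLU is confident
    return "unknown", 0.2       # below threshold: triggers a rephrase
```

In the real system the recognizer call would be an HTTP request to the NLU service, and the returned strings would be synthesized back to speech rather than printed.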
Int Rob Auto J. 2022;8(1):27‒32. 27
©2022 Abougarair et al. This is an open access article distributed under the terms of the Creative Commons Attribution License,
which permits unrestricted use, distribution, and building upon the work non-commercially.
Design and implementation of smart voice assistant
and recognizing academic words
Volume 8 Issue 1 - 2022
Ahmed J Abougarair,1 Mohamed KI Aburakhis,2 Mohamed O Zaroug1
1Department of Electrical and Electronic Engineering, University of Tripoli, Libya
2Department of Engineering Technology, Clark State College, USA
Correspondence: Ahmed J Abougarair, Electrical and
Electronic Engineering, University of Tripoli, Tripoli, Libya, Tel
+218925385942, Email
Received: October 18, 2021 | Published: February 24, 2022
Abstract
This paper approaches the use of a Virtual Assistant that employs neural networks to recognize commonly used words. The main purpose is to facilitate users' daily lives by sensing the voice and interpreting it into action. Alice, the name of the assistant, is implemented based on four main techniques: hot-word detection, voice-to-text conversion, intent recognition, and text-to-voice conversion. Linux is the operating system of choice for developing and running the assistant because it is freely available open-source software and has been implemented on most single-board computers. Python is chosen as the development language due to its capabilities and compatibility with the various APIs and libraries deemed necessary for the project. The virtual assistant is required to communicate with IoT devices. In addition, a speech recognition system is created to recognize the significant technical words. An artificial neural network (ANN) with different network structures and training algorithms is utilized in conjunction with Mel Frequency Cepstral Coefficient (MFCC) feature extraction to effectively increase the identification rate and find the optimal performance. For training purposes, the Levenberg-Marquardt (LM), BFGS Quasi-Newton, and Resilient Backpropagation algorithms are compared using 10 MFCCs, utilizing from 10 to 50 neurons in increments of 10; similarly, for 13 MFCCs the training is done using from 10 to 50 neurons.
Keywords: chatbots, IoT devices, MFCC features, neural networks, voice assistant
International Robotics & Automation Journal
Research Article
Open Access