Introduction
Technological advances have made people's lives easier by allowing ordinary chores, such as booking a flight, managing the home temperature, or obtaining instant information, to be performed simultaneously with other tasks. A voice assistant saves time through oral communication with the computer: it can answer queries and run operating system commands to communicate with peripheral devices. In practice, the user can check the weather, manage IoT devices, and handle various system activities by voice, remotely and with little effort. Software interfaces such as Conversational Interfaces (CIs) have evolved with the goal of simplifying human-machine interaction by allowing humans to engage with computers using human language.1 They are especially important when individuals are preoccupied with other tasks, such as driving. CIs also aim to support businesses in many ways by aiding customers.2 Chatbots and voicebots are the two most common forms of CIs. Chatbots use natural language to mimic human interaction with a user via text, generally through websites or mobile apps. Voicebots, on the other hand, understand spoken natural language commands using speech recognition technology.3
The authors in4 introduced a multilayer feed-forward neural network (NN) for spectrum sensing to identify the primary users. Their study showed that the design structure of the neural network governs the detection accuracy; they trained the multilayer feed-forward NN with different backpropagation algorithms. An overview of mel-frequency cepstral coefficients (MFCCs), vector quantization, and the relationship between them is presented in,5 where vector quantization works as a classifier of speech signals; combined with MFCC features, it can perform speaker recognition. Companies, consumers, and several scientific organizations have all prioritized the use of conversational interfaces.6
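As background for the MFCC discussion above, the following is a minimal NumPy sketch of the standard MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT). The parameter values (16 kHz sample rate, 400-sample frames, 26 filters) are common illustrative defaults, not necessarily the configuration used in the cited work.

```python
import numpy as np

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=400, hop=160,
         n_filters=26, n_fft=512):
    """Minimal MFCC extraction for a mono signal (illustrative sketch)."""
    # 1. Pre-emphasis to boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Frame the signal and apply a Hamming window.
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # 3. Power spectrum of each frame via the FFT.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # 4. Triangular mel-spaced filterbank.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 5. Log filterbank energies, then a DCT to decorrelate them,
    #    keeping the first n_mfcc cepstral coefficients.
    energies = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_filters)))
    return energies @ dct.T  # shape: (n_frames, n_mfcc)
```

The resulting per-frame coefficient vectors are what a classifier (a vector quantizer or a feed-forward NN, as in the works cited above) would consume.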
As businesses increasingly employ these conversational agents to communicate with customers, it is critical to understand the variables that influence people to use chatbots. This is all the more urgent in light of recent research demonstrating the disadvantages and high failure rates of chatbots used on social media and messaging applications.7 Since the debut of its bot API on Messenger, Facebook has revealed that its Artificial Intelligence (AI) bots have had a 70% failure rate; for instance, they do not accurately answer particular queries.8
The objective of this paper is to create an application that eases the daily life of its users. The paper is structured as follows: the system structure and requirements are described in section II, chatbot systems are presented in section III, the integration procedure in section IV, the implementation methodology in section V, the MFCC features combined with neural networks in section VI, and the conclusion in section VII.
System structure and requirements
The first step is to record the spoken words and convert them to text, followed by ascertaining that the assistant is able to extract the intent using its artificial intelligence algorithms. The next logical step is to confirm that the Virtual Assistant can respond based on the intent deduced in the prior step. Some responses need to execute system commands; others need to get information from a third-party Application Programming Interface (API) (such as a weather service) or change values on Internet of Things (IoT) devices. The software reacts based on the confidence it has in the intent: if it is not sufficiently confident, it asks the user to repeat the spoken words in a different form. Lastly, the software takes the action and plays back a voice message indicating what action it took, or replies with an answer if needed. This is visualized in the system architecture shown in Figure 1 below. The system requirements necessary for the design and implementation process are:
a. Computer with a Linux operating system installed on it.
b. Accounts for the Wit, WolframAlpha, and Snowboy services.
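The pipeline above (transcribe, extract the intent, act only when confidence is high enough, otherwise ask the user to rephrase) can be sketched as a simple control loop. All names here (`handle_utterance`, `toy_recognizer`, the 0.7 threshold) are hypothetical illustrations, not the actual Wit, WolframAlpha, or Snowboy APIs.

```python
# Hypothetical dispatch step of the assistant pipeline described above.
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tuned per deployment

def handle_utterance(transcript, recognizer):
    """Map a transcript to an action, asking for a rephrase when unsure."""
    intent, confidence = recognizer(transcript)
    if confidence < CONFIDENCE_THRESHOLD:
        # Low confidence: ask the user to repeat in a different form.
        return "Sorry, could you phrase that differently?"
    if intent == "weather":
        return "Fetching the weather from a third-party API..."
    if intent == "iot_toggle":
        return "Changing the value on the IoT device..."
    if intent == "system_command":
        return "Running the operating system command..."
    return "I heard you, but I have no action for that intent."

def toy_recognizer(text):
    """Keyword-based stand-in for a real NLU service such as Wit."""
    keywords = {"weather": "weather", "light": "iot_toggle",
                "open": "system_command"}
    for word, intent in keywords.items():
        if word in text.lower():
            return intent, 0.9  # pretend the NLU is confident
    return "unknown", 0.2       # below threshold: triggers a rephrase
```

In the real system the recognizer call would be an HTTP request to the NLU service, and the returned strings would be synthesized back to speech rather than printed.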
Int Rob Auto J. 2022;8(1):27‒32. 27
©2022 Abougarair et al. This is an open access article distributed under the terms of the Creative Commons Attribution License,
which permits unrestricted use, distribution, and building upon the work non-commercially.
Design and implementation of smart voice assistant
and recognizing academic words
Volume 8 Issue 1 - 2022
Ahmed J Abougarair,1 Mohamed KI Aburakhis,2 Mohamed O Zaroug1
1Department of Electrical and Electronic Engineering, University of Tripoli, Libya
2Department of Engineering Technology, Clark State College, USA
Correspondence: Ahmed J Abougarair, Electrical and
Electronic Engineering, University of Tripoli, Tripoli, Libya, Tel
+218925385942, Email
Received: October 18, 2021 | Published: February 24, 2022
Abstract
This paper approaches the use of a Virtual Assistant that employs neural networks to recognize commonly used words. The main purpose is to facilitate users' daily lives by sensing the voice and interpreting it into action. Alice, the name of the assistant, is implemented based on four main techniques: hot-word detection, voice-to-text conversion, intent recognition, and text-to-voice conversion. Linux is the operating system of choice for developing and running the assistant because it is freely available open-source software and has been implemented on most single-board computers. Python is chosen as the development language due to its capabilities and compatibility with the various APIs and libraries deemed necessary for the project. The virtual assistant is required to communicate with IoT devices. In addition, a speech recognition system is created to recognize the significant technical words. An artificial neural network (ANN) with different network structures and training algorithms is utilized in conjunction with Mel Frequency Cepstral Coefficient (MFCC) feature extraction to effectively increase the identification rate and find the optimal performance. For training purposes, the Levenberg-Marquardt (LM), BFGS Quasi-Newton, and Resilient Backpropagation algorithms are compared using 10 MFCCs, utilizing from 10 to 50 neurons in increments of 10; similarly, for 13 MFCCs the training is done using from 10 to 50 neurons.
Keywords: chatbots, IoT devices, MFCC features, neural networks, voice assistant
International Robotics & Automation Journal
Research Article
Open Access