International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8 Issue-10, August 2019
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number J87650881019/2019©BEIESP
DOI: 10.35940/ijitee.J8765.0881019
A Robust Isolated Automatic Speech Recognition
System using Machine Learning Techniques
Sunanda Mendiratta, Neelam Turk, Dipali Bansal
Abstract: Speech recognition systems are used to enable fast
communication between human and machine. A number of speech
recognition systems have been developed by various researchers,
for example for speech recognition, speaker verification and
speaker recognition. The basic stages of a speech recognition
system are pre-processing, feature extraction, feature selection
and classification. Numerous works have sought to improve each
of these stages to obtain more accurate results. This paper
focuses on the use of machine learning in speech recognition
systems. It covers the architecture of automatic speech
recognition (ASR), which gives an idea of the basic stages of a
speech recognition system, and then the use of machine learning
in ASR. One section covers the work done by various researchers
using support vector machines and artificial neural networks. In
addition, a review is presented of work done using SVM, ELM,
ANN, Naive Bayes and kNN classifiers. The simulation results
show that the best accuracy is achieved using the ELM
classifier. The last section of the paper covers the results
obtained with the proposed approaches, in which SVM, ANN with
the Cuckoo search algorithm and ANN with back propagation are
used as classifiers. The focus is also on improving the
pre-processing and feature extraction processes.
Keywords: Speech recognition system, SVM, kNN, ANN,
Cuckoo search optimization, ELM
I. INTRODUCTION
The ability to communicate is one of the most fundamental
aspects of human behaviour. Humans communicate with each
other through natural languages, verbally and in written
form. The written form of human communication represents
its vocalized form, i.e., speech [1]. Advances in language
and speech technologies have enabled high-quality
human-computer interactive systems, which have broad
applications in education, entertainment and business. To
make man-machine communication more user friendly,
human-computer interfaces are designed in which natural
languages are used for interaction between users and
machines [2]. As in human-human communication, the flow of
information between computer and human defines a loop of
interaction.
Revised Manuscript Received on August 09, 2019.
* Correspondence Author
Sunanda Mendiratta*, Department of Electronics Engineering, J. C.
Bose UST, Faridabad, India. E-mail: sunanda.mendiratta@gmail.com
Neelam Turk, Department of Electronics Engineering, J. C. Bose UST,
Faridabad, India.
Dipali Bansal, ECE Department, FET, Manav Rachna International
Institute of Research and Studies, Faridabad, India.
Natural language, as speech or text, makes communication
possible, and speech, its vocalized form, is the most
convenient way for humans to communicate. This leads to the
development of speech recognition systems, in which the
machine understands the meaning of human speech. This is a
difficult problem and an active area of research. The
translation of spoken
words into their written form is done by speech
recognition: an automatic speech recognition (ASR) system
identifies the language of the speech and then converts the
segments of input speech into the corresponding units of
text in that language. This has made interaction between
human and computer easier and systems more user friendly
[3]. The long-term goal of human-computer interaction (HCI)
is to minimize the barrier between the human's mental model
of what they want to accomplish and the computer's support
of the user's task. Possible applications of speech
recognition in HCI include voice user interfaces such as
voice dialling, speech-to-text processing, data entry, use
in aircraft and the preparation of structured documents.
ASR technology is also used for learning languages: it
helps learners develop fluency in speaking and lets them
listen to the proper pronunciation [4]. With speech-to-text
programs, physically disabled students who suffer from
strain injuries to the upper extremities are relieved of
the worry of handwriting. Using speech recognition
technology, a computer can be used at home to search the
internet without physically operating a keyboard or mouse.
Students with learning disabilities can write better with
speech recognition, without concern for spelling and other
writing mechanics [2].
ASR can be used to facilitate communication between
machines and humans, and man-machine interaction has been
demonstrated in various speech-based applications. Its
applications include communication interfaces for people
with special abilities, translation devices, hands-free
machine operation, dictation systems and voice-mail
systems in telephony. On the other hand, its limitations
include the need for a noise-free environment,
restrictions on vocabulary and language, low talking rates
and speaker dependency. Various researchers have therefore
worked in this field to improve the results [5].
The basic idea behind ASR can be explored in the context
of isolated word recognition (IWR). The goal of ASR is to
convert a speech signal into its equivalent text message,
independent of environment, speaker and device [6]. It is
a pattern recognition problem in which features are
extracted and a model is used for training and testing.
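The pattern recognition view of isolated word recognition can be
illustrated with a minimal sketch. It is not the method proposed in
this paper: the "words" are synthetic noisy tones standing in for
recordings, the two features (mean energy and zero-crossing rate) are
chosen only for brevity, and a simple kNN vote replaces the
classifiers discussed later. All function names, the toy vocabulary
and the parameter values are assumptions made for illustration.

```python
import math
import random

def synth_word(freq_hz, n=400, rate=8000, noise=0.1, rng=random):
    """Hypothetical stand-in for a recorded word: a noisy sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) + rng.gauss(0, noise)
            for t in range(n)]

def extract_features(signal):
    """Feature extraction: mean energy and zero-crossing rate of the signal."""
    energy = sum(x * x for x in signal) / len(signal)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(signal) - 1)
    return (energy, zcr)

def knn_predict(train, query, k=3):
    """Classification: majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def run_demo(seed=0):
    """Build a toy data set, split it into train/test, return test accuracy."""
    rng = random.Random(seed)
    words = {"low": 200, "high": 1200}  # assumed toy vocabulary (tone in Hz)
    data = [(extract_features(synth_word(f, rng=rng)), w)
            for w, f in words.items() for _ in range(20)]
    rng.shuffle(data)
    train, test = data[:30], data[30:]  # 30 training and 10 test samples
    correct = sum(knn_predict(train, feats) == label for feats, label in test)
    return correct / len(test)
```

Calling run_demo() returns the fraction of held-out toy samples that
are labelled correctly. A real system would replace synth_word with
recorded speech and use richer features and classifiers.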
This paper is divided into several sections. The second
section gives a brief introduction to the ASR architecture.
The third section gives brief details of machine learning
and its use in ASR; it also reviews the use of SVM and ANN
for speech recognition systems.