International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 10 | Oct 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 947
Speech Recognition: General Idea and Overview
Kewal Mehta¹, Amitesh Dubey², Rahul Kalsariya³, Viraj Prajapati⁴, Prof. Suvarna Pansambal⁵
¹,²,³,⁴ BE, Department of Computer Engineering, Atharva College of Engineering, Mumbai, India
⁵ Head of Department, Department of Computer Engineering, Atharva College of Engineering, Mumbai, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - A voice recognition system is software that lets the
user control computer functions and dictate text. This paper
presents a general idea and an overview of speech recognition.
We discuss the general working of speech recognition and some
popular algorithms used in modern speech recognition devices.
Speech is used every day and is the most common means of
communication between humans. Humans are no longer
restricted to communicating with other humans: advances in
Artificial Intelligence, Machine Learning and Deep Learning
have also made communication with machines possible. This
interaction between humans and computers takes place through
different interfaces and is termed human-computer
interaction (HCI). This paper primarily focuses on the basic
working and the most widely recognized algorithms of speech
recognition, which is one of the most important domains in the
field of artificial intelligence. The paper also describes the
steps of a basic speech recognition system, such as
pre-processing, feature extraction and recognition. It further
explains several algorithms that are popular in the field,
such as PLP (Perceptual Linear Prediction), NLP (Natural
Language Processing), DTW (Dynamic Time Warping), HMM
(Hidden Markov Model) and N-grams, and shows ways to
implement these algorithms in speech recognition devices.
Key Words: Speech Recognition, Artificial Intelligence,
Machine Learning, Deep Learning, Human-Computer
Interaction, Perceptual Linear Prediction, Natural
Language Processing, Dynamic Time Warping, Hidden
Markov Model.
1. INTRODUCTION
Speech recognition, or speech-to-text, is the ability of a
machine or program to identify spoken words and convert them
into readable text. Basic speech recognition software has
access to only a limited vocabulary of words and phrases and
can identify them only if they are spoken clearly. More
sophisticated software can accept and process complex
utterances, accents and multiple languages. Speech recognition
draws on different fields of research in computer science,
linguistics and computer engineering. Many modern devices
include speech recognition functions to allow easier or
hands-free use. Speech recognition works using algorithms for
acoustic and language modeling. It breaks the audio of a
speech recording down into individual sounds, analyzes each
sound with algorithms to find the most probable word in that
language, and transcribes those sounds into text. This is the
basic working of speech recognition. More advanced speech
recognition software makes use of AI and ML. These systems use
grammar, structure and syntax, as well as the composition of
audio and voice signals, to process speech. Software that uses
machine learning improves the more it is used, so it can more
easily adapt to variations such as accents. Speech recognition
is one of the leading applications of machine learning. [1]
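The basic working described above (cut the audio into short pieces, then choose the most probable word using both an acoustic score and a language score) can be illustrated with a minimal Python sketch. It is not the implementation of any particular system. The function names `split_into_frames` and `most_probable_word`, the fixed-length framing, and all probability values are assumptions made up for illustration only.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Cut a sequence of audio samples into fixed-length, overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def most_probable_word(acoustic, language):
    """Return the candidate maximizing log P(audio | word) + log P(word)."""
    return max(
        acoustic,
        key=lambda w: math.log(acoustic[w]) + math.log(language[w]),
    )


# Hypothetical scores for one audio segment heard after the words "I want".
ACOUSTIC = {"two": 0.35, "too": 0.40, "to": 0.25}  # P(audio | word), made up
LANGUAGE = {"two": 0.05, "too": 0.05, "to": 0.90}  # P(word | context), made up

# Stand-in "audio": ten sample values cut into frames of 4 with a hop of 2.
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)

best_acoustic_only = max(ACOUSTIC, key=ACOUSTIC.get)
best_combined = most_probable_word(ACOUSTIC, LANGUAGE)
```

With these made-up numbers, the acoustic scores alone favor "too", while the combined acoustic and language scores select "to". This is the sense in which acoustic and language modeling work together to find the most probable word.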
1.1 Basic Model of Speech Recognition
Speech recognition, also called automatic speech
recognition (ASR), computer speech recognition or
speech-to-text, is a capability that enables a program to
convert human speech into written form. It is essentially a
method of active communication between a human and a
machine. Speech recognition is commonly confused with voice
recognition. Speech recognition focuses on converting human
speech into written text, whereas voice recognition focuses on
identifying an individual’s voice. Many speech recognition
devices are available. Speech recognition uses many
interdisciplinary technologies, including pattern recognition,
signal processing and natural language processing, combined in
a unified statistical framework. These systems integrate
grammar, syntax and the composition of audio and voice signals
to understand and process human speech. Over time, the
software learns the usual behavior and requirements of the
user and evolves after every interaction. This is where
machine learning comes into effect. Most good systems allow
companies or users to customize and adapt the technology to
their requirements. [2]
1.2 Basic Speech Recognition System
A basic speech recognition system works as follows.