International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 10 | Oct 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 947

Speech Recognition: General Idea and Overview

Kewal Mehta 1, Amitesh Dubey 2, Rahul Kalsariya 3, Viraj Prajapati 4, Prof. Suvarna Pansambal 5
1,2,3,4 BE, Department of Computer Engineering, Atharva College of Engineering, Mumbai, India
5 Head of Department, Department of Computer Engineering, Atharva College of Engineering, Mumbai, India

Abstract - A speech recognition system is software that lets the user control computer functions and dictate text. This paper presents a general idea and an overview of speech recognition. We discuss the general working of speech recognition and some popular algorithms used in modern speech recognition devices. Speech is used every day and is the most common means of communication between humans. Today, however, humans are not restricted to communicating with other humans: advances in artificial intelligence, machine learning and deep learning have also made it possible for humans to communicate with machines. This interaction between humans and computers takes place through different interfaces and is termed human-computer interaction (HCI). This paper primarily focuses on the basic working and the best-known algorithms of speech recognition, one of the most important domains in the field of artificial intelligence. It also describes in detail the steps of a basic speech recognition system, such as pre-processing, feature extraction and recognition.
The paper also explains in detail several algorithms that are popular in the field of artificial intelligence, such as PLP (Perceptual Linear Prediction), NLP (Natural Language Processing), DTW (Dynamic Time Warping), HMM (Hidden Markov Model) and N-grams, and shows ways to implement them in speech recognition devices.

Key Words: Speech Recognition, Artificial Intelligence, Machine Learning, Deep Learning, Human-Computer Interaction, Perceptual Linear Prediction, Natural Language Processing, Dynamic Time Warping, Hidden Markov Model.

1. INTRODUCTION

Speech recognition, or speech-to-text, is the ability of a machine or program to identify spoken words and convert them into readable text. Basic speech recognition software has access only to a limited vocabulary of words and phrases and can identify them only if they are spoken clearly. More sophisticated software can accept and process complex sentences, different accents and multiple languages. Speech recognition draws on several fields of research: computer science, linguistics and computer engineering. Many modern devices include speech recognition functions to allow easier or hands-free use.

Speech recognition works using algorithms for acoustic and language modeling. In essence, it breaks the audio of a speech recording down into individual sounds, analyzes each sound with algorithms to find the most probable matching word in that language, and translates those sounds into text. This is the basic working of speech recognition. More advanced speech recognition software, however, makes use of AI and ML. Such systems use grammar, structure and syntax, as well as the composition of audio and voice signals, to process speech. Software that uses machine learning improves the more it is used, so it can more easily learn to handle variations such as accents.
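The idea of finding the most probable word for each sound by combining acoustic and language modeling can be illustrated with a small sketch. It assumes made-up acoustic log-likelihoods for two audio segments and a hypothetical bigram language model; the names `ACOUSTIC`, `BIGRAM`, `FLOOR` and `decode` are illustrative and do not come from any library or from the system described in this paper. The search is a Viterbi-style dynamic program, which is one common way to carry out this step, not necessarily the method the authors have in mind.

```python
import math

# Hypothetical acoustic-model output: for each audio segment, candidate words
# with log P(segment | word). A real acoustic model would derive these from
# features of the audio; the numbers here are made up for illustration.
ACOUSTIC = [
    {"recognize": math.log(0.5), "wreck": math.log(0.5)},
    {"speech": math.log(0.4), "beach": math.log(0.6)},
]

# Hypothetical bigram language model: log P(word | previous word).
# "<s>" marks the start of the utterance.
BIGRAM = {
    ("<s>", "recognize"): math.log(0.3),
    ("<s>", "wreck"): math.log(0.1),
    ("recognize", "speech"): math.log(0.6),
    ("recognize", "beach"): math.log(0.05),
    ("wreck", "speech"): math.log(0.05),
    ("wreck", "beach"): math.log(0.5),
}

# Log-probability assigned to word pairs missing from the toy bigram table.
FLOOR = math.log(1e-6)


def decode(acoustic, bigram):
    """Return (words, log score) maximizing log P(audio | words) + log P(words)."""
    # best[w] holds (log score, word path) of the best hypothesis ending in w.
    best = {"<s>": (0.0, [])}
    for candidates in acoustic:
        new_best = {}
        for word, ac_score in candidates.items():
            # Choose the predecessor that maximizes score + bigram transition.
            prev, (prev_score, prev_path) = max(
                best.items(),
                key=lambda item: item[1][0] + bigram.get((item[0], word), FLOOR),
            )
            total = prev_score + bigram.get((prev, word), FLOOR) + ac_score
            new_best[word] = (total, prev_path + [word])
        best = new_best
    # The answer is the best-scoring hypothesis after the last segment.
    score, path = max(best.values(), key=lambda entry: entry[0])
    return path, score


if __name__ == "__main__":
    words, score = decode(ACOUSTIC, BIGRAM)
    print(" ".join(words))  # prints "recognize speech"
```

With these toy numbers the acoustic scores alone would prefer "beach" for the second segment, but the language model makes "recognize speech" the more probable sentence overall, which is the kind of correction a language model is meant to provide.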
Speech recognition is one of the leading applications of machine learning.[1]

1.1 Basic Model of Speech Recognition

Speech recognition, also called automatic speech recognition (ASR), computer speech recognition or speech-to-text, is a capability that enables a program to convert human speech into written text. It is a method of active communication between a human and a machine. Speech recognition is commonly confused with voice recognition: speech recognition focuses on converting human speech into written text, whereas voice recognition focuses on identifying an individual's voice. Many speech recognition devices are available. Speech recognition draws on many interdisciplinary technologies, from pattern recognition and signal processing to natural language processing, combined within a unified statistical framework. These systems integrate grammar, syntax and the composition of audio and voice signals to understand and process human speech. Over time, the software learns the user's usual behavior and requirements and adapts after every interaction; this is where machine learning comes into play. Most good systems allow companies or users to customize and adapt the technology to their requirements. [2]

1.2 Basic Speech Recognition System

The basic speech recognition system works as follows.