INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 02, FEBRUARY 2020 ISSN 2277-8616 4378
IJSTR©2020 www.ijstr.org

Lip-Reading Techniques: A Review

Sooraj V, Hardhik M, Nishanth S Murthy, Sandesh C, Shashidhar R

Abstract: Lip reading is the skill of determining what a person says by watching the movements of the lips, without hearing the sound. Audio-visual speech recognition (AVSR) is an approach that uses the image-processing capabilities of lip reading to assist speech recognition systems. It combines an audio part and a visual part, integrating lip-reading and speech recognition processes that otherwise work separately. In this paper, we review different methods of lip reading and discuss the steps involved, which include face detection and lip localization, followed by feature extraction and recognition. AVSR is particularly helpful in environments with audio noise. We examine the performance of hybrid models used for AVSR and identify the limitations of different approaches, which may be helpful for further research in this field. We also compare and analyze various AVSR databases and their functions, discuss the challenges faced, and outline directions for future research on different types of lip reading.

Index Terms: Lip-reading, image-processing, face detection, lip localization, feature extraction, audio-visual speech recognition (AVSR), Hybrid Models.

1 INTRODUCTION

In recent years, pattern recognition has become an important research topic. It emphasizes the use of computers to mimic people’s understanding of different items in order to convey valuable information.
Compared with other recognition systems such as fingerprint, gesture, or facial recognition, audio-visual speech recognition is more beneficial and robust, which makes it an important building block of human-computer interfaces [22,23]. Other important research areas related to lip reading are pattern recognition [24,25], image processing, and computer vision [26]. Lip reading is becoming an important technique in recognition systems, where several lip-reading techniques may be used to improve the performance of recognition models. It has applications in information security [27,28], speech recognition [29,30,31], and driver assistance systems [32].

The history of lip-reading research goes back to 1954, when Sumby [33] presented his first work associated with lip reading. Later, Petajan [34] introduced a different, lip-contour-based reading system, which was popular in the 1980s. Since then, a considerable amount of research has been carried out in the field. Because the audio signal is susceptible to environmental noise, a pixel-based method combined with an artificial neural network (ANN) was proposed in a recognition model [35] developed in 1989. In 1993, Goldschen and others used Hidden Markov Models (HMMs) in their lip-reading system to achieve a sentence recognition rate of 25% [36]. Chiou [37] presented a lip-reading system for colour motion video that combined a snake model, an HMM, and principal component analysis (PCA) to achieve an accuracy of about 94% for 10 words. To improve the performance of continuous lip reading, a context-based deep neural network (DNN) system [38] was realized with many layers for visual entities; it achieved a word accuracy of about 84.7%, a 33% increase over the baseline HMM.

A number of companies and institutions are investing in lip-reading research.
Haar features and AdaBoost cascade classifiers [39] were used to detect the facial gestures and lip movements of the speaker in an open-source system developed by Intel. This system can improve word recognition accuracy and processing speed. A computer lip-reading system was designed to differentiate between languages such as German, Arabic, Italian, and Polish with high accuracy. Google and the University of Oxford have developed artificial-intelligence-based lip-reading software that reads the lip movements of speakers on BBC television shows. It achieved 46.8% accuracy, compared with only 12.8% for a trained lip-reading specialist in a similar test.

The paper is organized as follows: Section 2 describes the lip-reading system, Section 3 covers the databases, and Section 4 presents the conclusion.

2 LIP-READING SYSTEM

Existing lip-reading systems consist of face detection and lip localization, followed by feature extraction and recognition blocks, as shown in Figure 1. After the speaker’s face has been identified, the lip region has to be found, and information is then extracted by analyzing the movements of the lips.

Figure 1: Block diagram of existing lip-reading system

The first step is to detect the face of the speaker and identify the lip region. The next step is to reduce the image data and extract the features related to the movements of the lips. The last step is to identify the visual data from the extracted lip movement and classify it using a high-efficiency classifier.

__________________________________
Sooraj V is pursuing a Bachelor of Engineering in the Department of Electronics and Communication, JSS Science and Technology University, Mysuru-570006, India. His research interests include speech signal processing and image processing.
Hardhik M is pursuing a Bachelor of Engineering in the Department of Electronics and Communication, JSS Science and Technology University, Mysuru-570006, India. His research interests include digital signal processing and programming in Java.
Nishanth S Murthy is pursuing a Bachelor of Engineering in the Department of Electronics and Communication, JSS Science and Technology University, Mysuru-570006, India. His research interests include CMOS VLSI circuits and digital signal processing.
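
The processing chain outlined in Section 2 (lip localization, data reduction and feature extraction, classification) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not a method taken from the systems reviewed here: the face box is assumed to come from an external face detector, the lip region is approximated as the lower third of that box, the feature is a simple frame-to-frame pixel difference, and the classifier is a nearest-template rule. All function names are hypothetical.

```python
from math import sqrt

def lip_box_from_face(face_box):
    """Approximate the lip region as the lower third, middle half of the face box.
    This is a rough heuristic (assumption), not a real lip localizer."""
    x, y, w, h = face_box
    top = y + 2 * h // 3
    return (x + w // 4, top, w // 2, y + h - top)

def crop(frame, box):
    """Cut the box (x, y, width, height) out of a grayscale frame (list of rows)."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

def extract_features(lip_frames):
    """Reduce the image data: one value per frame transition, the mean
    absolute pixel change between consecutive lip crops."""
    features = []
    for prev, cur in zip(lip_frames, lip_frames[1:]):
        diffs = [abs(a - b)
                 for row_prev, row_cur in zip(prev, cur)
                 for a, b in zip(row_prev, row_cur)]
        features.append(sum(diffs) / len(diffs))
    return features

def classify(features, templates):
    """Return the label of the template closest in Euclidean distance."""
    def distance(u, v):
        return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(templates, key=lambda label: distance(features, templates[label]))

def read_lips(frames, face_box, templates):
    """Run the three stages in order on a short frame sequence."""
    lip_box = lip_box_from_face(face_box)
    lip_frames = [crop(frame, lip_box) for frame in frames]
    return classify(extract_features(lip_frames), templates)
```

In a practical system each placeholder stage would be replaced by one of the techniques surveyed in this paper, for example a cascade classifier for face detection and an HMM or neural network for recognition; the sketch only shows how the stages hand data to one another.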