INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 02, FEBRUARY 2020 ISSN 2277-8616
4378
IJSTR©2020
www.ijstr.org
Lip-Reading Techniques: A Review
Sooraj V, Hardhik M, Nishanth S Murthy, Sandesh C, Shashidhar R
Abstract: Lip reading is the skill of determining what a person says by watching lip movements without hearing the sound. Audio-visual
speech recognition (AVSR) is an approach that uses image processing in lip reading to assist speech recognition systems. It combines an
audio part and a visual part, integrating lip-reading and speech recognition processes that work separately. In this paper, we survey
different methods of lip reading and discuss the steps involved, which include face detection, lip localization, feature extraction and
recognition. Audio-visual speech recognition is helpful in environments with audio noise. We examine the performance of hybrid models
used for AVSR and identify the limitations of different approaches, which may be helpful for further research in this field. We compare and
analyze various AVSR databases and their functions, discuss the challenges faced, and outline directions for future research on different
types of lip reading.
Index Terms: Lip-reading, image-processing, face detection, lip localization, feature extraction, audio-visual speech recognition (AVSR), Hybrid Models.
—————————— ——————————
1 INTRODUCTION
Pattern recognition has recently become an important
research topic. It focuses on using computers to mimic the
way people perceive different objects in order to convey
useful information. Compared with other recognition systems
such as fingerprint, gesture or facial recognition, audio-visual
speech recognition is more beneficial and robust, which makes
it an important building block of the human-computer
interface [22,23]. Other important research areas related to
lip reading are pattern recognition [24,25], image processing
and computer vision [26]. Lip reading is becoming an
important technique in recognition systems, where several
lip-reading techniques may be used to improve the
performance of recognition models. It finds applications in
information security [27,28], speech recognition [29,30,31]
and driver assistance systems [32]. The history of lip reading
goes back to 1954, when Sumby [33] published his first work
associated with lip reading. Later, Petajan [34] introduced a
lip-contour reading system that was popular in the 1980s.
Since then there has been a considerable amount of research
in the field of lip reading. Because the audio signal is
susceptible to noise in the environment, a pixel-based method
combined with an artificial neural network (ANN) was
proposed in a recognition model [35] developed in 1989. In
1993, Goldschen and others used Hidden Markov Models
(HMMs) in their lip-reading system to achieve a sentence
recognition rate of 25% [36]. Chiou [37] presented a
lip-reading system that used colour motion video and
combined a snake model, HMMs and principal component
analysis (PCA) to achieve an accuracy of about 94% for 10
words.
To improve the performance of continuous lip reading, a
context-based deep neural network (DNN) system [38] was
built with many layers for visual entities and achieved a word
accuracy of about 84.7%, an increase of 33% compared with
the baseline HMM. A number of companies and institutions
are investing in lip-reading research. Haar features and
AdaBoost cascade classifiers [39] were used to detect the
facial gestures and lip movements of the speaker in an
open-source system developed by Intel. This system can
improve word recognition accuracy and processing speed. A
lip-reading computer system was also designed to distinguish
between languages such as German, Arabic, Italian and Polish
with high accuracy. Google and the University of Oxford have
developed lip-reading software based on artificial intelligence
that reads the lip movements of speakers on BBC television
shows. It achieved 46.8% accuracy, compared with only 12.8%
for a trained lip-reading specialist in a similar test. The paper
is organized as follows: Section 2 covers the lip-reading
system, Section 3 the databases and Section 4 the conclusion.
2 LIP-READING SYSTEM
The existing lip-reading system consists of face detection and
lip localization, followed by feature extraction and recognition
blocks, as shown in Figure 1. After the speaker’s face is
identified, the lip region has to be found, and information is
then extracted by analyzing the movements of the lips.
Figure 1: Block diagram of existing lip-reading system
The first step is to detect the face of the speaker and identify
the lip region. The next step is to reduce the image data and
extract the features related to the movements of the lips. The
last step is to identify the visual data from the extracted lip
movements and classify it using a high-efficiency classifier.
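
The three steps above can be illustrated with a minimal Python sketch. It is
not an implementation from any of the reviewed systems. All function names
are hypothetical, frames are assumed to be small grayscale images stored as
nested lists, the brightness-threshold "face detector" merely stands in for a
real detector, the lower-third rule for the lip region is an assumed
heuristic, and the mean-intensity features and nearest-template classifier
are chosen only to keep the example short.

```python
from math import dist


def detect_face(frame, threshold=50):
    """Stand-in face detector (assumption): bounding box of bright pixels.

    Returns (top, left, bottom, right) with bottom/right exclusive, or None.
    """
    rows = [r for r, row in enumerate(frame) if any(p > threshold for p in row)]
    cols = [c for c in range(len(frame[0]))
            if any(row[c] > threshold for row in frame)]
    if not rows or not cols:
        return None
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def localize_lips(face_box):
    """Assumed heuristic: the lip region is the lower third of the face box."""
    top, left, bottom, right = face_box
    lip_top = bottom - max(1, (bottom - top) // 3)
    return (lip_top, left, bottom, right)


def mean_intensity(frame, box):
    """Average pixel value inside the box."""
    top, left, bottom, right = box
    pixels = [frame[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(pixels) / len(pixels)


def extract_features(frames, lip_box):
    """Reduce each frame to one number and keep the frame-to-frame changes."""
    means = [mean_intensity(frame, lip_box) for frame in frames]
    return [b - a for a, b in zip(means, means[1:])]


def classify(features, templates):
    """Nearest-template classifier using Euclidean distance.

    Every template must have the same length as the feature vector.
    """
    return min(templates, key=lambda label: dist(features, templates[label]))


def read_lips(frames, templates):
    """Run face detection, lip localization, feature extraction, recognition."""
    face_box = detect_face(frames[0])
    if face_box is None:
        return None
    lip_box = localize_lips(face_box)
    return classify(extract_features(frames, lip_box), templates)
```

In this sketch the face and lip regions are located once, on the first frame,
and reused for the whole clip. A practical system would track the lips from
frame to frame and would replace each stand-in with the detectors, features
and classifiers discussed in this review.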
__________________________________
Sooraj V is pursuing Bachelor of Engineering in Department of
Electronics and Communication, JSS Science and Technology
University, Mysuru-570006, India. His research interests include
speech signal processing and image processing.
Hardhik M is pursuing Bachelor of Engineering in Department of
Electronics and Communication, JSS Science and Technology
University, Mysuru-570006, India. His research interests include
digital signal processing and programming using Java.
Nishanth S Murthy is pursuing Bachelor of Engineering in
Department of Electronics and Communication, JSS Science and
Technology University, Mysuru-570006, India. His research
interests include CMOS VLSI circuits and digital signal processing.