Advanced Research in Electrical and Electronic Engineering
p-ISSN: 2349-5804; e-ISSN: 2349-5812 Volume 2, Issue 14 October-December, 2015, pp. 42-45
© Krishi Sanskriti Publications
http://www.krishisanskriti.org/Publication.html
Visual Speech Recognition through Zernike Moments
Promila Singh¹, A.N. Mishra² and Usha Sharma³
¹M.Tech Student, Department of Electronics and Communication Engineering, Krishna Engineering College, Ghaziabad, U.P.
²Department of Electronics and Communication Engineering, Krishna Engineering College, Ghaziabad, U.P.
³Department of Electronics and Communication Engineering, JRE Group of Institutions, Greater Noida, U.P.
E-mail: ¹promisingh22@gmail.com, ²an_mishra53@rediffmail.com, ³ushasharma1529@gmail.com
Abstract—This paper presents a learning-based approach to visual speech recognition using Zernike moments. The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems, such as audio-visual speech recognition (AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. Including visual lip information is valuable because it can improve the overall accuracy of audio or hand-gesture recognition algorithms, especially when such systems operate in environments with a high level of acoustic noise. The main components of the developed visual speech recognition system (a) segment the mouth region of interest, (b) extract visual features from the real-time input video, and (c) compute the Zernike moments. The major difficulty associated with VSR systems lies in identifying the smallest elements in the image sequences that represent lip movements in the visual domain. The objective of a visual speech recognition system is to improve recognition accuracy. In this paper we compute visual features using Zernike moments on a speaker-independent visual vocabulary of isolated Hindi digit words from ten speakers. The visual features are normalized and the dimensionality of the feature set is reduced by principal component analysis (PCA) in order to recognize isolated word utterances in the PCA space.
1. INTRODUCTION
In recent years, many automatic speech-reading systems have been proposed that combine audio as well as visual speech features. In computer speech recognition, the visual component of speech is used to support acoustic speech recognition. The design of an audio-visual speech recognizer draws on the experience of human lip-reading experts. Hearing-impaired people achieve recognition rates of 60–80%, depending on lip-reading conditions. The most important conditions for good lip-reading are the quality of a speaker's visual speech (proper articulation) and the angle of view. People who are easily understood from the acoustic component alone are sometimes hard to lip-read, but for hearing-impaired or deaf people the visual speech component is an important source of information. Lip-reading (visual speech recognition) is also used by people without disabilities: it aids understanding when the acoustic speech is less intelligible. The task of automatic speech recognition by a computer using the visual component of speech has attracted many researchers to the automatic audio-visual speech recognition domain. It is challenging because visual articulations vary from speaker to speaker and carry far less information than the acoustic signal, so the identification of robust features remains a center of attention for many researchers.
In recent years there have been many advances in automatic speech-reading systems that include audio and visual speech features to recognize words under noisy conditions. The objective of a visual speech recognition system is to improve recognition accuracy. In this paper we extract visual features using Zernike moments on a speaker-independent visual vocabulary of Hindi digits from ten speakers. The visual features are normalized and the dimensionality of the feature set is reduced by principal component analysis (PCA) in order to recognize isolated word utterances in the PCA space.
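The normalization and PCA step described above can be sketched in NumPy as follows. This is a minimal illustration of z-score normalization followed by projection onto the top principal components; the function and variable names (`pca_reduce`, `features`, `n_components`) are illustrative, not from the paper, and the paper's actual feature matrix would come from Zernike moments of mouth-region frames.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors (one per row) onto the top principal components."""
    # z-score normalization of each feature dimension
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-12    # avoid division by zero
    z = (features - mu) / sigma
    # eigen-decomposition of the covariance matrix
    cov = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
    basis = eigvecs[:, order[:n_components]]
    return z @ basis                        # coordinates in the PCA space
```

Recognition of an isolated word utterance would then compare these PCA-space coordinates against stored templates, e.g. by a nearest-neighbour rule.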
2. ZERNIKE POLYNOMIALS
Zernike polynomials are a set of orthogonal polynomials defined on the unit disk. In polar coordinates (ρ, θ) each polynomial factors into a radial part and an angular part:

Z_n^m(ρ, θ) = R_n^m(ρ) e^{jmθ}

where R_n^m is the radial polynomial, n ≥ |m|, and n − |m| is even.
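Using the standard image-analysis definition of Zernike moments (a discrete sum over pixels mapped onto the unit disk), the radial polynomial and a moment A_nm can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation; the function names are illustrative.

```python
import math
import numpy as np

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^m evaluated at rho (n - |m| must be even)."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        R += coeff * rho ** (n - 2 * s)
    return R

def zernike_moment(img, n, m):
    """Zernike moment A_nm of a square grayscale image mapped onto the unit disk."""
    N = img.shape[0]
    # map pixel indices to coordinates in [-1, 1] x [-1, 1]
    coords = (2 * np.arange(N) - N + 1) / (N - 1)
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                       # keep only pixels inside the unit disk
    V_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)  # conjugate basis V*_nm
    return (n + 1) / np.pi * np.sum(img[mask] * V_conj[mask])
```

The magnitudes |A_nm| are rotation-invariant, which is what makes Zernike moments attractive as visual speech features: a lip shape produces the same magnitude spectrum regardless of small in-plane rotations of the mouth region.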
Often, to aid in the interpretation of optical test results it is
convenient to express wavefront data in polynomial form.
Zernike polynomials are often used for this purpose since they
are made up of terms that are of the same form as the types of
aberrations often observed in optical tests (Zernike, 1934).
This is not to say that Zernike polynomials are the best
polynomials for fitting test data. Sometimes Zernike
polynomials give a poor representation of the wavefront data.
For example, Zernikes have little value when air turbulence is
present. Likewise, fabrication errors in the single point
diamond turning process cannot be represented using a
reasonable number of terms in the Zernike polynomial. In the
testing of conical optical elements, additional terms must be
added to Zernike polynomials to accurately represent
alignment errors. The blind use of Zernike polynomials to