Indian Journal of Science and Technology, Vol 9(32), DOI: 10.17485/ijst/2016/v9i32/98737, August 2016
ISSN (Print): 0974-6846 ISSN (Online): 0974-5645

Lip Detection and Lip Geometric Feature Extraction using Constrained Local Model for Spoken Language Identification using Visual Speech Recognition

Aparna Brahme 1* and Umesh Bhadade 2
1 Department of Information Technology Engineering, MET’s Institute of Engineering, Nashik - 422207, Maharashtra, India; mrsnrkale@gmail.com
2 Department of IT Engineering, SSBT’s College of Engineering and Technology Bhambori, Jalgaon – 425001, Maharashtra, India; umeshbhadade@rediffmail.com
*Author for correspondence

Keywords: CLM, Lip Detection, Language Identification, Visual Speech

Abstract
Background/Objectives: The aim of our research is to identify the language of a spoken utterance using cues from visual speech recognition, i.e. from the movement of the lips. The first step towards this task is to detect the lips in a face image and then to extract geometric features of the lip shape from which the utterance can be recognized.
Methods/Statistical Analysis: This paper presents a methodology for detecting lips in face images using a constrained local model (CLM) and then extracting geometric features of the lip shape. Lip detection involves two steps: CLM model building and CLM search. For lip geometric feature extraction, twenty feature points are defined on the lips, and lip height, width and area are computed from these twenty points.
Findings: The CLM model is built using images from the FGnet Talking Face video database and tested using images from the same database as well as other images. Detection accuracy is higher for FGnet images than for other images. The feature vector describing the lip shape consists of geometric parameters such as the height, width and area of the inner and outer lip contours. It is calculated for every test image after the lips have been detected in the face image.
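The geometric feature vector described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the twenty lip points are split into 12 outer-contour points followed by 8 inner-contour points, each listed in order around its contour (the mouth layout of the common 68-point landmark markup); the paper's own point layout and exact feature definitions may differ. Width and height are taken here as the horizontal and vertical extents of each contour, and area is computed with the shoelace formula. The function names are illustrative only.

```python
import numpy as np

# Hypothetical layout (assumption, not taken from the paper): the first 12 of
# the 20 lip points trace the outer contour and the last 8 trace the inner
# contour, each listed in order around its contour.
N_OUTER = 12


def polygon_area(pts):
    """Area enclosed by an ordered closed contour, via the shoelace formula."""
    x, y = pts[:, 0], pts[:, 1]
    # sum(x_i * y_{i+1}) - sum(y_i * x_{i+1}), wrapping around to the first point
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def contour_features(pts):
    """Width, height and area of one lip contour."""
    width = pts[:, 0].max() - pts[:, 0].min()   # horizontal extent
    height = pts[:, 1].max() - pts[:, 1].min()  # vertical extent
    return [width, height, polygon_area(pts)]


def lip_geometric_features(points):
    """Six-element feature vector from a (20, 2) array of lip points:
    [outer width, outer height, outer area, inner width, inner height, inner area].
    """
    points = np.asarray(points, dtype=float)
    if points.shape != (20, 2):
        raise ValueError("expected 20 (x, y) lip points")
    outer, inner = points[:N_OUTER], points[N_OUTER:]
    return np.array(contour_features(outer) + contour_features(inner))
```

For example, an outer contour traced around a 60 × 30 rectangle and an inner contour around a 40 × 10 rectangle yield the vector [60, 30, 1800, 40, 10, 400]. Because every feature is computed from detected point positions, any error in locating the lip points carries directly into this vector.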
Errors in lip detection therefore lead to errors in the feature vector, and because detection is more accurate on the FGnet images used to build the model, this indicates the speaker dependency of visual speech recognition systems.
Application/Improvements: The proposed approach is useful for lip detection and feature extraction in visual speech recognition. Reducing speaker dependency and generalizing the approach are directions for further improvement.

1. Introduction
Automatic Language Identification (LID) is the task of recognizing the language of a spoken utterance by computer. Language identification has many applications in multilingual services. One example is a language identification system that routes an incoming telephone call to a human switchboard operator fluent in the corresponding language. Automatic visual language identification (VLID) is the technology that uses visual cues derived from the movement of the speech articulators (the lips) to identify the language of a spoken utterance without using any audio information [1]. This technique is particularly useful in noisy environments, where the available audio signal is very weak or absent altogether. In our paper [2], an overview of spoken language identification, various language identification cues and the basic framework of visual language identification are discussed. According to the proposed framework, the first task is feature extraction from videos of the speech articulators, i.e. the movements of the lips. For this, the lips must first be detected in the video frames containing face images.
This paper discusses lip detection using a constrained local model. In Section 2, various methods of lip detection are discussed. In Section 3, CLM model build-