A Drowsiness Detection Scheme Based on Fusion of Voice and Vision Cues
Anirban Dasgupta, Bibek Kabi, Anjith George, SL Happy, Aurobinda Routray

Abstract—Drowsiness level detection of an individual is very important in many safety critical applications such as driving. Several invasive and contact based methods, such as those using blood biochemicals or brain signals, can estimate the level of drowsiness very accurately. However, these methods are very difficult to implement in practical scenarios, as they cause discomfort to the user. This paper presents a combined voice and vision based drowsiness detection system well suited to detecting the drowsiness level of an automotive driver. Vision and voice based detection, being non-contact methods, have the advantage of being feasible to implement. The authenticity of these methods has been cross-validated using brain signals.

Keywords—drowsiness; voiced-silenced; PERCLOS

I. INTRODUCTION

A major cause of road accidents is the loss of attention in automotive drivers, which has been a concern for transportation safety over the years. Hence, on-board monitoring of the alertness level of the driver is necessary to prevent such occurrences. Alertness level in human beings can be assessed using different measures such as the electroencephalogram (EEG) [1], ocular features [2][3][4], blood samples [5], speech [6], skin conductance [7] etc. The EEG and blood biochemical based methods have been reported to be the most authentic cues for estimating the reduction in alertness level [8]. However, being contact based methods, they are impractical to implement [9]. In this paper, image and voice cues are considered for implementation by virtue of being non-contact. The voice based algorithm works on the voiced-silenced ratio [6]. The vision based algorithm computes a ratio called PERCLOS [10], which is based on the eye closure rates.
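As a rough illustration of the PERCLOS ratio mentioned above, the following minimal sketch assumes that PERCLOS is taken as the fraction of frames in an observation window in which the eye is judged closed. The function name `perclos`, the per-frame boolean input, and the example window are hypothetical; the eye-state classifier and the window length used by the system are not specified here.

```python
from typing import Sequence


def perclos(eye_closed: Sequence[bool]) -> float:
    """Fraction of frames in the window in which the eye is judged closed.

    Assumption: each entry is the output of some per-frame eye-state
    classifier (True = closed). The classifier itself is not shown.
    """
    if len(eye_closed) == 0:
        raise ValueError("window must contain at least one frame")
    closed_frames = sum(1 for closed in eye_closed if closed)
    return closed_frames / len(eye_closed)


# Hypothetical 10-frame window in which the eye is closed in 3 frames.
window = [False, False, True, True, False, False, False, True, False, False]
score = perclos(window)
```

A higher value indicates that the eyes were closed for a larger share of the window. Any threshold used to flag drowsiness would have to be chosen separately.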
The vision based algorithm utilizes near infra-red (NIR) illumination for night driving. An EEG based method has been used for off-board validation of the vision and voice based algorithms. The NIR data was tested using the database reported in [11].

There have been attempts to develop similar vision based systems, such as [12]–[16]. However, these systems rely only on image information, and may be improved by using a multimodal approach. The system reported in this paper improves on existing systems by incorporating a voice based algorithm, which provides additional information related to drowsiness. The voice based approach has recently been established as a measure of drowsiness [6] and has also been experimentally validated in this work.

The alertness monitoring system described in this work uses a single USB camera with a maximum frame rate of 30 fps at a resolution of 640×480 pixels. The camera is placed directly on the steering column, just behind the steering wheel, to obtain the best view of the driver's face. The microphone is placed on the dashboard. The embedded processing unit is an Intel Atom processor based Single Board Computer (SBC) with x86 architecture. The SBC is powered at 12 V DC from the car supply. The typical current drawn from the input source is approximately 1200 mA, so the typical power drawn from the supply is approximately 15 W. A voltage regulator unit, comprising an LM317 IC along with some resistors, a capacitor and an inductor, is used before the input to remove high voltage spikes from the car supply. An NIR lighting arrangement, consisting of a 3×8 matrix of Gallium Arsenide LEDs, is also connected across the same supply, in parallel with the embedded platform. The NIR module is operated at 10 V DC and draws a typical current of 250 mA. The lighting system is connected through a Light Dependent Resistor (LDR) to automatically switch on the NIR module in the absence of sufficient illumination.
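The quoted electrical figures can be cross-checked with the DC power relation P = V × I. The sketch below uses only the voltages and typical currents stated above. The helper name `power_watts` is hypothetical, and the NIR figure is the power of the module itself at its 10 V operating point, not necessarily what it draws from the 12 V rail.

```python
def power_watts(voltage_v: float, current_a: float) -> float:
    """DC power P = V * I, in watts."""
    return voltage_v * current_a


sbc_power = power_watts(12.0, 1.2)    # SBC: 12 V DC, about 1200 mA
nir_power = power_watts(10.0, 0.25)   # NIR module: 10 V DC, about 250 mA

print(f"SBC: {sbc_power:.1f} W, NIR module: {nir_power:.1f} W")
```

The SBC figure comes to about 14.4 W, which is consistent with the approximately 15 W quoted for the supply.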
A seven inch LED touch screen is used to display the results. A set of speakers is installed for generating the voice alarm.

The paper is organized as follows. Section II discusses the voice based algorithm. Section III presents the vision based algorithm. The paper is concluded in Section IV.

II. VOICE BASED SYSTEM

The non-obtrusive nature of speech data and its sensor-free application make it advantageous for drowsiness detection over contact based methods. Moreover, speech is easier to record even under extreme environmental conditions and noisy surroundings, such as high temperature, high humidity, crowded traffic, motor sound, horn sound etc. In this work, we define loss of alertness in two ways: 1) loss of alertness due to drowsiness and 2) loss of alertness associated with response time. Two sets of experiments were conducted to assess the alertness level, and two different frameworks were developed for alertness assessment. Acoustic fatigue detection schemes require the estimation of various speech parameters that are affected by low alertness level due to fatigue, such as pitch, duration of voiced and unvoiced speech, Linear Predictive Coding (LPC), Linear Predictive Cepstral Coefficients (LPCC) [17], Mel Frequency Cepstral Coefficients (MFCC) [18], [19], formants etc. [20]. These estimated features are then fed to a classifier to decide upon the level of alertness. A