mm13-24

Subjective Perception of Facial Expression of Stress Created Using the Lombard Effect

Charles Ray, Nadia Berthouze, and Andrew Davis

Abstract— The management of stress in geographically dispersed organisations requires a mature approach to the use of audio-visual communications technologies. Good-quality audio-visual samples help in the development of approaches and tools. This study uses subjective evaluators to rate the perception of stress in video samples created using the Lombard effect. It finds a significant difference in how the facial expressions of actors are perceived when their speech is recorded under the Lombard effect, compared with recordings made without it. This finding means that valid samples can now be created more easily to test telecommunications-based approaches to the remote management of stress.

Index Terms— Emotion Recognition, Lombard, Natural vs. Acted Data, Subjective Evaluation, Videoconferencing, Visual Communication

I. INTRODUCTION

Corporate organisations are increasingly concerned with the impact of individual stress on productivity; this is measured in a number of ways, including the number of days lost to stress-related absence and employee satisfaction surveys. Organisations are also increasingly embracing technology to work in geographically dispersed teams, creating challenges for the management of individuals, which has in the past been a mainly face-to-face activity. In the absence of face-to-face meetings, new technology is emerging to enhance organisational relationships over distance. This includes enhancing audio communication with video channels.

The practical measurement of the effectiveness of video channels in the perception of stress depends on being able to vary the level of stress in an actor and to measure the remote perception of that stress. Acted stress and natural stress have been found to differ in a number of ways; for example, the inclination to smile under natural stress does not generally occur in acted stress [1].

Having a controllable way of inducing natural stress effects is useful in experiments: for creating recorded samples, for running real-time experiments, and for training emotional characteristic analysis software. Training in this context means the calibration of software using samples. The Lombard effect has already been shown to provide vocal stress effects suitable for experiments, as described below; this paper considers whether this functionality extends into the video domain.

The Lombard effect is the name given to the elevation of the voice by people with poor hearing, and by those whose hearing is intact but who are in the middle of intense noise (e.g. on a railroad). The voice is raised both so that speakers can hear themselves better and so that they are heard by the listener [2].

One use for the Lombard effect is the creation of natural stress in actors for the production of samples. One of the most widely used voice stress training databases is SUSAS [3], maintained by the Linguistic Data Consortium, which includes samples recorded by aircraft crew whilst in the proximity of active jet engines. This database has been used in many vocal stress feature classification experiments, including that of Hansen & Womack [4], who conclude that stress produces several vocal characteristics, ranging from an impact on basic features such as voice amplitude and pitch frequency to more subtle changes in formant energy and spectral tilt.

Existing research in audio analysis explores the detailed characteristics of Lombard in the body's production of speech [5][6][7][8]. The production of greater volume and amplitude has an impact on the working of facial muscles; this was measured by Yehia, Rubin and Bateson [9] using LED face markers, and by Garnier et al. [10], who measured lip spreading and aperture on audio-visual recordings and noted hyper-articulation caused by the Lombard effect.

In automatic speech recognition, Heracleous et al. [11] found that a lip parameter extraction tool, OKAO Vision, improved the performance of automated lip reading systems when used to train the system to recognise Lombard conditions.

For the speaker to be better understood in a noisy environment, the natural muscle change caused by speaking louder may be complemented by subconsciously acted, exaggerated facial movements that assist the listener with lip reading. Movement sensors have been used to detect exaggerated facial and head movements [12][13][14], and these movements have been shown to increase the accuracy of subjective word recognition tests [15].

Whilst this technical research shows that the Lombard effect measurably affects facial movements, none of it addresses whether that movement has an impact on the level of perceived emotion. This experiment tests whether human evaluation of video prepared under silent and noisy conditions reveals significant differences in the perception of stress.

II. METHOD

A. Preparation of Samples

This study used a sound recording suite, with adequate lighting, autocue, and an HD camera recording 1440x1080 pixels, 25000 kb/s and 25 frames per second with 384 kb/s