INFLUENCE OF ACOUSTIC LOW-LEVEL DESCRIPTORS IN THE DETECTION OF
CLINICAL DEPRESSION IN ADOLESCENTS
Lu-Shih Alex Low, Namunu C. Maddage, Margaret Lech, Lisa Sheeber†, Nicholas Allen††
School of Electrical and Computer Engineering, RMIT University, Melbourne 3001, Australia
†Oregon Research Institute, 1715 Franklin Boulevard, Eugene, Oregon 97403
††ORYGEN Research Centre and Department of Psychology, University of Melbourne, Melbourne 3010, Australia
lushih.low@student.rmit.edu.au, {namunu.maddage, margaret.lech}@rmit.edu.au, lsheeber@ori.org, nba@unimelb.edu.au
ABSTRACT
In this paper, we report the influence on classification accuracy,
in the analysis of speech from a clinical dataset, of adding acoustic
low-level descriptors (LLDs) belonging to prosodic features (i.e.
pitch, formants, energy, jitter, shimmer) and spectral features (i.e.
spectral flux, centroid, entropy and roll-off), along with their delta
(Δ) and delta-delta (Δ-Δ) coefficients, to two baseline features:
Mel-frequency cepstral coefficients (MFCC) and the Teager energy
critical-band based autocorrelation envelope (TEO-CB-Auto-Env).
Extracted LLDs that displayed an increase in accuracy after being
added to these baseline features were finally modeled together
using Gaussian mixture models and tested. A clinical data set of
speech from 139 adolescents, including 68 (49 girls and 19 boys)
diagnosed as clinically depressed, was used in the classification
experiments. For male subjects, the combination (TEO-CB-Auto-
Env + Δ + Δ-Δ) + F0 + (LogE + Δ + Δ-Δ) + (Shimmer + Δ) +
Spectral Flux + Spectral Roll-off gave the highest classification
rate of 77.82%, while for female subjects, TEO-CB-Auto-Env
alone gave an accuracy of 74.74%.
Index Terms— Clinical depression, prosodic feature, spectral
feature, acoustic features, Gaussian Mixture Model
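The per-class Gaussian mixture modeling and testing described above can be sketched as follows. This is a minimal illustration only, with synthetic stand-in features, diagonal covariances, two mixture components, and a plain EM fit; none of these choices is taken from the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm(X, k=2, iters=50):
    """Fit a diagonal-covariance GMM to X (frames x dims) with plain EM."""
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]          # init means from data
    var = np.ones((k, d)) * X.var(axis=0)            # init per-dim variances
    w = np.full(k, 1.0 / k)                          # init mixture weights
    for _ in range(iters):
        # E-step: responsibilities from per-component log densities
        logp = (-0.5 * (((X[:, None] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1)
                + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

def loglik(X, model):
    """Per-frame log-likelihood under a fitted GMM (log-sum-exp over components)."""
    w, mu, var = model
    logp = (-0.5 * (((X[:, None] - mu) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(-1)
            + np.log(w))
    m = logp.max(axis=1, keepdims=True)
    return m.squeeze(1) + np.log(np.exp(logp - m).sum(axis=1))

# One GMM per class; a frame sequence is assigned to the class whose
# model gives the larger total log-likelihood.
dep = rng.normal(0.0, 1.0, (500, 4))   # stand-in "depressed" features
ctl = rng.normal(3.0, 1.0, (500, 4))   # stand-in "control" features
m_dep, m_ctl = fit_gmm(dep), fit_gmm(ctl)
test = rng.normal(3.0, 1.0, (50, 4))
pred = "control" if loglik(test, m_ctl).sum() > loglik(test, m_dep).sum() else "depressed"
```

In practice the frame-level feature vectors (baseline features plus selected LLDs) play the role of the synthetic arrays here.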
1. INTRODUCTION
Understanding the causes of clinical depression has long been a
complex and challenging task, particularly in the field of
psychology, due to the many potential psychological variables.
With advances in high-speed computing, psychologists have in
recent years sought to collaborate with other disciplines to better
understand the psychological factors relating to the development
of clinical depression. The common aim is to share expertise
across fields and thereby assist psychologists in making additional
contributions towards the prevention and treatment of clinical
depression. The different research fields include: 1) cognitive
neuroscience which introduces neuro-imaging techniques such as
functional magnetic resonance imaging (fMRI) to monitor the
cognitive patterns relating to the activity of the brain, 2) facial
recognition which analyses the facial expressions displayed from
various emotions and 3) speech and language processing which
objectively analyses the vocal patterns of human speech. This
paper focuses solely on the latter; it has been well documented
that the speech of depressed individuals is slow, uniform,
monotonous and expressionless, with the person fearful of
expressing himself or herself [11], [12]. Consequently, our study
takes an objective look at a subject’s speaking behavior and vocal
characteristics to identify any differences in acoustic speech
measures between depressed and control subjects. According to
[6], there is strong evidence that most suicides are linked to
depressive disorders and
symptomatology. Depressive disorders are associated with a range
of psychosocial impairments and comorbid symptomatology which
includes varying degrees of psychomotor retardation (slowness) or
agitation. Although statistics show that the number of suicides in
Australia has decreased in recent years following the peaks of
1997-1998, suicide remains a leading cause of death (ranked 15th
in 2006), exceeding the number of deaths from transport accidents
and making it a prominent public health concern [1]. Studies show
that almost half of all suicides occur in the 25-49 years age group.
By focusing attention on intervention in depression at a young age,
it may therefore be easier to treat and arrest the problem before it
is too late; the broader conclusion is that late-life depression is a
chronic or recurring disorder which, when it goes unrecognized,
may have devastating effects. There have been several studies
published on various
methods in objectively analyzing vocal parameters as possible cues
to clinical depression. The most commonly used speech processing
techniques in the recognition of emotions and clinical depression
in the literature are related to prosody (i.e. pitch, jitter, energy,
pause time and speaking rate), as well as spectral features (i.e.
formants) and cepstral features (i.e. Mel-frequency cepstral
coefficients). Prosodic information, which has the closest relation
to the expressiveness of speech, has been widely studied in this
field. A number of studies have consistently shown that increases
in speech rate and loudness, as well as decreases in pause duration
during clinical interviews, are key discriminators of mood
improvement over the course of therapy [3], [7]. Fundamental
frequency (F0), the most widely studied parameter, has also shown
a strong correlation with depression.
Unfortunately, the generalizability of some of these findings still
remains unclear as reported results vary from one investigator to
another.
The purpose of this study was to examine the influence
(increase or decrease) on classification accuracy of adding
acoustic low-level descriptors representing prosodic and spectral
features to our set of baseline features: 1) Mel-frequency cepstral
coefficients (MFCC) and 2) the Teager energy operator critical-
band based autocorrelation envelope (TEO-CB-Auto-Env). These
two methods were chosen as baselines because MFCCs have been
widely used in speech content analysis and are known to be robust
acoustic features, while the TEO-CB-Auto-Env method has
performed reliably well in emotional stress classification [10].
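The Δ (velocity) and Δ-Δ (acceleration) coefficients appended to the baseline features can be obtained with the standard regression formula over neighboring frames. The sketch below is an illustration only, assuming a regression window of N = 2 frames and random stand-in values for a 13-dimensional MFCC stream.

```python
import numpy as np

def delta(features, N=2):
    """First-order regression (delta) coefficients of a (frames x dims)
    feature matrix: d_t = sum_{n=1..N} n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2),
    with edge frames replicated for padding."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(features.shape, dtype=float)
    for t in range(features.shape[0]):
        for n in range(1, N + 1):
            out[t] += n * (padded[t + N + n] - padded[t + N - n])
    return out / denom

# Stand-in 13-dimensional MFCC stream (100 frames); appending delta and
# delta-delta coefficients yields 39-dimensional observation vectors.
mfcc = np.random.default_rng(0).standard_normal((100, 13))
feat = np.hstack([mfcc, delta(mfcc), delta(delta(mfcc))])
print(feat.shape)  # (100, 39)
```

Applying `delta` once gives the Δ stream and applying it to that result gives the Δ-Δ stream; the same operator can be applied to any of the scalar LLD trajectories (e.g. shimmer) by treating them as single-column matrices.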
Finally, we use a different combination of baseline features and
978-1-4244-4296-6/10/$25.00 ©2010 IEEE ICASSP 2010