CONTENT BASED CLINICAL DEPRESSION DETECTION IN ADOLESCENTS

Lu-Shih Alex Low, Namunu C. Maddage, Margaret Lech, Lisa Sheeber¹, Nicholas Allen²

School of Electrical and Computer Engineering, RMIT University, Melbourne 3001, Australia
¹ Oregon Research Institute, 1715 Franklin Boulevard, Eugene, Oregon 97403
² ORYGEN Research Centre and Department of Psychology, University of Melbourne, Melbourne 3010, Australia
lushih.low@student.rmit.edu.au, {namunu.maddage, margaret.lech}@rmit.edu.au, lsheeber@ori.org, nba@unimelb.edu.au

ABSTRACT

This paper studies the effectiveness of speech content for detecting clinical depression in adolescents. We also evaluated the performance of acoustic features such as Mel frequency cepstral coefficients (MFCC), short time energy (Energy), zero crossing rate (ZCR) and Teager energy operator (TEO) using Gaussian mixture models for depression detection. A clinical data set of speech from 139 adolescents, including 68 (49 girls and 19 boys) diagnosed as clinically depressed, was used in the classification experiments. Each subject participated in three 20-minute interactions. Classification was first performed using the whole data set and a smaller subset of the data selected on the basis of behavioural constructs defined by trained human observers (data with constructs). In the experiments, we found that the MFCC+Energy feature outperformed the TEO feature. The results indicated that using the construct-based speech content in the problem-solving interactions (PSI) session improved the detection accuracy. Accuracy was further improved by 4% when the gender-dependent depression modelling technique was adopted. By using construct-based PSI session speech content, gender-based depression models achieved 65.1% average detection accuracy. Also, for both types of features (TEO and MFCC), the correct classification rates were higher for female speakers than for male speakers.

1. INTRODUCTION

Failure to diagnose clinical depression early in adolescents aged 13-20 years can have a serious impact on sufferers, including the risk of suicidal ideation. Strong evidence demonstrates that most suicides are linked to depressive disorders and symptomatology [13]. Teen suicide has become a significant public health concern, as it is one of the leading causes of death in Australia. Suicide rates among Australian adolescents increased threefold from the 1960s to the 1990s. Although recent statistics (2006) have shown a dip in the number of youth suicides, they still rank suicide as the 15th leading cause of death in Australia [1].

Depressed individuals suffer from varying degrees of psychomotor retardation (slowness) or agitation. Sufferers of depression experience prolonged periods of hopelessness, anger, guilt, desperation and loneliness along with, as noted above, a tendency to suicidal thoughts. Dealing with depression is a complex and challenging task because of the many potential psychological variables involved.

In an effort to understand and prevent depression and suicide in adolescents, psychologists have carried out studies based on demographic profiles, family self-reports and observational data from patients in clinical interviews. Clinicians in these interviews have consistently reported that the speech of a depressed patient is slow, uniform, monotonous and expressionless, with the patient fearful of expressing himself or herself [12]. In listening tests [4], listeners could perceive differences in the pitch, loudness, speaking rate and articulation of speech recorded from depressed patients before and after treatment. This has led to a considerable amount of interest in combining psychological assessments with acoustic speech analysis to objectively measure behavioural changes in a patient over time.
Any improvement in objective diagnosis would translate into relevant clinical applications, including the early detection of depressive conditions and the evaluation of treatment outcomes. This is the basis of our research. In turn, this could lead to the possibility of developing a computerized healthcare system that would assist mental health professionals by providing early warning signs, derived from a patient's voice patterns, indicating whether the patient is likely to be depressed.

As early as the 19th century, attempts were made to analyse vocal acoustic parameters in search of potential indicators of depression [16]. Since then, there have been numerous efforts to empirically determine the physical and mental health of individuals through their vocal speech patterns. In fact, designing an automatic computerized system that screens for depression from speech is not a novel idea [21]. Currently, however, there are no computerized vocal diagnosis tools that can provide accurate results to assist psychologists in detecting clinical depression among adolescents.

The parameters most commonly studied in the speaker characterization literature are measures relating to prosody (i.e. fundamental frequency (F0), speaking rate, energy) and the vocal tract (i.e. formants) [6], [11], [20], [14], [5], [15]. This is because they relate most closely to human perception [11]. Unfortunately, to complicate matters, results differ from one researcher to another. Although most researchers [20], [14] found that F0 correlated well with depression, the experiments of France et al. (2000) [5] on severely depressed and near-term suicidal subjects, with gender separation, found that F0 was an ineffective discriminator for both depressed male and female patients. Instead, formants and power spectral density measurements proved to be the better discriminators.
These discrepancies could be due to the many variables involved, such as recording conditions, number of participants and the level of participants' depression ratings.

Recent studies have focused on multivariate analysis of vocal features extracted from a patient's speech in order to increase the accuracy of clinical depression classification [5], [15], [11]. The highest classification accuracy achieved to date was reported by Moore et al. (2008) [11]. On a sample of 33 subjects (15 with major depression, 18 controls), Moore et al. adopted a feature selection strategy that added one feature at a time to find the highest classification accuracy through quadratic discriminant analysis, and obtained classification accuracies of 90% and 96% for male and female patients respectively with a combination of prosodic and glottal features. However, the sample size of these experiments may be deemed too small to produce statistically significant results for clinical application.

17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009 © EURASIP, 2009 2362
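To illustrate the one-feature-at-a-time selection strategy attributed to Moore et al. [11] above, the following is a minimal Python sketch under stated assumptions; it is not their implementation. The classifier is a diagonal-covariance Gaussian model, used here as a simplified stand-in for full quadratic discriminant analysis, and accuracy is estimated by leave-one-out testing, which is our choice rather than something stated in the text. The function names, the three feature columns and all data values are hypothetical and exist only to make the example runnable.

```python
import math


def fit_diag_gaussian(rows, labels, feats):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    model = {}
    for c in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for f in feats:
            vals = [r[f] for r in cls_rows]
            mean = sum(vals) / len(vals)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mean, var))
        model[c] = (math.log(len(cls_rows) / len(rows)), stats)
    return model


def predict(model, row, feats):
    """Return the class with the highest Gaussian log-posterior score."""
    best_c, best_score = None, -math.inf
    for c, (log_prior, stats) in sorted(model.items()):
        score = log_prior
        for f, (mean, var) in zip(feats, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (row[f] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_c, best_score = c, score
    return best_c


def loo_accuracy(rows, labels, feats):
    """Leave-one-out accuracy using only the feature indices in feats."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit_diag_gaussian(train_rows, train_labels, feats)
        correct += predict(model, rows[i], feats) == labels[i]
    return correct / len(rows)


def forward_select(rows, labels):
    """Greedily add the feature that most improves accuracy; stop when none does."""
    n_feats = len(rows[0])
    selected, best_acc = [], 0.0
    while len(selected) < n_feats:
        candidates = [f for f in range(n_feats) if f not in selected]
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in candidates]
        # Highest accuracy wins; ties go to the lowest feature index.
        acc, feat = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break
        selected.append(feat)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: 8 speakers x 3 made-up features; label 1 = depressed.
rows = [
    [1.0, 5.0, 2.0], [1.2, 3.0, 2.5], [0.9, 4.0, 1.5], [1.1, 6.0, 3.0],
    [3.0, 5.5, 2.2], [3.2, 3.5, 2.8], [2.9, 4.5, 1.8], [3.1, 6.5, 3.2],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected, acc = forward_select(rows, labels)
print(selected, acc)
```

In this toy set only the first column separates the two groups, so the search keeps that column and stops once adding further columns no longer raises the accuracy. With a sample as small as the 33 subjects noted above, accuracies obtained from a greedy search of this kind can be optimistic, which is consistent with the caution about sample size.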