CONTENT BASED CLINICAL DEPRESSION DETECTION IN ADOLESCENTS
Lu-Shih Alex Low, Namunu C. Maddage, Margaret Lech, Lisa Sheeber¹, Nicholas Allen²
School of Electrical and Computer Engineering, RMIT University, Melbourne 3001, Australia
¹ Oregon Research Institute, 1715 Franklin Boulevard, Eugene, Oregon 97403
² ORYGEN Research Centre and Department of Psychology, University of Melbourne, Melbourne 3010, Australia
lushih.low@student.rmit.edu.au, {namunu.maddage, margaret.lech}@rmit.edu.au, lsheeber@ori.org, nba@unimelb.edu.au
ABSTRACT
This paper studies the effectiveness of speech content for detecting
clinical depression in adolescents. We also evaluated the performance
of acoustic features such as Mel frequency cepstral coefficients
(MFCC), short-time energy (Energy), zero crossing rate (ZCR) and the
Teager energy operator (TEO) using Gaussian mixture models for
depression detection. A clinical data set of speech from
139 adolescents, including 68 (49 girls and 19 boys) diagnosed as
clinically depressed, was used in the classification experiments.
Each subject participated in three 20-minute interactions. The
classification was first performed using the whole data set and a
smaller subset of data selected based on behavioural constructs
defined by trained human observers (data with constructs). In the
experiments, we found that the MFCC+Energy feature outperformed
the TEO feature. The results indicated that using the construct-based
speech content in the problem solving interactions (PSI) session
improved the detection accuracy. Accuracy was further improved by 4%
when the gender-dependent depression modelling technique was adopted.
By using construct-based PSI session speech content, gender-based
depression models achieved 65.1% average detection accuracy. Also,
for both types of features
(TEO and MFCC), the correct classification rates were higher for
female speakers than for male speakers.
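As an illustration of three of the frame-level features named above, the following is a minimal sketch, not the authors' implementation. It assumes a 200-sample frame with a 100-sample hop and a synthetic 200 Hz test tone, and it uses the commonly cited discrete form of the Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The function names are hypothetical; MFCC extraction and the Gaussian mixture modelling are omitted.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def teager_energy(x):
    """Discrete Teager energy operator: x[n]^2 - x[n-1] * x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Synthetic test signal (an assumption, not data from the paper):
# one second of a 200 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 200 * t)

frames = frame_signal(tone, frame_len=200, hop=100)
energy = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
teo = teager_energy(tone)
```

For a pure sinusoid of amplitude A and digital frequency W, the operator returns the constant A^2 sin^2(W), which is why it is often described as tracking both the amplitude and the frequency of a signal.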
1. INTRODUCTION
The inability to diagnose clinical depression early in adolescents
aged 13-20 years can have a serious impact on sufferers, including
the risk of suicidal ideation. Strong evidence demonstrates that
most suicides are linked to depressive disorders and symptomatology
[13]. Teen suicide has become a significant public health concern,
as it is one of the leading causes of death in Australia. Suicide
rates among Australian adolescents increased threefold from the
1960s to the 1990s. Although recent statistics (2006) show a dip in
the number of youth suicides, suicide still ranks as the 15th
leading cause of death in Australia [1].
Depressed individuals suffer from varying degrees of psychomotor
retardation (slowness) or agitation. Sufferers of depression experi-
ence prolonged periods of hopelessness, anger, guilt, desperation
and loneliness along with, as noted above, a tendency towards suicidal
thoughts. Dealing with the issue of depression poses a complex and
challenging task due to the many potential psychological variables.
In an effort to understand and prevent depression and suicide in
adolescents, psychologists have carried out studies based on
demographic profiles, family self-reports and observational data of
patients in clinical interviews. Clinicians have consistently
reported from these interviews that the speech of a depressed
patient is slow, uniform, monotonous and expressionless, with the
patient fearful of expressing him or herself [12].
During listening tests [4], listeners could perceive differences in
pitch, loudness, speaking rate and articulation of speech recorded
from depressed patients before and after treatment. This has led to
a considerable amount of interest in combining psychological
assessments with acoustic speech analysis to objectively measure
behavioural changes in a patient over time. Any improvement in
objective diagnosis would translate into relevant clinical
applications, including the early detection of depressive conditions
and the evaluation of treatment outcomes. This is the basis of our
research. In turn, it could lead to a computerized healthcare system
that assists mental health professionals by providing early warning
signs, derived from voice patterns, of whether a patient is likely
to be depressed.
As early as the 19th century, attempts have been made to analyse
vocal acoustic parameters in search of potential indicators of
depression [16]. Since then, there have been numerous efforts to
empirically determine the physical and mental health of individuals
through their vocal speech patterns. In fact, designing an automatic
computerized system for depression screening from speech is not a
novel idea [21]. Currently, however, there are no computerized vocal
diagnosis tools that can provide accurate results to assist
psychologists in detecting clinical depression among adolescents. The
most commonly studied parameters in the speaker characterization
literature have been measures relating to prosody (i.e. fundamental
frequency (F0), speaking rate, energy) and the vocal tract (i.e.
formants) [6], [11], [20], [14], [5], [15]. This is because they have
the closest relation to human perception [11]. Unfortunately, to
complicate matters, results have differed from one researcher to
another. Although most researchers [20], [14] found that F0
correlated well with depression, the experiments of France et al.
(2000) [5] on severely depressed and near-term suicidal subjects,
with gender separation, found that F0 was an ineffective
discriminator for both depressed male and female patients. Instead,
formants and power spectral density measurements proved to be the
better discriminators. These discrepancies could be due to the many
different variables involved, such as recording conditions, number of
participants and the level of the participants’ depression ratings.
Performing multivariate analyses on vocal features extracted
from a patient’s speech has been the main focus of recent studies
aiming to increase the accuracy of clinical depression classification
[5], [15], [11]. The highest classification accuracy to date
has been reported by Moore et al. (2008) [11]. On a sample of 33
subjects (15 with major depression, 18 controls), Moore et al. adopted
a feature selection strategy of adding one feature at a time to find
the highest classification accuracy through quadratic discriminant
analysis, and obtained classification accuracies of 90% and 96%
with the combination of prosodic and glottal features for male and
female patients respectively. However, the sample size of these
experiments may be deemed too small to produce statistically
significant results for clinical application.
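The strategy of adding one feature at a time can be sketched as a greedy forward selection wrapped around a quadratic discriminant classifier. The code below is a rough illustration under stated assumptions and is not the procedure of Moore et al. [11], whose details are not given here. The data are synthetic, the 5-fold cross-validation, the covariance regularisation term `reg` and the stopping rule are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def qda_fit(X, y, reg=1e-3):
    """Fit one Gaussian (mean, covariance, prior) per class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # reg keeps the covariance invertible (an assumption of this sketch)
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + reg * np.eye(X.shape[1])
        params[c] = (Xc.mean(axis=0), cov, len(Xc) / len(X))
    return params

def qda_predict(params, X):
    """Assign each row to the class with the highest Gaussian log-posterior."""
    classes = list(params)
    scores = []
    for c in classes:
        mean, cov, prior = params[c]
        diff = X - mean
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * (maha + logdet) + np.log(prior))
    return np.array(classes)[np.argmax(scores, axis=0)]

def cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy with interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    correct = 0
    for k in range(n_folds):
        test = folds == k
        model = qda_fit(X[~test], y[~test])
        correct += np.sum(qda_predict(model, X[test]) == y[test])
    return correct / len(y)

def forward_select(X, y):
    """Greedily add the feature that most improves accuracy; stop when none does."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(cv_accuracy(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scored)
        if score <= best_score:
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score

# Synthetic data (not from any study): 200 samples, 5 features,
# of which only feature 2 separates the two classes.
rng = np.random.default_rng(0)
y = np.arange(200) % 2
X = rng.normal(size=(200, 5))
X[:, 2] += 3.0 * y

selected, score = forward_select(X, y)
```

With a sample as small as 33 subjects, a wrapper of this kind can overfit the selection itself, because the same data choose the features and estimate the accuracy. This is one reason the reported figures may not generalise.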
17th European Signal Processing Conference (EUSIPCO 2009) Glasgow, Scotland, August 24-28, 2009
© EURASIP, 2009 2362