The Distress Analysis Interview Corpus of human and computer interviews

Jonathan Gratch, Ron Artstein, Gale Lucas, Giota Stratou, Stefan Scherer, Angela Nazarian, Rachel Wood, Jill Boberg, David DeVault, Stacy Marsella, David Traum, Skip Rizzo, Louis-Philippe Morency

USC Institute for Creative Technologies, 12015 Waterfront Drive, Playa Vista CA 90094-2536, USA
{gratch, artstein, lucas, stratou, scherer, nazarian, rwood, boberg, devault, marsella, traum, rizzo, morency}@ict.usc.edu

Abstract

The Distress Analysis Interview Corpus (DAIC) contains clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post-traumatic stress disorder. The interviews are conducted by humans, human-controlled agents and autonomous agents, and the participants include both distressed and non-distressed individuals. Data collected include audio and video recordings and extensive questionnaire responses; parts of the corpus have been transcribed and annotated for a variety of verbal and non-verbal features. The corpus has been used to support the creation of an automated interviewer agent, and for research on the automatic identification of psychological distress.

Keywords: multimodal corpora, virtual humans, dialogue systems, nonverbal behavior

1. Overview

Untreated mental illness creates enormous social and economic costs, yet many cases go undiagnosed. Up to half of patients with psychiatric disorders are not recognized as having mental illness by their primary care physicians (Higgins, 1994). Within health-care settings, a first step in identifying mental illness is a semi-structured clinical interview, where health-care providers ask a series of questions aimed at identifying clinical symptoms in an open-ended fashion.
Recently, there has been considerable research interest in developing tools to analyze the verbal and nonverbal content of these interviews as a means for building decision-support tools (Gratch et al., 2013) and computer-assisted self-administered screenings (Bickmore et al., 2005), and for answering fundamental questions about language, nonverbal behavior and mental illness (Scherer et al., 2013b; Yang et al., 2013; Alvarez-Conrad et al., 2001).

The Distress Analysis Interview Corpus (DAIC) is a multimodal collection of semi-structured clinical interviews. Designed to simulate standard protocols for identifying people at risk for post-traumatic stress disorder (PTSD) and major depression, these interviews were collected as part of a larger effort to create a computer agent that interviews people and identifies verbal and nonverbal indicators of mental illness (DeVault et al., 2014). The corpus contains four types of interviews:

Face-to-face interviews between participants and a human interviewer (Figure 1);
Teleconference interviews, conducted by a human interviewer over a teleconferencing system;
Wizard-of-Oz interviews, conducted by an animated virtual interviewer called Ellie (Figure 2), controlled by a human interviewer in another room;
Automated interviews, where participants are interviewed by Ellie operating as an agent in a fully automated mode.

Sample interview excerpts are shown in Figure 3.

Figure 1: Face-to-face interview setup.

Figure 2: Ellie, the virtual interviewer.

Participants are drawn from two distinct populations living in the Greater Los Angeles metropolitan area (veterans of the U.S. armed forces and members of the general public) and are coded for depression, PTSD and anxiety based on accepted psychiatric questionnaires.
Besides informing the development of computer-assisted interviews that improve rates of diagnosis, the corpus has been used to examine several fundamental questions about language, nonverbal behavior, psychophysiology and human-computer interaction. This article describes the de-