J Multimodal User Interfaces (2014) 8:5–16
DOI 10.1007/s12193-013-0133-0

ORIGINAL PAPER

Using unlabeled data to improve classification of emotional states in human computer interaction

Martin Schels · Markus Kächele · Michael Glodek · David Hrabal · Steffen Walter · Friedhelm Schwenker

Received: 5 April 2013 / Accepted: 16 November 2013 / Published online: 6 December 2013
© OpenInterface Association 2013

M. Schels (B) · M. Kächele · M. Glodek · F. Schwenker
Institute of Neural Information Processing, Ulm University, Ulm, Germany
e-mail: martin.schels@uni-ulm.de

D. Hrabal · S. Walter
Medical Psychology, Ulm University, Ulm, Germany

Abstract  The individual nature of physiological measurements of human affective states makes it very difficult to transfer statistical classifiers from one subject to another. In this work, we propose an approach that incorporates unlabeled data into supervised classifier training in order to perform emotion classification. The key idea of the method is to conduct a density estimation of all available data (labeled and unlabeled) to create a new encoding of the problem; based on this encoding, a supervised classifier is constructed. Furthermore, numerical evaluations on the EmoRec II corpus are given, examining to what extent additional data can improve classification and which parameters of the density estimation are optimal.

Keywords  Partially supervised learning · Clustering · Affective computing

1 Introduction and related work

1.1 Partially supervised learning

There is a variety of recent research on partially supervised learning techniques that use labeled data together with unlabeled data to improve automatic classification. One of the early approaches is to use the well-established expectation maximization (EM) algorithm to better estimate a generative model and use it for classification [10]. Most commonly, this is done by conducting the following steps [1]:

1. Estimate an initial model using the labeled data only.
2. Label the additional unlabeled samples accordingly.
3. Re-estimate the parameters of the model using all the data.
4. Check the stopping criteria; if they are not met, proceed to step 2.

A further technique for incorporating unlabeled data into classification is transductive learning [52], where the given test cases are employed as additionally available unlabeled samples. In inductive learning, a separating decision function is explicitly defined on the whole space. In contrast, a transductive learner attempts to find an optimal assignment of categories only on the given test data. Hereby the cluster assumption is exploited: a separating decision border is more likely to lie in low-density regions of the space, which means that the classes are clustered together. The most common transductive learning approach is the transductive support vector machine [18], which finds a low-density region by maximizing the margin on all available data, labeled and unlabeled.

Other popular techniques include active learning, where the most informative sample from the unlabeled data, i.e. the one closest to a precomputed decision boundary, is selected by the algorithm and passed to an expert [9], and the decision boundary is adapted accordingly. In order to obtain a fully automatic process, semi-supervised learning can be applied: classifiers are directly used to annotate the unlabeled data. A classifier can label data for itself by selecting the most confident data samples and adding them to the training set (self-training) [38]. Another option is to use several classifiers that mutually select confident samples for each other's training data (co-training) [3]. This expansion of the training set is severely undermined when noise (i.e. erroneously classified samples) is added to the training set. This implies that the initial clas-
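The four-step EM-style procedure above can be sketched as follows. This is a minimal, self-contained illustration using a toy one-dimensional generative model with one Gaussian per class; the per-class maximum-likelihood fit, the convergence test, and all data values are illustrative assumptions, not details from the cited works:

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate normal density.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit(points_by_class):
    # Maximum-likelihood mean and std per class (with a small floor on sigma).
    params = {}
    for c, pts in points_by_class.items():
        mu = sum(pts) / len(pts)
        var = sum((p - mu) ** 2 for p in pts) / len(pts)
        params[c] = (mu, max(math.sqrt(var), 1e-3))
    return params

def em_self_label(labeled, unlabeled, n_iter=10):
    """labeled: dict class -> list of 1-D samples; unlabeled: list of samples."""
    # Step 1: estimate an initial model from the labeled data only.
    params = fit(labeled)
    for _ in range(n_iter):
        # Step 2: label the unlabeled samples with the current model.
        pooled = {c: list(pts) for c, pts in labeled.items()}
        for x in unlabeled:
            c_hat = max(params, key=lambda c: gauss_pdf(x, *params[c]))
            pooled[c_hat].append(x)
        # Step 3: re-estimate the model parameters using all the data.
        new_params = fit(pooled)
        # Step 4: stop once the class means no longer change, else repeat.
        if all(abs(new_params[c][0] - params[c][0]) < 1e-9 for c in params):
            params = new_params
            break
        params = new_params
    return params

# Toy usage: two labeled samples per class plus five unlabeled ones;
# the pooled re-estimation shifts the class means to roughly 0.05 and 3.1.
# em_self_label({0: [0.0, 0.2], 1: [3.0, 3.2]}, [0.1, -0.1, 2.9, 3.1, 3.3])
```

Note that each pass re-labels all unlabeled samples from scratch, so a sample may switch classes between iterations as the density estimates sharpen.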
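The key idea stated in the abstract, a density estimate over all available data (labeled and unlabeled) that yields a new encoding on which a supervised classifier is then trained, can be sketched in the same toy one-dimensional setting. The clustering routine (a crude 1-D k-means), the Gaussian membership encoding, and the nearest-class-mean classifier below are illustrative stand-ins, not the authors' actual pipeline:

```python
import math

def kmeans_1d(xs, k=2, n_iter=20):
    # Crude 1-D k-means as a stand-in for the density estimation step.
    centers = [min(xs), max(xs)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def encode(x, centers, sigma=1.0):
    # New encoding: normalized Gaussian memberships w.r.t. each cluster.
    w = [math.exp(-(x - c) ** 2 / (2 * sigma ** 2)) for c in centers]
    s = sum(w)
    return [v / s for v in w]

def train_and_predict(labeled, unlabeled, x_new):
    # Density model is fit on ALL samples, labeled and unlabeled.
    all_x = [x for xs in labeled.values() for x in xs] + unlabeled
    centers = kmeans_1d(all_x)
    # Supervised step: nearest class mean in the encoded space.
    means = {}
    for c, xs in labeled.items():
        codes = [encode(x, centers) for x in xs]
        means[c] = [sum(col) / len(codes) for col in zip(*codes)]
    z = encode(x_new, centers)
    return min(means, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(z, means[c])))
```

The unlabeled samples never receive labels here; they only refine the density estimate that defines the encoding, which is what distinguishes this scheme from the self-labeling loop above.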