J Multimodal User Interfaces (2014) 8:5–16
DOI 10.1007/s12193-013-0133-0

ORIGINAL PAPER

Using unlabeled data to improve classification of emotional states in human computer interaction

Martin Schels · Markus Kächele · Michael Glodek · David Hrabal · Steffen Walter · Friedhelm Schwenker

Received: 5 April 2013 / Accepted: 16 November 2013 / Published online: 6 December 2013
© OpenInterface Association 2013

M. Schels (B) · M. Kächele · M. Glodek · F. Schwenker
Institute of Neural Information Processing, Ulm University, Ulm, Germany
e-mail: martin.schels@uni-ulm.de

D. Hrabal · S. Walter
Medical Psychology, Ulm University, Ulm, Germany

Abstract  The individual nature of physiological measurements of human affective states makes it very difficult to transfer statistical classifiers from one subject to another. In this work, we propose an approach that incorporates unlabeled data into supervised classifier training in order to perform emotion classification. The key idea of the method is to conduct a density estimation of all available data (labeled and unlabeled) to create a new encoding of the problem; based on this encoding, a supervised classifier is constructed. Furthermore, numerical evaluations on the EmoRec II corpus are given, examining to what extent additional data can improve classification and which parameters of the density estimation are optimal.

Keywords  Partially supervised learning · Clustering · Affective computing

1 Introduction and related work

1.1 Partially supervised learning

There is a variety of recent research on partially supervised learning techniques that use labeled data together with unlabeled data to improve automatic classification. One of the early approaches is to use the well-established expectation maximization (EM) algorithm to better estimate a generative model and use it for classification [10]. Most commonly, this is done by conducting the following steps [1]:

1. Estimate an initial model using the labeled data only.
2. Label the additional unlabeled samples accordingly.
3. Re-estimate the parameters of the model using all the data.
4. Check the stopping criteria; if they are not met, proceed to step 2.

A further technique for incorporating unlabeled data into classification is transductive learning [52], where the given test cases are employed as additionally available unlabeled samples. In inductive learning, a separating decision function is explicitly defined on the whole space. In contrast, a transductive learner attempts to find an optimal assignment of categories only on the given test data. Hereby the cluster assumption is exploited: a separating decision border is more likely to lie in low-density regions of the space, which means that the classes are clustered together. The most common transductive learning approach is the transductive support vector machine [18], which finds a low-density region by maximizing the margin on all available data, labeled and unlabeled.

Other popular techniques include active learning, where the most informative sample from the unlabeled data, i.e. the one closest to a precomputed decision boundary, is selected by the algorithm and passed to an expert [9], and the decision boundary is adapted accordingly. In order to obtain a fully automatic process, semi-supervised learning can be applied: classifiers are directly used to annotate the unlabeled data. A classifier can label data for itself by selecting the most confident data samples and adding them to the training set (self-training) [38]. Another option is to use several classifiers that mutually select confident samples for each other's training data (co-training) [3]. This expansion of the training set is severely undermined when noise (i.e. erroneously classified samples) is added to the training set. This implies that the initial clas-
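The four-step EM-style procedure above can be sketched as follows. This is a minimal, self-contained illustration using a toy one-dimensional generative model with one Gaussian per class; the per-class maximum-likelihood fit, the convergence test, and all data values are illustrative assumptions, not details from the cited works:

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate normal density.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit(points_by_class):
    # Maximum-likelihood mean and std per class (with a small floor on sigma).
    params = {}
    for c, pts in points_by_class.items():
        mu = sum(pts) / len(pts)
        var = sum((p - mu) ** 2 for p in pts) / len(pts)
        params[c] = (mu, max(math.sqrt(var), 1e-3))
    return params

def em_self_label(labeled, unlabeled, n_iter=10):
    """labeled: dict class -> list of 1-D samples; unlabeled: list of samples."""
    # Step 1: estimate an initial model from the labeled data only.
    params = fit(labeled)
    for _ in range(n_iter):
        # Step 2: label the unlabeled samples with the current model.
        pooled = {c: list(pts) for c, pts in labeled.items()}
        for x in unlabeled:
            c_hat = max(params, key=lambda c: gauss_pdf(x, *params[c]))
            pooled[c_hat].append(x)
        # Step 3: re-estimate the model parameters using all the data.
        new_params = fit(pooled)
        # Step 4: stop once the class means no longer change, else repeat.
        if all(abs(new_params[c][0] - params[c][0]) < 1e-9 for c in params):
            params = new_params
            break
        params = new_params
    return params

# Toy usage: two labeled samples per class plus five unlabeled ones;
# the pooled re-estimation shifts the class means to roughly 0.05 and 3.1.
# em_self_label({0: [0.0, 0.2], 1: [3.0, 3.2]}, [0.1, -0.1, 2.9, 3.1, 3.3])
```

Note that each pass re-labels all unlabeled samples from scratch, so a sample may switch classes between iterations as the density estimates sharpen.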
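The key idea stated in the abstract, a density estimate over all available data (labeled and unlabeled) that yields a new encoding on which a supervised classifier is then trained, can be sketched in the same toy one-dimensional setting. The clustering routine (a crude 1-D k-means), the Gaussian membership encoding, and the nearest-class-mean classifier below are illustrative stand-ins, not the authors' actual pipeline:

```python
import math

def kmeans_1d(xs, k=2, n_iter=20):
    # Crude 1-D k-means as a stand-in for the density estimation step.
    centers = [min(xs), max(xs)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def encode(x, centers, sigma=1.0):
    # New encoding: normalized Gaussian memberships w.r.t. each cluster.
    w = [math.exp(-(x - c) ** 2 / (2 * sigma ** 2)) for c in centers]
    s = sum(w)
    return [v / s for v in w]

def train_and_predict(labeled, unlabeled, x_new):
    # Density model is fit on ALL samples, labeled and unlabeled.
    all_x = [x for xs in labeled.values() for x in xs] + unlabeled
    centers = kmeans_1d(all_x)
    # Supervised step: nearest class mean in the encoded space.
    means = {}
    for c, xs in labeled.items():
        codes = [encode(x, centers) for x in xs]
        means[c] = [sum(col) / len(codes) for col in zip(*codes)]
    z = encode(x_new, centers)
    return min(means, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(z, means[c])))
```

The unlabeled samples never receive labels here; they only refine the density estimate that defines the encoding, which is what distinguishes this scheme from the self-labeling loop above.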