J Multimodal User Interfaces (2014) 8:5–16
DOI 10.1007/s12193-013-0133-0
ORIGINAL PAPER
Using unlabeled data to improve classification of emotional states
in human computer interaction
Martin Schels · Markus Kächele · Michael Glodek ·
David Hrabal · Steffen Walter · Friedhelm Schwenker
Received: 5 April 2013 / Accepted: 16 November 2013 / Published online: 6 December 2013
© OpenInterface Association 2013
Abstract The individual nature of physiological measurements of human affective states makes it very difficult to transfer statistical classifiers from one subject to another. In this work, we propose an approach that incorporates unlabeled data into supervised classifier training in order to perform emotion classification. The key idea of the method is to conduct a density estimation of all available data (labeled and unlabeled) to create a new encoding of the problem. Based on this encoding, a supervised classifier is constructed. Furthermore, numerical evaluations on the EmoRec II corpus are given, examining to what extent additional data can improve classification and which parameters of the density estimation are optimal.
Keywords Partially supervised learning · Clustering ·
Affective computing
1 Introduction and related work
1.1 Partially supervised learning
There is a variety of recent research on techniques of partially supervised learning that make use of labeled data together with unlabeled data to improve automatic classification. One of the early approaches uses the well-established expectation-maximization (EM) algorithm to better estimate a generative model, which is then used for classification [10]. Most commonly, this is done by conducting the following steps [1]:
1. Estimate an initial model using only the labeled data.
2. Label the additional unlabeled samples accordingly.
3. Re-estimate the parameters of the model using all the data.
4. Check the stopping criteria; if they are not met, proceed to step 2.

M. Schels (B) · M. Kächele · M. Glodek · F. Schwenker
Institute of Neural Information Processing, Ulm University, Ulm, Germany
e-mail: martin.schels@uni-ulm.de

D. Hrabal · S. Walter
Medical Psychology, Ulm University, Ulm, Germany
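The four steps above can be sketched as a simple self-labeling loop. This is an illustrative sketch, not the implementation evaluated in this paper; scikit-learn's GaussianNB stands in for the generative model of [10], and the toy data is hypothetical.

```python
# Sketch of the EM-style loop: fit on labeled data, pseudo-label the rest,
# refit on everything, and stop once the pseudo-labels are stable.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB

# Toy problem: 200 points in two clusters, only 10 of them labeled.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
rng = np.random.RandomState(0)
mask = np.zeros(len(X), dtype=bool)
mask[rng.choice(len(X), size=10, replace=False)] = True

# Step 1: estimate an initial model from the labeled data only.
model = GaussianNB().fit(X[mask], y[mask])

for _ in range(20):                          # bounded iteration budget
    # Step 2: label the unlabeled samples with the current model.
    pseudo = model.predict(X[~mask])
    y_all = y.copy()
    y_all[~mask] = pseudo
    # Step 3: re-estimate the parameters on all the data.
    new_model = GaussianNB().fit(X, y_all)
    # Step 4: stop when the pseudo-labels no longer change.
    if np.array_equal(new_model.predict(X[~mask]), pseudo):
        break
    model = new_model

print(model.score(X[mask], y[mask]))
```

On well-separated clusters such as these, the loop typically converges within a few iterations.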
A further technique to incorporate unlabeled data in clas-
sification approaches is transductive learning [52], where
the given test cases are employed as additionally available
unlabeled samples. In inductive learning, a separating deci-
sion function is explicitly defined on the whole space. In
contrast to that, a transductive learner attempts to find an
optimal assignment of categories only on the given testing
data. Here, the cluster assumption is exploited: a separating decision border is more likely to lie in low-density regions of the space; that is, the classes form clusters. The most common transductive learning approach is the transductive support vector machine [18], which finds a low-density region by maximizing the margin on all available data, labeled and unlabeled.
Another popular technique is active learning, in which the most informative sample from the unlabeled data, i.e. the one closest to a precomputed decision boundary, is selected by the algorithm and passed to a human expert for labeling [9]; the decision boundary is then adapted accordingly. In order to conduct a fully automatic process, semi-supervised learning can be applied: classifiers are directly used to annotate the unlabeled data. A classifier can label data for itself by selecting the most confident samples and adding them to the training set (self-training) [38]. Another option is to use several classifiers that mutually select confident samples for each other's training data (co-training) [3]. This expansion of the training set is severely undermined when noise (i.e. erroneously classified samples) is added to the training set. This implies that the initial clas-