Multiple Classifier Combination Using Reject Options and Markov Fusion Networks

Michael Glodek*
Institute of Neural Information Processing, University of Ulm, Germany
michael.glodek@uni-ulm.de

Martin Schels*
Institute of Neural Information Processing, University of Ulm, Germany
martin.schels@uni-ulm.de

Günther Palm
Institute of Neural Information Processing, University of Ulm, Germany
guenther.palm@uni-ulm.de

Friedhelm Schwenker
Institute of Neural Information Processing, University of Ulm, Germany
friedhelm.schwenker@uni-ulm.de

* These authors contributed equally to this work.

ABSTRACT
The audio/visual emotion challenge (AVEC) provides a benchmark data collection for evaluating and developing techniques for the recognition of affective states. In this work, we present a Markov fusion network (MFN) for combining individual classifiers, which is derived from the well-known Markov random field (MRF). It can restore missing values in a sequence of decisions, and it integrates multiple channels, weighting them dynamically by their confidences. The approach shows promising challenge results compared to the baseline.

Categories and Subject Descriptors
G.3 [Mathematics of Computing]: PROBABILITY AND STATISTICS; H.5.2 [Information Systems]: INFORMATION INTERFACES AND PRESENTATION - User Interfaces; I.4.8 [Computing Methodologies]: IMAGE PROCESSING AND COMPUTER VISION - Scene Analysis

Keywords
Markov model, information fusion, pattern recognition

1. INTRODUCTION
The emotions expressed by humans differ significantly in non-acted data sets compared to (over-)acted data sets [15, 18, 12]. The expressions are often weak, occur in many variations, or appear as mixtures of emotional states and others.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICMI'12, October 22–26, 2012, Santa Monica, California, USA.
Copyright 2012 ACM 978-1-4503-1467-1/12/10 ...$15.00.

Emotional expressions in human-computer interaction (HCI) situations are especially challenging, since the human interlocutor is in general not accustomed to expressing emotions in front of a computer. In general, the acquisition of non-acted emotional data does not guarantee the presence of observable emotions, so that a subsequent annotation of the data by experts is required. Since these experts estimate the most probable emotional state based on the observable data, the resulting ground truth for classification will rely on their biased opinions. In summary, the automated recognition of emotional states has to deal with a multitude of challenges, which should be taken into account in the design of the classifier architecture.

Estimating the emotional state of a human being from sensory data is a challenging task for a machine classifier. Unconstrained recordings lead to weak features and hence to weak classifiers, which will most likely fail in certain cases. One approach to handling such settings is to assess the quality of each classification, e.g. using uncertainty measures. Given such a measure, a sample can be rejected when the risk of misclassification is too high. Furthermore, the combination of different sources can help to improve the classification [14]. Here, too, it is often useful to have uncertainty values [8].
Additionally, one can make smoothness assumptions about the data: an emotional annotation is unlikely to change very quickly over time. Such assumptions can be influenced by external constraints, e.g. by the turn-taking mechanisms in verbal communication.

In order to implement such a strategy, we propose the Markov fusion network (MFN). It is designed to incorporate weak classifiers, to weight the results of different individual classifiers, and to reconstruct missing decisions.

The remainder of this work is organized as follows: Section 2 describes the concept of classifiers with a reject option and the Markov fusion network (MFN), which combines the remaining decisions to obtain a final estimate for every time step. In Section 3, we present the features and the base classifiers utilized. The results are summarized in Section 4, and the conclusion is drawn in Section 5.
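
To make the two ideas motivated in this introduction concrete, the following minimal Python sketch illustrates a confidence-based reject option and a naive way of filling the resulting gaps under a temporal smoothness assumption. It is an illustration only and not the MFN of Section 2: the function names, the threshold value of 0.7, the use of the maximum posterior as the uncertainty measure, and the linear interpolation step are all assumptions made for this example.

```python
import numpy as np


def classify_with_reject(posteriors, threshold=0.7):
    """Pick the most probable class per time step, or reject.

    posteriors: array of shape (T, n_classes), one row per time step.
    threshold:  hypothetical confidence threshold (an assumption here).
    Returns (labels, confidences); a rejected step gets the label -1.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    confidences = posteriors.max(axis=1)   # maximum posterior as confidence
    labels = posteriors.argmax(axis=1)
    labels = np.where(confidences >= threshold, labels, -1)
    return labels, confidences


def fill_rejected(scores, accepted):
    """Naive stand-in for temporal reconstruction (not the MFN).

    Linearly interpolates the scores at rejected time steps from the
    accepted neighbours; steps outside the accepted range take the
    nearest accepted value.
    """
    scores = np.asarray(scores, dtype=float)
    accepted = np.asarray(accepted, dtype=bool)
    t = np.arange(len(scores))
    if not accepted.any():
        # Nothing to interpolate from: leave every step undefined.
        return np.full(len(scores), np.nan)
    return np.interp(t, t[accepted], scores[accepted])


# Toy sequence of two-class posteriors over four time steps.
posteriors = np.array([[0.9, 0.1],
                       [0.6, 0.4],
                       [0.2, 0.8],
                       [0.5, 0.5]])
labels, confidences = classify_with_reject(posteriors, threshold=0.7)
print(labels)  # [ 0 -1  1 -1]

# Reconstruct the class-1 score at the rejected time steps.
filled = fill_rejected(posteriors[:, 1], labels != -1)
print(filled)
```

In this toy sequence the second and fourth time steps are rejected because their maximum posterior falls below the threshold, and their scores are then reconstructed from the accepted neighbours. Other uncertainty measures or a learned threshold could replace the choices made here.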