JAR-Aibo: A Multi-view Dataset for Evaluation of Model-Free Action Recognition Systems

Marco Körner and Joachim Denzler
Friedrich Schiller University of Jena, Computer Vision Group
Ernst-Abbe-Platz 3, 07743 Jena, Germany
{marco.koerner,joachim.denzler}@uni-jena.de
http://www.inf-cv.uni-jena.de

Abstract. We present a novel multi-view dataset for evaluating model-free action recognition systems. In contrast to existing datasets, it covers 56 distinct action classes. Each of them was performed ten times by remotely controlled Sony ERS-7 AIBO robot dogs and observed by six distributed and synchronized cameras at 17 fps and VGA resolution. In total, our dataset contains 576 sequences. Baseline results show its applicability for benchmarking model-free action recognition methods.

Keywords: action recognition, behaviour understanding, dataset.

1 Introduction and Recent Work

The automatic recognition of actions and behaviour from video streams has gained increasing scientific interest during the last decades, as pointed out by recent reviews [14,1,3]. In order to evaluate and compare algorithms for action recognition or behaviour understanding, open-access datasets of high complexity are evidently needed. During the recent years of research on this topic, numerous such datasets were published and used by the community. The vast majority is designed for single-view approaches, while datasets for multi-view scenarios are rare and cover only a small number of distinct action classes.

We present a multi-view dataset for evaluating model-free action recognition systems. To specifically assess the performance of model-free approaches, 56 remotely triggered actions performed by Sony ERS-7 AIBO robot dogs were captured by six synchronized cameras, resulting in 576 multi-view sequences.
1.1 Single-View Datasets

As scientific efforts initially concentrated on the recognition of actions and activities captured by single cameras, most of the early datasets show single persons performing basic actions, captured from only one view in front of simple and static backgrounds. The most prominent are the Weizmann [7] and the KTH [16] datasets, where the latter shows varying clothing of the actors.

A. Petrosino, L. Maddalena, P. Pala (Eds.): ICIAP 2013 Workshops, LNCS 8158, pp. 527–535, 2013. © Springer-Verlag Berlin Heidelberg 2013