Multi-Viewer Gesture-Based Interaction for Omni-Directional Video

Gustavo Rovelo 1,2, Davy Vanacken 1, Kris Luyten 1, Francisco Abad 3, Emilio Camahort 3

1 Hasselt University - tUL - iMinds, Expertise Centre for Digital Media, Wetenschapspark 2, 3590 Diepenbeek, Belgium
2 Dpto. de Sistemas Informáticos y Computación
3 Inst. Universitario de Automática e Informática Industrial
Universitat Politècnica de València - Camino de Vera S/N, Valencia, Spain
{gustavo.roveloruiz, davy.vanacken, kris.luyten}@uhasselt.be, {fjabad, camahort}@dsic.upv.es

ABSTRACT
Omni-directional video (ODV) is a novel medium that offers viewers a 360° panoramic recording. This type of content will become more common within our living rooms in the near future, seeing that immersive display technologies such as 3D television are on the rise. However, little attention has been given to how to interact with ODV content. We present a gesture elicitation study in which we asked users to perform mid-air gestures that they consider to be appropriate for ODV interaction, in both individual and collocated settings. We are interested in the gesture variations and adaptations that come forth from individual and collocated usage. To this end, we gathered quantitative and qualitative data by means of observations, motion capture, questionnaires and interviews. This data resulted in a user-defined gesture set for ODV, alongside an in-depth analysis of the variation in gestures we observed during the study.

Author Keywords
Gesture Elicitation; User-Defined Gestures; Omni-Directional Video; Multi-User Interaction

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

General Terms
Human Factors; Design; Measurement.

INTRODUCTION
ODV is an emerging media format that offers viewers a 360° panoramic video (Figure 1).
To create an immersive experience, ODV is typically shown in a CAVE-like setup, or a personal display (e.g. a head-mounted display) in combination with a tracking system to calculate the viewer’s correct viewpoint. Recent efforts such as Microsoft’s Illumiroom [15] provide interesting possibilities for ODV, as they show how a living room environment could be turned into a small CAVE-like theatre. Benko and Wilson [4] show different scenarios in which ODV can be used, as they describe a portable dome setup in which users can interact with applications such as a 360° video conferencing system, a multi-user game or an astronomical data visualization system.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CHI 2014, April 26 - May 01 2014, Toronto, ON, Canada
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2473-1/14/04...$15.00.
http://dx.doi.org/10.1145/2556288.2557113

Although capturing and rendering ODV have been widely investigated and optimized over time, little attention is given to interaction with ODV content. Interaction with ODV includes triggering typical control operations we know from regular video (e.g. play, pause, fast forward and go backward), but also includes changing viewpoint by means of typical spatial interactions such as zooming and panning.
These spatial interactions are, however, somewhat constrained, since spatial manipulations are always relative to the original camera position that was used while recording the ODV.

Bleumers et al. [6] recently presented a number of interesting findings regarding users’ expectations of ODV. Their research highlights the uncertainty among users about how to interact with ODV and puts forward mid-air gestural interfaces as a possible solution, although they did not explore such interfaces in their work. Mid-air gesturing has been used since the early nineties for controlling television sets [2, 8], and nowadays television sets with a built-in camera and simple gestural interface are commercially available.

We envision ODV content becoming more and more common in the future and accessible within the context of our living rooms. As a result, traditional television watching experiences will change, since multiple viewers no longer have the same region of focus (i.e. the television screen in front of them), but are able to watch video content in any direction. This change also implies that traditional interaction methods, such as a remote control or the current gesture-based TV interfaces, need to be re-evaluated.

Our aim is to understand which mid-air gestures are the most appropriate for interacting with ODV, not only when users are