Socialising through Orchestrated Video Communication *

Marian Ursu, Pedro Torres, Vilmos Zsombori, Michael Frantzis
Department of Computing
Goldsmiths, University of London
{m.ursu, p.torres, v.zsombori, m.frantzis}@gold.ac.uk

Rene Kaiser
Institute of Information and Communication Technologies
JOANNEUM RESEARCH
rene.kaiser@joanneum.at

ABSTRACT

We report on the development of a video communication medium through which groups of people situated in different physical locations can naturally talk to each other, see and hear each other, and engage in social entertaining activities. Participants are free to move within their space and behave in a manner closer to collocated experiences. Essentially, this is implemented as a multi-location, multi-camera, hands-free video conferencing system between groups, with integrated support for entertaining activities. In this paper we focus on automatic orchestration, the reasoning process that applies screen grammar to best support the communication. We present a formal model for representing orchestration rules and discuss initial evaluation results.

Categories and Subject Descriptors

I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods - representations (procedural and rule-based), representation languages; H.4.3 [Information Systems Applications]: Communications Applications - video conferencing

General Terms

Design, Experimentation, Human Factors, Languages

Keywords

Multi-location Videoconferencing, Orchestration, Social Entertainment

Acknowledgements

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. ICT-2007-214793.
* Area Chair: Mor Naaman

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM’11, November 28–December 1, 2011, Scottsdale, Arizona, USA.
Copyright 2011 ACM 978-1-4503-0616-4/11/11 ...$10.00.

1. INTRODUCTION

We are working towards the development of a video-based virtual communication and interaction space in which dispersed groups of people are able to talk to each other naturally, and to create and share moments of fun by engaging in entertaining social activities. In simple terms, this interaction mode can be described as a multi-location, multi-camera, hands-free video conference, with integrated support for engagement in social entertaining activities between groups of people who already know each other, such as family and friends. The motivation for this work is presented in [8].

The aim is for the communication technology to create as natural and immersive a communication medium as possible between groups. Participants are free to move within their spaces and to behave much as they would in a collocated, unmediated interaction. Multiple cameras offer the benefit of different perspectives on each location, and the system is responsible for providing the best support for the communication channels at each moment of the interaction, seamlessly integrating the entertaining activities that are taking place. This process is called automatic orchestration. Orchestration is about developing new screen grammars for video-mediated communication and play, and expressing them in a computational form amenable to automatic application during the interaction.
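As a purely illustrative sketch, and not the formal model presented in this paper, the idea of expressing screen grammar as rules that are applied automatically can be pictured as follows. All names here (`Cue`, `Shot`, `Rule`, `orchestrate`), the two example rules and the priority scheme are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Cue:
    """Hypothetical observation about one location at one moment."""
    location: str
    speaker: Optional[str]  # None when nobody is talking


@dataclass(frozen=True)
class Shot:
    """Hypothetical editing decision: which location to show, and how."""
    location: str
    framing: str  # e.g. "close-up" or "wide"


@dataclass(frozen=True)
class Rule:
    """Hypothetical orchestration rule: a condition plus the shot it proposes."""
    name: str
    priority: int
    applies: Callable[[Cue], bool]
    shot: Callable[[Cue], Shot]


# Two assumed example rules: favour a close-up while someone speaks,
# otherwise fall back to a wide shot of the location.
RULES = [
    Rule("speaker-close-up", 2,
         lambda c: c.speaker is not None,
         lambda c: Shot(c.location, "close-up")),
    Rule("default-wide", 1,
         lambda c: True,
         lambda c: Shot(c.location, "wide")),
]


def orchestrate(cue: Cue, rules: Sequence[Rule] = RULES) -> Shot:
    """Return the shot proposed by the highest-priority rule that applies."""
    matching = [r for r in rules if r.applies(cue)]
    # The default rule always applies, so `matching` is never empty
    # for the rule set above.
    best = max(matching, key=lambda r: r.priority)
    return best.shot(cue)
```

For instance, `orchestrate(Cue("living-room", "Ann"))` selects the close-up rule, whereas a cue with no speaker falls through to the wide shot. A real orchestrator would reason over many cameras, locations and temporal context; this sketch only shows the rule-selection idea.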
The main requirements we chose include: (i) pragmatics: all that is important to be seen in each location is indeed shown; (ii) autonomy: orchestration operates without direct instructions; (iii) transparency: end users' perception of the communication medium should be minimal; (iv) aesthetics: screen storytelling grammar is employed to better convey what is pragmatically required.

Work related to the overall paradigm has been the main topic of a EuroITV workshop¹. Related research in interactive TV narratives investigates the automatic creation of TV productions that adapt to viewers' requirements at the time of delivery, both with pre-recorded content [7] and with live content [4]. A major distinction from our approach concerns the role played by the end users, who are actors in our case rather than spectators.

¹ Enhancing Social Communication and Belonging by Integrating TV Narrativity and Game-Play, EuroITV 2009.

Intelligent virtual camera planning [2] deals with the automatic planning of camera placement and movement, sometimes also including shot composition, in order to achieve the best narrative effect. Most of this work, although closely related, is carried out in virtual worlds, where there are fewer constraints regarding the cameras and far more information regarding the world itself. Extensions to the real physical