Proceedings of the 6th International Symposium on Mechatronics and its Applications (ISMA09), Sharjah, UAE, March 24-26, 2009.

A DISTRIBUTED AND SCALABLE PERSON TRACKING SYSTEM FOR ROBOTIC VISUAL SERVOING WITH 8 DOF IN VIRTUAL REALITY TV STUDIO AUTOMATION

Suraj Nair, Giorgio Panin, Thorsten Röder, Thomas Friedlhuber, Alois Knoll, Member, IEEE
Technische Universität München, Fakultät für Informatik
Boltzmannstrasse 3, 85748 Garching bei München (Germany)
{nair,panin,roeder,friedelhu,knoll}@in.tum.de

ABSTRACT

In this paper, a distributed and scalable person tracking system for visual servoing with industrial robot arms in virtual reality television studios (VR-TV) is presented. The system robustly tracks the moderator while he or she freely moves, sits, or walks around the studio, and the estimation result can be used to drive the main broadcasting camera, which is mounted on a large robotic arm fitted with a pan-tilt unit. The system consists of a person tracker operating on the TV camera, which localizes the moderator in its field of view, and an overhead tracking system, which localizes the moderator over the complete studio environment. The system scales to scenarios in which a single scene is to be shot from multiple angles with multiple TV cameras: a common overhead tracking system monitors the moderator over the complete studio environment, allowing the individual camera systems to initialize themselves and to re-initialize after a target loss. Applying the proposed tracking system to real-time VR-TV yields a robot cameraman, able to keep the moderator inside the screen with jitter-free viewpoint adjustments, as required by the VR scene rendering engine.

1. INTRODUCTION

Virtual TV studios (Fig.
1) have gained immense importance in the broadcasting area, due to developments in computer graphics hardware and software and their capability to provide a very impressive virtual reality experience for educational and documentary movies, as well as weather or financial forecast transmissions, to name only a few. However, the quality of the result depends on the real-time robustness and accuracy of three major components of the system, namely:

1. The camera tracker, which recovers the absolute 3D pose of the camera (usually from external infrared sensors or odometry);
2. The rendering software, which uses the estimated camera pose in order to generate a synthetic background or additional scene items;
3. The video mixer, which combines the synthetic image with the real camera input, in order to produce the final VR scene.

Therefore, the robot and the VR system require a smooth and precise motion input, in order to produce synthetic images with the correct overlap and without undesired jittering effects. However, in most current setups the camera is still manually controlled by a cameraman, who may not achieve the required smoothness and precision of motion. In such cases the camera has therefore been mounted on a robot arm, with a few pre-planned movements available (zoom, fly-by, etc.), which on one hand increases the workspace of camera operation and provides the 3D pose of the camera directly through the robot kinematics, but on the other hand limits the moderator's freedom of motion. By using auxiliary video inputs looking at the scene, together with real-time computer vision tools, the system would instead be able to localize the moderator and keep her/him within the screen while sitting or freely walking inside the studio, with almost no need for human intervention.
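The mixing step (component 3 above) can be illustrated with a minimal chroma-key compositing sketch in Python/NumPy. This is only an assumption-laden illustration of the general technique; the paper does not describe the mixer's actual implementation, and all names, the key color, and the tolerance are hypothetical:

```python
import numpy as np

def composite_vr_frame(camera_frame, synthetic_bg, key_color, tol=30):
    """Hard chroma key: replace pixels close to the studio key color
    (e.g. a green screen) with the rendered synthetic background."""
    # Per-pixel Euclidean distance to the key color in RGB space
    diff = camera_frame.astype(np.int32) - np.array(key_color, dtype=np.int32)
    mask = np.linalg.norm(diff, axis=-1) < tol  # True where background shows
    out = camera_frame.copy()
    out[mask] = synthetic_bg[mask]
    return out

# Tiny demo: a 2x2 "camera" frame, half green screen, half foreground
green = (0, 255, 0)
cam = np.array([[green, (200, 50, 50)],
                [green, (10, 10, 10)]], dtype=np.uint8)
bg = np.full((2, 2, 3), 128, dtype=np.uint8)  # flat gray synthetic scene
result = composite_vr_frame(cam, bg, green)
```

A production mixer would instead use soft matting and color-spill suppression, but the data flow (real frame + rendered frame + mask) is the same.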
For this purpose, in this paper we present a distributed and scalable tracking system, based on a previous work [1] with several improvements, as described in the following. The new system directly uses the video input from the robot-mounted TV camera, without the need for an additional Firewire device on top of it. The robustness of tracking has been improved by a novel integration of visual modalities. Additional control strategies (Normal Mode and Hold Angle Mode) have been developed, in order to control the robot for better visual effects. The 6-DOF Stäubli industrial robot arm, to which a 2-DOF pan-tilt unit is connected, provides overall 8 DOF to control the TV camera. A centralized communication engine using TCP/IP sockets has been developed, in order to provide efficient and reliable communication between the tracking system and the robot controller, also allowing the system to be scaled to multiple robots when a scene has to be shot from multiple views. Finally, in order to obtain a reliable localization over the whole area, we employ a distributed system consisting of a person tracker using the robot-mounted TV camera, and an overhead camera tracker monitoring the target position over the complete studio environment. The overhead system al-

ISMA09-1
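The centralized TCP/IP communication engine can be sketched as a loopback pose-message exchange between a tracker and a robot controller. The wire format below (newline-delimited JSON carrying an 8-DOF pose) is purely an assumption for illustration; the paper does not specify the engine's actual protocol, and all identifiers are hypothetical:

```python
import json
import socket
import threading

def encode_pose_msg(camera_id, pose):
    """Serialize one tracker update as newline-delimited JSON.
    'pose' = (x, y, z, roll, pitch, yaw, pan, tilt): the 8 DOF."""
    return (json.dumps({"camera": camera_id, "pose": list(pose)}) + "\n").encode()

def decode_pose_msg(raw):
    msg = json.loads(raw)
    return msg["camera"], tuple(msg["pose"])

def robot_controller(server_sock, received):
    """Accept one tracker connection and record the decoded pose."""
    conn, _ = server_sock.accept()
    with conn:
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break
            buf += chunk
        received.append(decode_pose_msg(buf))

# Loopback demo: the tracker sends one pose to the controller.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
received = []
t = threading.Thread(target=robot_controller, args=(server, received))
t.start()

tracker = socket.socket()
tracker.connect(server.getsockname())
tracker.sendall(encode_pose_msg("tv-cam-1", (1.0, 0.2, 1.5, 0, 0, 0.3, 0.1, -0.05)))
tracker.close()
t.join()
server.close()
```

Tagging each message with a camera identifier is what lets one centralized engine serve several robot-mounted cameras, matching the scalability goal stated above.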