Automatic Face Tracking System using Quadrotors: Control by Goal Position Thresholding

Veerachart Srisamosorn, Noriaki Kuwahara, Atsushi Yamashita, Taiki Ogata, Jun Ota

Abstract— This paper proposes a human face tracking system for obtaining elderly people's facial images, which can be used to estimate their individual emotions. The system consists of Xbox Kinect sensors for human detection and robot navigation, and Bitcraze's Crazyflie quadrotors, which overcome occlusion by moving towards people to obtain closer facial images. When the person's head position is used to set the quadrotor's goal position, noise in the measured head position can cause the goal position, and subsequently the quadrotor, to vibrate, which can affect facial image acquisition and create safety problems. To improve the stability of the quadrotor, we propose an algorithm that uses a threshold to fix the quadrotor's goal position. The performance of the algorithm is evaluated using the detected positions of the quadrotor and is compared with tracking without the threshold algorithm, as well as with different threshold values. Based on these positions, face tracking results are also calculated by simulating the projection of the face in the real world onto the image plane and evaluating the quality of the obtained facial images.

I. INTRODUCTION

In indoor environments, human tracking is beneficial for many uses, for example in surveillance systems inside a household or an industrial factory. With the additional function of face tracking, applications can be extended further to personal identification and several analyses using facial images. A specific application considered in this paper is a robot system that tracks and follows elderly people living in a nursing home, obtains their facial images, and estimates their individual emotions during their daily activities as part of the mental care for the residents. A number of studies have been conducted on tracking moving objects and people.
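The goal-position thresholding idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name, the threshold value, and the restriction to a 2D horizontal plane are all assumptions made for the example.

```python
import math

def update_goal(goal, head, threshold=0.2):
    """Hold the quadrotor's goal position fixed unless the measured
    head position has moved farther than `threshold` (metres) away,
    so that measurement noise does not shake the goal.

    goal, head: (x, y) tuples in metres (hypothetical 2D simplification).
    """
    dx = head[0] - goal[0]
    dy = head[1] - goal[1]
    if math.hypot(dx, dy) > threshold:
        return head   # significant motion: re-target to the new head position
    return goal       # small jitter: keep the previous goal fixed

goal = (0.0, 0.0)
goal = update_goal(goal, (0.05, 0.05))  # noise-level motion: goal unchanged
goal = update_goal(goal, (0.5, 0.0))    # large motion: goal follows the head
```

A larger threshold yields a calmer quadrotor at the cost of slower reaction to genuine head motion, which is the trade-off the paper evaluates with different threshold values.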
Multiple stereo cameras are used to track the motions of multiple people over a wide area in [1], performing well even in crowded conditions. In [2], a stereo camera and a laser rangefinder are attached to a mobile robot to track a human, using features of the human upper body and face detection from the camera and leg data from the laser rangefinder. However, it requires initialization by the user to select the person to be tracked. In [3], a human is tracked by a mobile robot that detects his/her face in images obtained by the on-board web camera, takes a picture of him/her, and prints it out. Face detection is done by skin-color detection and eye detection, which are prone to false detection: other objects with similar color and pattern, such as a table or a wall, can be falsely recognized as a face. Tracking also starts only after a face is found by the robot, so an initial search is required. The mobile robot being used is also quite large and tall due to the camera attached on top, which increases the risk of toppling and injuring the tracked people.

Unmanned aerial vehicles (UAVs) are also becoming a popular platform for mobile robots, with many applications under consideration, including surveillance and object tracking.

V. Srisamosorn and A. Yamashita are with the Department of Precision Engineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan. veera.sr@race.u-tokyo.ac.jp, yamashita@robot.u-tokyo.ac.jp
N. Kuwahara is with the Department of Advanced Fibro-Science, Kyoto Institute of Technology, Kyoto-shi 606-8585, Japan. kuwahara@atr.jp
T. Ogata and J. Ota are with Research into Artifacts, Center for Engineering (RACE), The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8568, Japan. ogata@race.u-tokyo.ac.jp, ota@race.u-tokyo.ac.jp
To chase a moving object on the floor, [4] uses a camera attached to the bottom of the quadrotor, facing the ground, to obtain images, and implements a color-based tracking method with a particle filter to deal with occlusions, noise, turns, and scale changes. However, the object considered in the experiments moves only on a 2D plane, and the quadrotor also needs to be above the object at the beginning of tracking.

One of the challenging points of using UAVs is how to control them in indoor environments, as position data from the global positioning system are not available. Many studies attach various kinds of cameras to the quadrotors to obtain their position for autonomous flight. In [5], a Kinect sensor is attached below the quadrotor, pointing towards the ground, to obtain depth maps used for altitude control of the quadrotor. In [6], a 3D model of the edges of the indoor environment is used for position control of the quadrotor in a structured indoor environment. In [7], a medium-sized hexacopter with three industrial high-speed cameras, which generates a 3D map using a high-end CPU and a mid-range GPU, is proposed and tested with autonomous take-off and landing with position hold based on computer vision data.

There are also test beds available for experimenting with control algorithms, for example the Real-time indoor Autonomous Vehicle test ENvironment (RAVEN) [8] and the Flying Machine Arena [9], which provide fast and accurate position measurements. However, these systems utilize a number of high-quality sensing devices, such as motion capture systems, and are therefore expensive. Other studies implement vision systems for tracking a quadrotor's position, as in [10], where multiple cameras track colored markers placed on the quadrotor and an extended Kalman filter is implemented to estimate its states. However, the experiments did not include feedback control in the